Improved reptile search optimization algorithm using chaotic map and simulated annealing for feature selection in medical field

The increased volume of medical datasets has produced high dimensional features, negatively affecting machine learning (ML) classifiers. In ML, the feature selection process is fundamental for selecting the most relevant features and reducing redundant and irrelevant ones. The optimization algorithm...

Full description

Saved in:
Bibliographic Details
Main Authors: Elgamal, Zenab, Md Sabri, Aznul Qalid, Tubishat, Mohammad, Tbaishat, Dina, Makhadmeh, Sharif Naser, Alomari, Osama Ahmad
Format: Article
Published: Institute of Electrical and Electronics Engineers 2022
Subjects:
Online Access:http://eprints.um.edu.my/42371/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The increased volume of medical datasets has produced high dimensional features, negatively affecting machine learning (ML) classifiers. In ML, the feature selection process is fundamental for selecting the most relevant features and reducing redundant and irrelevant ones. The optimization algorithms demonstrate its capability to solve feature selection problems. Reptile Search Algorithm (RSA) is a new nature-inspired optimization algorithm that stimulates Crocodiles' encircling and hunting behavior. The unique search of the RSA algorithm obtains promising results compared to other optimization algorithms. However, when applied to high-dimensional feature selection problems, RSA suffers from population diversity and local optima limitations. An improved metaheuristic optimizer, namely the Improved Reptile Search Algorithm (IRSA), is proposed to overcome these limitations and adapt the RSA to solve the feature selection problem. Two main improvements adding value to the standard RSA; the first improvement is to apply the chaos theory at the initialization phase of RSA to enhance its exploration capabilities in the search space. The second improvement is to combine the Simulated Annealing (SA) algorithm with the exploitation search to avoid the local optima problem. The IRSA performance was evaluated over 20 medical benchmark datasets from the UCI machine learning repository. Also, IRSA is compared with the standard RSA and state-of-the-art optimization algorithms, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Grasshopper Optimization algorithm (GOA) and Slime Mould Optimization (SMO). The evaluation metrics include the number of selected features, classification accuracy, fitness value, Wilcoxon statistical test (p-value), and convergence curve. Based on the results obtained, IRSA confirmed its superiority over the original RSA algorithm and other optimized algorithms on the majority of the medical datasets.