Enhanced dimensionality reduction methods for classifying malaria vector dataset using decision tree

Bibliographic Details
Main Authors: Arowolo, Micheal Olaolu, Adebiyi, Marion Olubunmi, Adebiyi, Ayodele Ariyo
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2021
Online Access: http://journalarticle.ukm.my/18056/1/7.pdf
http://journalarticle.ukm.my/18056/
https://www.ukm.my/jsm/malay_journals/jilid50bil9_2021/KandunganJilid50Bil9_2021.html
Description
Summary: RNA-Seq data are used in biological applications and to support decisions in gene classification. Much recent work has focused on reducing the dimensionality of RNA-Seq data, and dimensionality reduction approaches have been proposed for extracting the relevant information from a given dataset. In this study, a novel optimized dimensionality reduction algorithm is proposed that combines an optimized genetic algorithm with Principal Component Analysis and Independent Component Analysis (GA-O-PCA and GAO-ICA), which are used to identify an optimum subset and latent correlated features, respectively. A decision tree classifier is applied to the reduced Anopheles gambiae mosquito dataset to improve accuracy and scalability in gene expression analysis. The proposed algorithm extracts relevant features from the high-dimensional input feature space, using feature ranking and earlier experience. The model's performance is evaluated and validated by comparing its classification accuracy with that of existing approaches in the literature. The experimental results are promising for feature selection and classification in gene expression data analysis and indicate that the approach is a capable addition to existing data mining techniques.
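
The idea of using a genetic algorithm to search for a small, informative feature subset can be illustrated with a minimal sketch. This is not the authors' GA-O-PCA or GAO-ICA procedure: the PCA/ICA step is omitted, a nearest-centroid rule stands in for the decision tree as the fitness function so that the example needs only the standard library, and the data are synthetic. All names (`make_data`, `centroid_accuracy`, `genetic_select`) and all parameter values are assumptions made for illustration.

```python
import random

def make_data(n_samples=60, n_features=20, informative=(0, 1, 2), seed=0):
    """Synthetic stand-in for an expression matrix (hypothetical data):
    only the `informative` columns carry the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid rule on the selected columns.
    Assumption: this simple rule replaces the decision tree of the abstract."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, mu))
                for c, mu in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y)

def genetic_select(X, y, pop_size=20, generations=30, mutation_rate=0.05, seed=1):
    """Evolve boolean feature masks; fitness rewards accuracy and small subsets."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return centroid_accuracy(X, y, mask) - 0.01 * sum(mask)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    X, y = make_data()
    mask = genetic_select(X, y)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("training accuracy:", centroid_accuracy(X, y, mask))
```

The size penalty in `fitness` is one simple way to push the search toward compact subsets. In a fuller pipeline the selected columns would be passed to a projection step and a decision tree, and accuracy would be measured on held-out data rather than on the training set.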