A metaheuristic technique for cluster-based feature selection of DNA methylation data for cancer

Bibliographic Details
Main Authors: Eissa, Noureldin; Khairuddin, Uswah; Yusof, Rubiyah; Ahmed Madani, Ahmed Madani
Format: Article
Language: English
Published: Tech Science Press 2023
Online Access: http://eprints.utm.my/106430/1/UswahKhairuddin2023_AMetaheuristicTechniqueforClusterBasedFeature.pdf
http://eprints.utm.my/106430/
http://dx.doi.org/10.32604/cmc.2023.033632
Description
Summary: Epigenetics is the study of phenotypic variations that do not involve changes to the DNA sequence. Cancer epigenetics has grown rapidly over the past few years because epigenetic alterations are present in all human cancers. One of these alterations is DNA methylation, an epigenetic process that regulates gene expression and often occurs at tumor suppressor gene loci in cancer. Studying this methylation process may therefore shed light on gene functions that cannot be explained by changes in the DNA sequence alone. Currently, microarray technologies, such as Illumina Infinium BeadChip assays, are used to study DNA methylation at an extremely large number of loci. At each DNA methylation site, a beta value (β) reflects the methylation intensity. Clustering these data across various types of cancer may therefore reveal large partitions that help classify different cancer types objectively and identify the relevant loci without user bias. This study proposed a Nested Big Data Clustering Genetic Algorithm (NBDC-GA), a novel evolutionary metaheuristic technique that performs cluster-based feature selection on DNA methylation sites. The efficacy of the NBDC-GA was tested using real-world data sets retrieved from The Cancer Genome Atlas (TCGA), a cancer genomics program created by the National Cancer Institute (NCI) and the National Human Genome Research Institute. The performance of the NBDC-GA was then compared with that of a recently developed metaheuristic Immuno-Genetic Algorithm (IGA) that had been tested on the same data sets. The NBDC-GA outperformed the IGA in terms of convergence performance. Furthermore, the NBDC-GA produced a more robust clustering configuration while simultaneously decreasing the dimensionality of features, to a maximum of 67% for individual cancer types and 94.5% for the collective cancer data.
The proposed NBDC-GA was also able to identify two chromosomes with highly contrasting DNA methylation activities that were previously linked to cancer.
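
To make the idea of cluster-based feature selection more concrete, the following is a minimal sketch in Python. It is not the NBDC-GA described in the summary: it is a plain genetic algorithm over feature bitmasks, scored with a tiny 2-means clustering, and every name in it (beta_value, two_means, fitness, select_features) is hypothetical. The beta value formula M / (M + U + offset) with an offset of 100 is the convention commonly used for Illumina arrays; whether the study used exactly this form is an assumption.

```python
import random


def beta_value(methylated, unmethylated, offset=100.0):
    """Beta value as commonly defined for Illumina arrays: M / (M + U + offset)."""
    return methylated / (methylated + unmethylated + offset)


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=10):
    """Tiny deterministic 2-means; returns (labels, centroids)."""
    # Seed with the first point and the point farthest from it.
    centroids = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def fitness(mask, samples, penalty=0.01):
    """Cluster separation on the selected loci, minus a per-feature penalty."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return float("-inf")  # an empty feature set cannot be clustered
    points = [[row[i] for i in cols] for row in samples]
    labels, centroids = two_means(points)
    within = sum(dist2(p, centroids[lab])
                 for p, lab in zip(points, labels)) / len(points)
    between = dist2(centroids[0], centroids[1])
    return between / (within + 1e-9) - penalty * len(cols)


def select_features(samples, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Evolve a boolean mask over loci; True means the locus is kept."""
    rng = random.Random(seed)
    n = len(samples[0])
    # Include the full feature set so the result is never worse than it.
    population = [[True] * n]
    population += [[rng.random() < 0.5 for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, samples),
                        reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(m, samples))


if __name__ == "__main__":
    # Synthetic beta values: loci 0-1 separate two groups, loci 2-3 are noise.
    rng = random.Random(1)
    data = [[0.1 + 0.05 * rng.random(), 0.15 + 0.05 * rng.random(),
             rng.random(), rng.random()] for _ in range(6)]
    data += [[0.85 + 0.05 * rng.random(), 0.9 + 0.05 * rng.random(),
              rng.random(), rng.random()] for _ in range(6)]
    print(select_features(data))
```

Each candidate solution is a mask over loci, so a higher-scoring mask with fewer selected loci corresponds to the dimensionality reduction the summary reports. The fitness function and the fixed choice of two clusters are illustrative simplifications, not the study's clustering criterion.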