A metaheuristic technique for cluster-based feature selection of DNA methylation data for cancer

Bibliographic Details
Main Authors: Eissa, Noureldin; Khairuddin, Uswah; Yusof, Rubiyah; Ahmed Madani, Ahmed Madani
Format: Article
Language: English
Published: Tech Science Press 2023
Subjects: T Technology (General)
Online Access: http://eprints.utm.my/106430/1/UswahKhairuddin2023_AMetaheuristicTechniqueforClusterBasedFeature.pdf
http://eprints.utm.my/106430/
http://dx.doi.org/10.32604/cmc.2023.033632
Citation: Eissa, Noureldin and Khairuddin, Uswah and Yusof, Rubiyah and Ahmed Madani, Ahmed Madani (2023) A metaheuristic technique for cluster-based feature selection of DNA methylation data for cancer. Computers, Materials and Continua, 74 (2). pp. 2817-2838. ISSN 1546-2218
DOI: 10.32604/cmc.2023.033632
Status: Peer reviewed
Repository: UTM Institutional Repository, Universiti Teknologi Malaysia (http://eprints.utm.my/)
Description:
Epigenetics is the study of phenotypic variations that do not involve changes to the DNA sequence. Cancer epigenetics has grown rapidly over the past few years, as epigenetic alterations are present in all human cancers. One such alteration is DNA methylation, an epigenetic process that regulates gene expression and often occurs at tumor suppressor gene loci in cancer. Studying methylation may therefore shed light on gene functions that cannot be interpreted from changes in the DNA sequence alone. Microarray technologies such as the Illumina Infinium BeadChip assays are currently used to measure DNA methylation at a very large number of loci, and at each methylation site a beta value (β) reflects the methylation intensity. Clustering these data across cancer types may therefore reveal large partitions that help classify cancers objectively and identify the relevant loci without user bias. This study proposed the Nested Big Data Clustering Genetic Algorithm (NBDC-GA), a novel evolutionary metaheuristic that performs cluster-based feature selection over DNA methylation sites. Its efficacy was tested on real-world data sets retrieved from The Cancer Genome Atlas (TCGA), a cancer genomics program created by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). The NBDC-GA was then compared with a recently developed metaheuristic, the Immuno-Genetic Algorithm (IGA), which had been tested on the same data sets. The NBDC-GA outperformed the IGA in convergence performance. It also produced a more robust clustering configuration while decreasing the dimensionality of features to a maximum of 67% for individual cancer types and 94.5% for the collective cancer data, respectively. Finally, the NBDC-GA identified two chromosomes with highly contrasting DNA methylation activity that had previously been linked to cancer.
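The abstract notes that each methylation site is summarised by a beta value (β) but does not give its formula. The definition commonly used for Illumina Infinium arrays, assumed here rather than taken from the paper, is shown below. M_i and U_i are the methylated and unmethylated signal intensities at CpG site i, and α is a small offset (100 by default in Illumina's definition) that stabilises β when both intensities are low:

```latex
\beta_i = \frac{\max(M_i, 0)}{\max(M_i, 0) + \max(U_i, 0) + \alpha},
\qquad 0 \le \beta_i < 1
```

For example, with M_i = 900, U_i = 100 and α = 100, β_i = 900 / 1100 ≈ 0.82, indicating a mostly methylated site; values near 0 indicate an unmethylated site.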
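The record does not describe how NBDC-GA encodes solutions, which clustering algorithm or fitness function it uses, or how its nesting works, so the sketch below is not the authors' method. It is a minimal, generic illustration of GA-driven, cluster-based feature selection over a beta-value matrix, under stated assumptions: each chromosome is a binary mask over CpG sites (hypothetical encoding); fitness is the silhouette coefficient of a k-means clustering on the selected sites minus a penalty on the fraction of sites kept (an assumed objective); and the GA uses tournament selection, uniform crossover, bit-flip mutation and elitism. The names `select_features`, `fitness` and `toy_beta_matrix`, the toy data and all parameter defaults are hypothetical.

```python
# Generic sketch of GA-based, cluster-driven feature selection.
# NOT the published NBDC-GA: encoding, fitness and defaults are hypothetical.
import random
from math import dist


def kmeans(points, k, iters=25):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; samples in singleton clusters contribute 0."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return -1.0  # a single cluster is scored as the worst case
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of p's cluster (p itself adds 0).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: lowest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def fitness(mask, data, k, penalty=0.1):
    """Silhouette on the selected columns minus a penalty on the fraction kept."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return -2.0  # empty selection: below any achievable score
    points = [[row[j] for j in cols] for row in data]
    return silhouette(points, kmeans(points, k)) - penalty * len(cols) / len(mask)


def toy_beta_matrix(n_per_group=15, n_informative=3, n_noise=7, seed=0):
    """Hypothetical beta matrix: rows are samples, columns are CpG sites.
    The first n_informative sites separate two groups; the rest are noise."""
    rng = random.Random(seed)
    return [[level] * n_informative + [rng.random() for _ in range(n_noise)]
            for level in (0.9,) * n_per_group + (0.1,) * n_per_group]


def select_features(data, k, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Toy GA over binary feature masks; returns indices of the best mask found."""
    rng = random.Random(seed)
    n_feat = len(data[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # k-means here is deterministic, so cache scores
            cache[key] = fitness(mask, data, k)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        nxt = [ranked[0]]  # elitism: keep the current best mask unchanged
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(ranked, 2), key=score)
            p2 = max(rng.sample(ranked, 2), key=score)
            # Uniform crossover, then independent bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            nxt.append([not g if rng.random() < p_mut else g for g in child])
        pop = nxt
    best = max(pop, key=score)
    return [j for j, keep in enumerate(best) if keep]
```

For example, `select_features(toy_beta_matrix(), k=2)` returns the indices of the best mask the GA found. With this stylised input, keeping a single informative site scores 0.99, the maximum this fitness allows (a silhouette of at most 1 minus a penalty of at least 0.01), whereas any mask that includes a noise column scores lower; the size penalty is what steers the search toward small, discriminative subsets. Real TCGA beta matrices are far larger, so a practical version would need vectorised distance computations.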