A metaheuristic technique for cluster-based feature selection of DNA methylation data for cancer
Epigenetics is the study of phenotypic variations that do not alter DNA sequences. Cancer epigenetics has grown rapidly over the past few years as epigenetic alterations exist in all human cancers. One of these alterations is DNA methylation, an epigenetic process that regulates gene expression and...
Main Authors: | Eissa, Noureldin; Khairuddin, Uswah; Yusof, Rubiyah; Ahmed Madani, Ahmed Madani |
---|---|
Format: | Article |
Language: | English |
Published: | Tech Science Press, 2023 |
Subjects: | T Technology (General) |
Online Access: | http://eprints.utm.my/106430/1/UswahKhairuddin2023_AMetaheuristicTechniqueforClusterBasedFeature.pdf http://eprints.utm.my/106430/ http://dx.doi.org/10.32604/cmc.2023.033632 |
id | my.utm.106430 |
---|---|
record_format | eprints |
spelling | my.utm.106430, 2024-06-30T06:11:05Z, http://eprints.utm.my/106430/. Eissa, Noureldin and Khairuddin, Uswah and Yusof, Rubiyah and Ahmed Madani, Ahmed Madani (2023) A metaheuristic technique for cluster-based feature selection of DNA methylation data for cancer. Computers, Materials and Continua, 74 (2). pp. 2817-2838. ISSN 1546-2218. Tech Science Press. Article, PeerReviewed, application/pdf, en. DOI: 10.32604/cmc.2023.033632 |
institution | Universiti Teknologi Malaysia |
building | UTM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Teknologi Malaysia |
content_source | UTM Institutional Repository |
url_provider | http://eprints.utm.my/ |
language | English |
topic | T Technology (General) |
description | Epigenetics is the study of phenotypic variations that do not alter DNA sequences. Cancer epigenetics has grown rapidly over the past few years as epigenetic alterations exist in all human cancers. One of these alterations is DNA methylation, an epigenetic process that regulates gene expression and often occurs at tumor suppressor gene loci in cancer. Therefore, studying this methylation process may shed light on different gene functions that cannot otherwise be interpreted using the changes that occur in DNA sequences. Currently, microarray technologies, such as Illumina Infinium BeadChip assays, are used to study DNA methylation at an extremely large number of varying loci. At each DNA methylation site, a beta value (β) is used to reflect the methylation intensity. Therefore, clustering these data from various types of cancers may lead to the discovery of large partitions that can help objectively classify different types of cancers as well as identify the relevant loci without user bias. This study proposed a Nested Big Data Clustering Genetic Algorithm (NBDC-GA), a novel evolutionary metaheuristic technique that can perform cluster-based feature selection based on the DNA methylation sites. The efficacy of the NBDC-GA was tested using real-world data sets retrieved from The Cancer Genome Atlas (TCGA), a cancer genomics program created by the National Cancer Institute (NCI) and the National Human Genome Research Institute. The performance of the NBDC-GA was then compared with that of a recently developed metaheuristic Immuno-Genetic Algorithm (IGA) that was tested using the same data sets. The NBDC-GA outperformed the IGA in terms of convergence performance. Furthermore, the NBDC-GA produced a more robust clustering configuration while simultaneously decreasing the dimensionality of features to a maximum of 67% and of 94.5% for individual cancer type and collective cancer, respectively. The proposed NBDC-GA was also able to identify two chromosomes with highly contrasting DNA methylation activities that were previously linked to cancer. |
format | Article |
author | Eissa, Noureldin; Khairuddin, Uswah; Yusof, Rubiyah; Ahmed Madani, Ahmed Madani |
author_sort | Eissa, Noureldin |
title | A metaheuristic technique for cluster-based feature selection of DNA methylation data for cancer |
publisher | Tech Science Press |
publishDate | 2023 |
url | http://eprints.utm.my/106430/1/UswahKhairuddin2023_AMetaheuristicTechniqueforClusterBasedFeature.pdf http://eprints.utm.my/106430/ http://dx.doi.org/10.32604/cmc.2023.033632 |
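The abstract describes NBDC-GA only at a high level. This record does not give its nested encoding, genetic operators, or fitness function. To illustrate the general idea of genetic-algorithm-driven, cluster-based feature selection on beta values (which lie in [0, 1]), here is a minimal sketch under stated assumptions. It is a generic GA wrapper, not the authors' NBDC-GA. Each chromosome is a binary mask over CpG columns. Fitness is the silhouette coefficient of a 2-means clustering on the selected columns, minus a small penalty on the fraction of columns kept. The data are synthetic. All function names, parameter values, and the penalty term are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical sketch: generic GA feature selection scored by clustering quality.
# This is NOT the NBDC-GA from the paper; encoding and fitness are assumptions.

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with a fixed seed; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; -1.0 if fewer than two clusters are non-empty."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        mean_d = {}
        for c in clusters:  # mean distance from p to members of cluster c (excluding p)
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_d[c] = sum(ds) / len(ds)
        a = mean_d[own]
        b = min(mean_d[c] for c in clusters if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def fitness(mask, data, k=2, penalty=0.1):
    """Silhouette on the selected columns minus a penalty on the fraction of columns kept."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("-inf")
    sub = [[row[j] for j in cols] for row in data]
    return silhouette(sub, kmeans(sub, k)) - penalty * len(cols) / len(mask)

def genetic_feature_selection(data, pop_size=20, generations=30, p_mut=0.05, seed=1):
    """Evolve binary column masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(data[0])
    # Include the all-features mask so the unreduced baseline is always evaluated.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, data)
        return cache[key]

    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        nxt = [ranked[0], ranked[1]]  # elitism: keep the two best masks unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fit)  # binary tournament selection
            p2 = max(rng.sample(pop, 2), key=fit)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)

def synthetic_betas(n_per_group=10, n_informative=3, n_noise=7, seed=42):
    """Hypothetical beta matrix: rows are samples, columns are CpG sites, values in [0, 1]."""
    rng = random.Random(seed)
    rows = []
    for group_mean in (0.8, 0.2):  # two sample groups differ only on the informative sites
        for _ in range(n_per_group):
            informative = [min(1.0, max(0.0, rng.gauss(group_mean, 0.05))) for _ in range(n_informative)]
            rows.append(informative + [rng.random() for _ in range(n_noise)])
    return rows

if __name__ == "__main__":
    data = synthetic_betas()
    best, score = genetic_feature_selection(data)
    print("selected columns:", [j for j, bit in enumerate(best) if bit], "fitness:", round(score, 3))
```

The initial population includes the all-features mask, and the two best masks are carried forward each generation. As a result, the returned fitness can never fall below the all-features baseline. The size penalty is what pushes the search toward keeping fewer columns.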