A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications

RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modificatio...

Bibliographic Details
Main Authors: Uddin I., Awan H.H., Khalid M., Khan S., Akbar S., Sarker M.R., Abdolrasol M.G.M., Alghamdi T.A.H.
Other Authors: 58993722900
Format: Article
Published: Nature Research 2025
_version_ 1833350545276928000
author Uddin I.
Awan H.H.
Khalid M.
Khan S.
Akbar S.
Sarker M.R.
Abdolrasol M.G.M.
Alghamdi T.A.H.
author2 58993722900
building UNITEN Library
collection Institutional Repository
content_provider Universiti Tenaga Nasional
content_source UNITEN Institutional Repository
continent Asia
country Malaysia
description RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modification process by the TET enzyme oxidation is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration in specific biochemical ways of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, the paper proposed XGB5hmC, a machine learning algorithm based on a robust gradient boosting algorithm (XGBoost), with different residue-based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue-based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations) based feature selection to demonstrate model interpretability by highlighting the most contributory features. Among the applied machine learning algorithms, the XGBoost ensemble model using the tenfold cross-validation test achieved better results than existing state-of-the-art models. Our model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934, and MCC of 0.8764. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis. © The Author(s) 2024.
format Article
id my.uniten.dspace-36210
institution Universiti Tenaga Nasional
publishDate 2025
publisher Nature Research
record_format dspace
spelling my.uniten.dspace-36210 2025-03-03T15:41:35Z A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications Uddin I. Awan H.H. Khalid M. Khan S. Akbar S. Sarker M.R. Abdolrasol M.G.M. Alghamdi T.A.H. 58993722900 57298070200 57192190458 57204809479 57194609918 37122644300 35796848700 57456914500 5-Methylcytosine Algorithms Cytosine Humans Machine Learning 5 methylcytosine 5-hydroxymethylcytosine cytosine algorithm human machine learning metabolism Final 2025-03-03T07:41:35Z 2025-03-03T07:41:35Z 2024 Article 10.1038/s41598-024-71568-z 2-s2.0-85203292932 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203292932&doi=10.1038%2fs41598-024-71568-z&partnerID=40&md5=e986cd50a63803de551f3bd63ca071e8 https://irepository.uniten.edu.my/handle/123456789/36210 14 1 20819 All Open Access; Gold Open Access Nature Research Scopus
title A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications
topic 5-Methylcytosine
Algorithms
Cytosine
Humans
Machine Learning
5 methylcytosine
5-hydroxymethylcytosine
cytosine
algorithm
human
machine learning
metabolism
url_provider http://dspace.uniten.edu.my/
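
The description above mentions fusing several frequency residue-based encodings into one hybrid vector. This record does not say which six encodings the paper uses, so the following is only a minimal sketch under assumptions: it uses mono-, di-, and tri-nucleotide composition as stand-in encodings, and the names `kmer_frequencies` and `hybrid_vector` are hypothetical, not taken from the paper.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA residues (assumption for this sketch)


def kmer_frequencies(seq, k):
    """Return normalized frequencies of all 4**k k-mers in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing symbols outside ACGU
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def hybrid_vector(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency encodings into one feature vector.

    The choice of k = 1, 2, 3 is illustrative only; the paper's six
    encodings are not specified in this record.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


vec = hybrid_vector("ACGUACGUCC")
print(len(vec))  # 4 + 16 + 64 features
```

A vector built this way for each sequence could then be passed as one row of the feature matrix to a gradient boosting classifier such as XGBoost.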