A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications
RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modificatio...
| Main Authors: | Uddin I., Awan H.H., Khalid M., Khan S., Akbar S., Sarker M.R., Abdolrasol M.G.M., Alghamdi T.A.H. |
|---|---|
| Other Authors: | |
| Format: | Article |
| Published: | Nature Research, 2025 |
| Subjects: | |
| _version_ | 1833350545276928000 |
|---|---|
| author | Uddin I. Awan H.H. Khalid M. Khan S. Akbar S. Sarker M.R. Abdolrasol M.G.M. Alghamdi T.A.H. |
| author2 | 58993722900 |
| author_facet | 58993722900 Uddin I. Awan H.H. Khalid M. Khan S. Akbar S. Sarker M.R. Abdolrasol M.G.M. Alghamdi T.A.H. |
| author_sort | Uddin I. |
| building | UNITEN Library |
| collection | Institutional Repository |
| content_provider | Universiti Tenaga Nasional |
| content_source | UNITEN Institutional Repository |
| continent | Asia |
| country | Malaysia |
| description | RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modification process by the TET enzyme oxidation is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration in specific biochemical ways of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, the paper proposed XGB5hmC, a machine learning algorithm based on a robust gradient boosting algorithm (XGBoost), with different residue based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations) based feature selection to demonstrate model interpretability by highlighting the high contributory features. Among the applied machine learning algorithms, the XGBoost ensemble model using the tenfold cross-validation test achieved improved results than existing state-of-the-art models. Our model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934%, and MCC of 0.8764%. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis. © The Author(s) 2024. |
| format | Article |
| id | my.uniten.dspace-36210 |
| institution | Universiti Tenaga Nasional |
| publishDate | 2025 |
| publisher | Nature Research |
| record_format | dspace |
| spelling | my.uniten.dspace-36210 2025-03-03T15:41:35Z A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications Uddin I. Awan H.H. Khalid M. Khan S. Akbar S. Sarker M.R. Abdolrasol M.G.M. Alghamdi T.A.H. 58993722900 57298070200 57192190458 57204809479 57194609918 37122644300 35796848700 57456914500 5-Methylcytosine Algorithms Cytosine Humans Machine Learning 5 methylcytosine 5-hydroxymethylcytosine cytosine algorithm human machine learning metabolism RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modification process by the TET enzyme oxidation is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration in specific biochemical ways of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, the paper proposed XGB5hmC, a machine learning algorithm based on a robust gradient boosting algorithm (XGBoost), with different residue based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations) based feature selection to demonstrate model interpretability by highlighting the high contributory features. Among the applied machine learning algorithms, the XGBoost ensemble model using the tenfold cross-validation test achieved improved results than existing state-of-the-art models. Our model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934%, and MCC of 0.8764%. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis. © The Author(s) 2024. Final 2025-03-03T07:41:35Z 2025-03-03T07:41:35Z 2024 Article 10.1038/s41598-024-71568-z 2-s2.0-85203292932 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203292932&doi=10.1038%2fs41598-024-71568-z&partnerID=40&md5=e986cd50a63803de551f3bd63ca071e8 https://irepository.uniten.edu.my/handle/123456789/36210 14 1 20819 All Open Access; Gold Open Access Nature Research Scopus |
| spellingShingle | 5-Methylcytosine Algorithms Cytosine Humans Machine Learning 5 methylcytosine 5-hydroxymethylcytosine cytosine algorithm human machine learning metabolism Uddin I. Awan H.H. Khalid M. Khan S. Akbar S. Sarker M.R. Abdolrasol M.G.M. Alghamdi T.A.H. A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications |
| title | A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications |
| title_full | A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications |
| title_fullStr | A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications |
| title_full_unstemmed | A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications |
| title_short | A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications |
| title_sort | hybrid residue based sequential encoding mechanism with xgboost improved ensemble model for identifying 5-hydroxymethylcytosine modifications |
| topic | 5-Methylcytosine Algorithms Cytosine Humans Machine Learning 5 methylcytosine 5-hydroxymethylcytosine cytosine algorithm human machine learning metabolism |
| url_provider | http://dspace.uniten.edu.my/ |
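
The description field in this record reports five evaluation metrics: accuracy, sensitivity, specificity, F1-score, and MCC. As a reading aid, the sketch below shows how these metrics are conventionally derived from the four counts of a binary confusion matrix. It is not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration. Note that F1-score lies in the range 0 to 1 and MCC in the range -1 to 1, so both are normally read as fractions rather than percentages.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics from confusion-matrix counts.

    tp/fn/tn/fp are true positives, false negatives, true negatives and
    false positives. This helper is a hypothetical illustration only.
    """
    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)      # recall on the positive (5hmC) class
    specificity = ratio(tn, tn + fp)      # recall on the negative class
    f1 = ratio(2 * tp, 2 * tp + fp + fn)  # harmonic mean of precision and recall
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts, unrelated to the results quoted in the record.
    for name, value in binary_metrics(tp=80, fn=20, tn=90, fp=10).items():
        print(f"{name}: {value:.4f}")
```

In a tenfold cross-validation setting such as the one the description mentions, the counts would typically be accumulated over the ten held-out folds before the metrics are computed, although the record does not state how the authors aggregated them.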
