AI-Based Sequence Similarity Analysis as Digital Genetic Evidence: A Pilot Study on Growth-Related Genes
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | en |
| Published: | LPPM Universitas Harapan Bangsa, Indonesia, 2026 |
| Subjects: | |
| Online Access: | http://ir.unimas.my/id/eprint/51569/1/Al-Hakim%2Bet%2Bal_JBDFI_1-1_13-22.pdf http://ir.unimas.my/id/eprint/51569/ https://ejournal.uhb.ac.id/index.php/jbdfi/article/view/2189 https://ejournal.uhb.ac.id/index.php/jbdfi/article/view/2189/1112 |
| Summary: | Introduction — Stunting remains a major public health challenge, particularly in low- and middle-income countries, where growth impairment is influenced by complex interactions between environmental and biological factors. While nutritional and socioeconomic determinants have been extensively studied, the potential role of genetic susceptibility related to growth regulation remains underexplored from a bio-digital and forensic informatics perspective. This study investigates whether sequence-level similarity patterns among growth-related genes can be represented as digital genetic evidence using artificial intelligence–based computational analysis.
Methods — This pilot exploratory study analyzed protein and coding DNA sequences of six candidate growth-related genes (IGF1, IGF1R, GH1, GHR, LEP, SLC39A8) obtained from curated RefSeq Homo sapiens databases. An alignment-free analytical framework was implemented using k-mer term frequency–inverse document frequency (TF-IDF) feature extraction combined with principal component analysis for dimensionality reduction. Pairwise similarity assessment and embedding-based visualization were employed to explore latent sequence relationships.
Results — The analysis revealed distinct similarity patterns among growth-related genes, with hormonally associated genes and receptor proteins forming coherent clusters, while nutrient transporter–related genes exhibited clear separation in the embedding space. These patterns were biologically plausible and consistent with known functional characteristics, despite the absence of explicit functional annotation during feature extraction.
Conclusion — The findings demonstrate that AI-based alignment-free sequence analysis can generate reproducible similarity representations that function as digital genetic evidence. As a pilot exploratory study, this work highlights the feasibility of sequence-level similarity profiling for investigating growth-related genetic susceptibility, while providing a methodological foundation for future large-scale and population-specific studies. |
|---|
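The alignment-free approach named in the Methods summary (k-mer TF-IDF features followed by pairwise similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state the k-mer length, the IDF variant, or the similarity measure, so `k=3`, a smoothed IDF, and cosine similarity are assumptions here. The three short sequences are invented toy strings, not RefSeq records, and the principal component analysis step is left out.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return all overlapping k-mers of a sequence (k=3 is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_vectors(seqs, k=3):
    """Map {name: sequence} to {name: {kmer: TF-IDF weight}}.

    Each sequence is treated as a 'document' and each k-mer as a 'term'.
    The smoothed IDF used here is one common variant, chosen for illustration.
    """
    counts = {name: Counter(kmers(s, k)) for name, s in seqs.items()}
    n_docs = len(counts)

    # Document frequency: in how many sequences does each k-mer occur?
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())

    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            km: (n / total) * (math.log((1 + n_docs) / (1 + doc_freq[km])) + 1.0)
            for km, n in c.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(km, 0.0) for km, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy sequences, NOT real gene or protein sequences.
toy = {
    "seqA": "MKTLLVAGGKTLLVA",
    "seqB": "MKTLLVAGGRTLLVA",  # differs from seqA at one position
    "seqC": "PQWEHNYCDFPQWEH",  # shares no 3-mers with seqA or seqB
}

vecs = tfidf_vectors(toy, k=3)
for a in toy:
    print(a, [round(cosine(vecs[a], vecs[b]), 3) for b in toy])
```

In this toy input, `seqA` and `seqB` share most of their 3-mers and therefore score closer to each other than either does to `seqC`. A fuller pipeline along the lines described in the summary would then project the TF-IDF matrix onto its leading principal components for visualization.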
