A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data

Bibliographic Details
Main Authors: Xu, Mingzhe; Abdullah, Nor Aniza; Md Sabri, Aznul Qalid
Format: Article
Published: Elsevier Ltd 2024
Online Access: http://eprints.um.edu.my/44823/
Description
Summary: This work addresses data sampling in cancer-gene association prediction. Researchers currently use machine learning to predict which genes are more likely to carry cancer-causing mutations. One proposed way to improve the performance of these models is to improve the quality of the training data. Existing methods focus mainly on screening the positive data, i.e. cancer driver genes. This paper proposes a method, based on gene networks and graph theory algorithms, for screening genes with low cancer relevance in order to improve negative sample selection. Genetic data with low cancer correlation is used as negative training samples. Experiments show that training a cancer gene classification model on negative samples screened by this method improves prediction performance. The main advantage of the method is that it can be easily combined with other methods that focus on improving the quality of positive training samples. Combining it with three state-of-the-art cancer gene prediction methods yielded significant improvements. © 2023 Elsevier Ltd
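
The summary does not say which graph theory algorithm the authors use, so the following is only an illustrative sketch of the general idea of network-based negative sample screening, not the paper's method. It assumes, purely for illustration, that a gene counts as "low cancer related" when it lies at least `min_hops` edges away from every known driver gene in an undirected gene network, or cannot be reached from one at all. The function names, the `min_hops` threshold, and the toy gene labels are all hypothetical.

```python
from collections import deque


def distance_to_nearest_driver(network, drivers):
    """Multi-source breadth-first search.

    Returns the hop count from each reachable gene to its closest
    driver gene. `network` is an undirected adjacency mapping
    (gene -> iterable of neighbouring genes).
    """
    dist = {gene: 0 for gene in drivers if gene in network}
    queue = deque(dist)
    while queue:
        gene = queue.popleft()
        for neighbor in network.get(gene, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def screen_negative_samples(network, drivers, min_hops=3):
    """Select candidate negative samples (assumed criterion, see above).

    Keeps non-driver genes that are at least `min_hops` edges from any
    driver gene; genes unreachable from every driver are also kept.
    """
    dist = distance_to_nearest_driver(network, drivers)
    return sorted(
        gene
        for gene in network
        if gene not in drivers and dist.get(gene, float("inf")) >= min_hops
    )


# Toy network with placeholder labels (not real gene symbols).
toy_network = {
    "D1": ["A"],
    "A": ["D1", "B"],
    "B": ["A", "C"],
    "C": ["B"],
    "X": ["Y"],
    "Y": ["X"],
}
negatives = screen_negative_samples(toy_network, {"D1"}, min_hops=3)
print(negatives)  # ['C', 'X', 'Y']
```

In this sketch the screened genes would serve as negative training samples alongside whatever positive (driver gene) set another method supplies, which mirrors the combinability the summary describes. Treating disconnected genes as negatives is itself a design choice that a real pipeline might reverse.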