Hyperparameter tuning and pipeline optimization via grid search method and tree-based AutoML in breast cancer prediction

Automated machine learning (AutoML) has been recognized as a powerful tool for building systems that automate the design and optimize the model selection of machine learning (ML) pipelines. In this study, we present a tree-based pipeline optimization tool (TPOT) as a method for determining ML models wit...

Bibliographic Details
Main Authors: Mat Radzi, Siti Fairuz; Abdul Karim, Muhammad Khalis; Saripan, M. Iqbal; Abd Rahman, Mohd Amiruddin; Che Isa, Iza Nurzawani; Ibahim, Mohammad Johari
Format: Article
Language: English
Published: Multidisciplinary Digital Publishing Institute 2021
Online Access: http://psasir.upm.edu.my/id/eprint/97584/1/ABSTRACT.pdf
http://psasir.upm.edu.my/id/eprint/97584/
https://www.mdpi.com/2075-4426/11/10/978
Citation: Mat Radzi, Siti Fairuz and Abdul Karim, Muhammad Khalis and Saripan, M. Iqbal and Abd Rahman, Mohd Amiruddin and Che Isa, Iza Nurzawani and Ibahim, Mohammad Johari (2021) Hyperparameter tuning and pipeline optimization via grid search method and tree-based AutoML in breast cancer prediction. Journal of Personalized Medicine, 11 (10). pp. 1-12. ISSN 2075-4426. DOI: 10.3390/jpm11100978. Peer reviewed.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
description Automated machine learning (AutoML) has been recognized as a powerful tool for building systems that automate the design and optimize the model selection of machine learning (ML) pipelines. In this study, we present a tree-based pipeline optimization tool (TPOT) as a method for determining ML models with significant performance and less complex breast cancer diagnostic pipelines. Pre-processors and ML models are defined as expression trees, and optimal pipelines are found by genetic programming (GP), a stochastic search method. Radiomics features guided the TPOT-based selection of ML pipelines from the breast cancer data set. Breast cancer data were used in a comparative analysis of the TPOT-generated ML pipelines and selected ML classifiers optimized by a grid search approach. The pipeline combining principal component analysis (PCA) with a random forest (RF) classifier proved to be the most reliable and the least complex. The TPOT model selection technique exceeded the performance of grid search (GS) optimization. Combined with only two pre-processors, the RF classifier achieved the best outcome among the models, with a precision of 0.83. In comparison, the grid-search-optimized support vector machine (SVM) classifier differed by 12%, while the other two classifiers, naïve Bayes (NB) and artificial neural network-multilayer perceptron (ANN-MLP), differed by almost 39%. Performance was evaluated using sensitivity, specificity, accuracy, precision, and receiver operating characteristic (ROC) curve analysis.
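The abstract names sensitivity, specificity, accuracy, and precision as evaluation measures without defining them. The sketch below shows how these four quantities are conventionally computed from a binary confusion matrix. It is an illustration only, not the authors' code: the helper names (`confusion_counts`, `classification_metrics`) and the toy labels are hypothetical, and the numbers have no connection to the study's data or results. ROC analysis is not covered here because it requires continuous classifier scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c):
    """Standard definitions of the four measures named in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = malignant, 0 = benign (made up for illustration).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in classification_metrics(
        confusion_counts(y_true, y_pred)
    ).items():
        print(f"{name}: {value:.3f}")
```

In a comparison such as the one the abstract describes, the same four measures would be computed for each candidate pipeline on held-out data, so that the pipelines are ranked on identical terms.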