Embedded feature selection methods with high dimensionality for elastic net and logistic regression models
Feature selection and classification in high-dimensional data is a challenging problem in scientific research such as biology, medicine, and finance. In such data, highly correlated features and missing data often exist. Therefore, selecting informative features and adequate handling of missing valu...
Main Author: | Alharthi, Aiedh Mrisi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | QA Mathematics |
Online Access: | http://eprints.utm.my/id/eprint/102313/1/AiedhMrisiAlharthiPFS2022.pdf.pdf http://eprints.utm.my/id/eprint/102313/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149202 |
id |
my.utm.102313 |
---|---|
record_format |
eprints |
spelling |
my.utm.102313 2023-08-17T01:08:11Z http://eprints.utm.my/id/eprint/102313/ Embedded feature selection methods with high dimensionality for elastic net and logistic regression models Alharthi, Aiedh Mrisi QA Mathematics 2022 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/102313/1/AiedhMrisiAlharthiPFS2022.pdf.pdf Alharthi, Aiedh Mrisi (2022) Embedded feature selection methods with high dimensionality for elastic net and logistic regression models. PhD thesis, Universiti Teknologi Malaysia. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149202 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QA Mathematics |
description |
Feature selection and classification in high-dimensional data is a challenging problem in scientific research such as biology, medicine, and finance. In such data, highly correlated features and missing data often exist. Therefore, selecting informative features and handling missing values adequately are essential to finding an optimal model in terms of interpretability and prediction accuracy. In recent years, embedded feature selection methods, including penalized regression, have attracted many statisticians because these methods often yield model estimates with higher prediction accuracy. Nevertheless, most penalized methods lack feature selection consistency, do not encourage grouping effects, and do not handle missing values when applied to high-dimensional data. Hence, this study aims to improve feature selection and the handling of missing values by proposing several improvements to penalized high-dimensional approaches. An alternative initial weight was introduced in the adaptive least absolute shrinkage and selection operator (LASSO) to improve feature selection performance. Then, an initial ratio and adjusted variance weights inside the ℓ1-norm penalty of the adaptive elastic net were proposed to encourage the grouping effect. Furthermore, imputation penalized logistic regression with the adaptive LASSO approach was proposed to enhance the handling of missing values in high-dimensional data. Simulation studies with varying numbers of predictor variables, sample sizes, correlation coefficients, and proportions of missing values were performed to evaluate the effectiveness of the proposed methods. The proposed adaptive LASSO methods were compared with LASSO and other versions of adaptive LASSO, while the proposed adaptive elastic net methods were compared with the existing elastic net and adaptive elastic net methods.
The proposed methods were also applied to a chemometrics dataset and eight gene expression microarray datasets in which the number of genes (features) exceeds the sample size. The results indicated that the proposed methods outperform their competitors in selecting the most relevant features and in achieving higher classification accuracy, sensitivity, and specificity. They also reduce dimensionality and select the most helpful features for cancer classification, resulting in optimal models that perform feature selection and patient classification concurrently. Moreover, the proposed adaptive elastic net method is shown to be superior to the other methods in encouraging the grouping effect. In conclusion, this study shows that the proposed methods are appropriate for gene expression data classification and other high-dimensional classification analyses. |
format |
Thesis |
author |
Alharthi, Aiedh Mrisi |
title |
Embedded feature selection methods with high dimensionality for elastic net and logistic regression models |
publishDate |
2022 |
url |
http://eprints.utm.my/id/eprint/102313/1/AiedhMrisiAlharthiPFS2022.pdf.pdf http://eprints.utm.my/id/eprint/102313/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149202 |
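The description in this record refers to adaptive LASSO and adaptive elastic net penalties, in which each coefficient receives its own weight inside the ℓ1 term. The following is a minimal sketch of that general idea, not the thesis's proposed method: it assumes the textbook weights 1/|b_init|^gamma taken from a ridge initial estimate, uses squared-error loss rather than logistic loss for brevity, and runs on simulated data. All function names, parameter values, and the data-generating setup are hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by L1-penalized coordinate descent."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def penalized_fit(X, y, lam1, lam2, weights, sweeps=500, tol=1e-10):
    """Minimize (1/2n)||y - Xb||^2 + lam1*sum_j w_j|b_j| + (lam2/2)*sum_j b_j^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta while beta is all zeros
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        largest_change = 0.0
        for j in range(p):
            # Covariance of column j with the partial residual
            # (the current contribution of column j is added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_ms[j] * beta[j]
            new_bj = soft_threshold(rho, lam1 * weights[j]) / (col_ms[j] + lam2)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return beta


def adaptive_fit(X, y, lam1, lam2=0.0, gamma=1.0, ridge_lam=0.1, eps=1e-8):
    """Two-step adaptive fit: ridge initial estimate, then weighted L1 penalty.

    lam2 = 0 gives an adaptive LASSO; lam2 > 0 gives a (naive) adaptive
    elastic net. The ridge-based weights are an assumption of this sketch.
    """
    p = len(X[0])
    initial = penalized_fit(X, y, 0.0, ridge_lam, [1.0] * p)
    weights = [1.0 / (abs(b) + eps) ** gamma for b in initial]
    return penalized_fit(X, y, lam1, lam2, weights)


# Simulated data (hypothetical): only the first two features are informative.
rng = random.Random(0)
n, p = 60, 8
true_beta = [3.0, -2.0] + [0.0] * (p - 2)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 0.3)
     for row in X]

beta_alasso = adaptive_fit(X, y, lam1=0.1)
beta_aenet = adaptive_fit(X, y, lam1=0.1, lam2=0.05)
selected_alasso = [j for j, b in enumerate(beta_alasso) if b != 0.0]
selected_aenet = [j for j, b in enumerate(beta_aenet) if b != 0.0]
print("adaptive LASSO selects:", selected_alasso)
print("adaptive elastic net selects:", selected_aenet)
```

Because features with small initial estimates receive large weights, their coefficients are thresholded to exactly zero, while strongly supported features are penalized only lightly. In practice the tuning parameters would typically be chosen by cross-validation, and a classification setting like the one described in the record would replace the squared-error loss with a logistic loss.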