Central double cross-validation for estimating parameters in regression models
Ridge regression, the lasso, the elastic net, forward stagewise regression and least angle regression each require a solution path and a tuning parameter, λ, to estimate the coefficient vector, so it is crucial to find the ideal λ. Cross-validation (CV) is the most widely used method for choosi...
Main Author: | Chye, Rou Shi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2016 |
Subjects: | QA Mathematics |
Online Access: | http://eprints.utm.my/id/eprint/80959/2/ChyeRouShiMFS2016.pdf http://eprints.utm.my/id/eprint/80959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:120286 |
id |
my.utm.80959 |
---|---|
record_format |
eprints |
spelling |
my.utm.80959 2019-07-24T00:13:24Z http://eprints.utm.my/id/eprint/80959/ Central double cross-validation for estimating parameters in regression models Chye, Rou Shi QA Mathematics Ridge regression, the lasso, the elastic net, forward stagewise regression and least angle regression each require a solution path and a tuning parameter, λ, to estimate the coefficient vector, so it is crucial to find the ideal λ. Cross-validation (CV) is the most widely used method for choosing the ideal tuning parameter from the solution path. CV essentially splits the original sample into two parts. One part is used to develop the regression equation, which is then applied to the other part to evaluate the risk of every model. The final model is the one with the smallest estimated risk. However, CV does not provide consistent results because it is prone to overfitting and underfitting during model selection. In the present study, a new method for estimating the parameter in best-subset regression, called central double cross-validation (CDCV), is proposed. In this method, CV is run twice with different numbers of folds. CDCV thereby maximizes the use of the available data, enhances model selection performance and builds a new, stable CV curve. The final model with an error of less than ?? standard error above the smallest CV error is chosen. CDCV was compared with existing CV methods in determining the correct model via a simulation study with different sample sizes and correlation settings. The simulation study indicates that the proposed CDCV method has the highest percentage of obtaining the right model and the lowest Bayesian information criterion (BIC) value across multiple simulated settings. The results show that CDCV can select the right model and prevent underfitting and overfitting. Therefore, CDCV is recommended as a good alternative to the existing methods in the simulation settings. 2016-07 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/80959/2/ChyeRouShiMFS2016.pdf Chye, Rou Shi (2016) Central double cross-validation for estimating parameters in regression models. Masters thesis, Universiti Teknologi Malaysia, Faculty of Science. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:120286 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QA Mathematics |
spellingShingle |
QA Mathematics Chye, Rou Shi Central double cross-validation for estimating parameters in regression models |
description |
Ridge regression, the lasso, the elastic net, forward stagewise regression and least angle regression each require a solution path and a tuning parameter, λ, to estimate the coefficient vector, so it is crucial to find the ideal λ. Cross-validation (CV) is the most widely used method for choosing the ideal tuning parameter from the solution path. CV essentially splits the original sample into two parts. One part is used to develop the regression equation, which is then applied to the other part to evaluate the risk of every model. The final model is the one with the smallest estimated risk. However, CV does not provide consistent results because it is prone to overfitting and underfitting during model selection. In the present study, a new method for estimating the parameter in best-subset regression, called central double cross-validation (CDCV), is proposed. In this method, CV is run twice with different numbers of folds. CDCV thereby maximizes the use of the available data, enhances model selection performance and builds a new, stable CV curve. The final model with an error of less than ?? standard error above the smallest CV error is chosen. CDCV was compared with existing CV methods in determining the correct model via a simulation study with different sample sizes and correlation settings. The simulation study indicates that the proposed CDCV method has the highest percentage of obtaining the right model and the lowest Bayesian information criterion (BIC) value across multiple simulated settings. The results show that CDCV can select the right model and prevent underfitting and overfitting. Therefore, CDCV is recommended as a good alternative to the existing methods in the simulation settings. |
format |
Thesis |
author |
Chye, Rou Shi |
author_facet |
Chye, Rou Shi |
author_sort |
Chye, Rou Shi |
title |
Central double cross-validation for estimating parameters in regression models |
title_short |
Central double cross-validation for estimating parameters in regression models |
title_full |
Central double cross-validation for estimating parameters in regression models |
title_fullStr |
Central double cross-validation for estimating parameters in regression models |
title_full_unstemmed |
Central double cross-validation for estimating parameters in regression models |
title_sort |
central double cross-validation for estimating parameters in regression models |
publishDate |
2016 |
url |
http://eprints.utm.my/id/eprint/80959/2/ChyeRouShiMFS2016.pdf http://eprints.utm.my/id/eprint/80959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:120286 |
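
The description above outlines ordinary cross-validation: fit on one part of the sample, score on the other, and pick the tuning parameter from the resulting error curve. A minimal sketch of that baseline idea for ridge regression follows. It is not the thesis's CDCV procedure, whose rule for combining two CV runs is not given in this record. The function names (`ridge_fit`, `cv_curve`, `select_lambda`), the synthetic data, and the multiplier `c` are all assumptions made for illustration; in particular, the standard-error multiplier is illegible in the record ("??"), so `c` is left as a free parameter.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; data assumed centered)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_curve(X, y, lambdas, k, seed=0):
    """Mean K-fold CV error and its standard error for each lambda."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    errs = np.empty((k, len(lambdas)))
    for i, test_idx in enumerate(folds):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False  # hold out the current fold
        for j, lam in enumerate(lambdas):
            beta = ridge_fit(X[train_mask], y[train_mask], lam)
            resid = y[test_idx] - X[test_idx] @ beta
            errs[i, j] = np.mean(resid ** 2)
    mean = errs.mean(axis=0)
    se = errs.std(axis=0, ddof=1) / np.sqrt(k)
    return mean, se

def select_lambda(lambdas, mean, se, c=1.0):
    """Largest lambda whose CV error is within c standard errors of the minimum.

    The multiplier c is hypothetical; the record does not state its value.
    """
    best = np.argmin(mean)
    threshold = mean[best] + c * se[best]
    eligible = [lam for lam, m in zip(lambdas, mean) if m <= threshold]
    return max(eligible)

# Synthetic example data (illustrative only, not from the thesis).
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(100)

lambdas = np.logspace(-3, 3, 20)
mean, se = cv_curve(X, y, lambdas, k=5)
lam_min = lambdas[np.argmin(mean)]
lam_chosen = select_lambda(lambdas, mean, se, c=1.0)
```

Choosing the largest eligible λ favors the most heavily regularized model whose error is statistically close to the minimum; setting `c=0` reduces the rule to picking the λ with the smallest CV error.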