Determination of the optimal number of PLS components based on the combination of cross-validation and RMD-MRCD-PCA weighting function

Bibliographic Details
Main Authors: Habshah Midi, Siti Zahariah Abdul Wahab, Azree Shahrel Ahmad Nazri
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2025
Online Access: http://journalarticle.ukm.my/26523/1/SSS%2016.pdf
http://journalarticle.ukm.my/26523/
https://www.ukm.my/jsm/english_journals/vol54num11_2025/contentsVol54num11_2025.html
Description
Summary: Partial least squares (PLS) regression is a useful tool for the analysis of high-dimensional data (HDD). Choosing the number of PLS components is a vital step in model building, because selecting too many or too few components degrades the accuracy of the model. Numerous classical methods, such as leave-one-out cross-validation (LOOCV) and K-fold cross-validation (K-FoldCV), have been developed to determine the optimal number of PLS components. Nonetheless, they are easily affected by high leverage points (HLPs). Robust cross-validation techniques, denoted RMD-MRCD-PCA-LOOCV and RMD-MRCD-PCA-K-FoldCV, are therefore proposed to remedy this problem. The results of a simulation study and a real data set indicate that the proposed methods successfully select the appropriate number of PLS components.
Keywords: High leverage points; leave-one-out cross-validation; minimum regularized covariance determinant; partial least squares; principal component analysis.