Missing values treatment in agronomy dataset using PCA-Based Multiple Imputation (Bootstrap versus Bayesian)
Missing values are prevalent in agronomy datasets and need consideration to ensure the applicability of statistical methods and avoid bias in treating them. Previous studies indicate that multiple imputation is more effective than single imputation, with Principal Component Analysis (PCA)-based meth...
| Main Authors: | Rahimah Sallehuddin, Norshahida Shaadan |
|---|---|
| Format: | Article |
| Language: | en |
| Published: | Penerbit Universiti Kebangsaan Malaysia 2025 |
| Online Access: | http://journalarticle.ukm.my/26407/1/Paper_1%20-.pdf http://journalarticle.ukm.my/26407/ https://www.ukm.my/jqma/ |
| _version_ | 1855615325531275264 |
|---|---|
| author | Rahimah Sallehuddin, Norshahida Shaadan |
| author_facet | Rahimah Sallehuddin, Norshahida Shaadan |
| author_sort | Rahimah Sallehuddin |
| building | Tun Sri Lanang Library |
| collection | Institutional Repository |
| content_provider | Universiti Kebangsaan Malaysia |
| content_source | UKM Journal Article Repository |
| continent | Asia |
| country | Malaysia |
| description | Missing values are prevalent in agronomy datasets and need consideration to ensure the applicability of statistical methods and avoid bias in treating them. Previous studies indicate that multiple imputation is more effective than single imputation, with Principal Component Analysis (PCA)-based methods effectively handling multicollinearity in multivariate data. However, such approaches are rarely applied to agronomy data, hence there is a need to assess their performance to add knowledge in the area. This study evaluates the performance of two PCA-based multiple imputation approaches on missing multivariate agronomy data: multiple imputation using regularised PCA through bootstrap procedure (BootMI-REM-PCA) and multiple imputation using regularised PCA through Bayesian procedure (BayesMI-REM-PCA). The data were obtained from the Department of Agriculture Sarawak. A simulation study was conducted using 500 simulated datasets at 5%, 10%, and 20% missingness. Results showed comparable performance between BootMI-REM-PCA and BayesMI-REM-PCA at 5% missingness, with equal coefficient of determination (R²) values of 0.998, while BootMI-REM-PCA exhibited slightly lower root mean squared error (RMSE) of 1.527 and mean absolute error (MAE) of 0.160. However, BayesMI-REM-PCA outperformed at higher missing rates, achieving the lowest RMSE (2.238 at 10% and 3.051 at 20%) and MAE (0.315 at 10% and 0.601 at 20%), along with the highest R² values of 0.996 and 0.993, respectively. While imputation accuracy declines as missing data increases, BayesMI-REM-PCA preserves the characteristics of real data. The findings are expected to help agricultural scientists and researchers prepare high-quality data for accurate analysis. |
| format | Article |
| id | my-ukm.journal.26407 |
| institution | Universiti Kebangsaan Malaysia |
| language | en |
| publishDate | 2025 |
| publisher | Penerbit Universiti Kebangsaan Malaysia |
| record_format | eprints |
| spelling | my-ukm.journal.26407 2026-01-19T02:54:41Z http://journalarticle.ukm.my/26407/ Missing values treatment in agronomy dataset using PCA-Based Multiple Imputation (Bootstrap versus Bayesian) Rahimah Sallehuddin, Norshahida Shaadan Penerbit Universiti Kebangsaan Malaysia 2025-09 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/26407/1/Paper_1%20-.pdf Rahimah Sallehuddin and Norshahida Shaadan (2025) Missing values treatment in agronomy dataset using PCA-Based Multiple Imputation (Bootstrap versus Bayesian). Journal of Quality Measurement and Analysis, 21 (3). pp. 1-19. ISSN 2600-8602 https://www.ukm.my/jqma/ |
| spellingShingle | Rahimah Sallehuddin, Norshahida Shaadan, Missing values treatment in agronomy dataset using PCA-Based Multiple Imputation (Bootstrap versus Bayesian) |
| title | Missing values treatment in agronomy dataset using PCA-Based Multiple Imputation (Bootstrap versus Bayesian) |
| url | http://journalarticle.ukm.my/26407/1/Paper_1%20-.pdf http://journalarticle.ukm.my/26407/ https://www.ukm.my/jqma/ |
| url_provider | http://journalarticle.ukm.my/ |
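
The description above reports RMSE, MAE and R² at 5%, 10% and 20% missingness. As a rough illustration of how such an evaluation might be set up, the sketch below masks part of a complete column at random, imputes the masked cells, and scores the imputations against the held-out true values. It rests on several assumptions that are not in the record: the data are synthetic, the helper names `imputation_metrics` and `evaluate_at_rate` are hypothetical, the metrics are computed on masked cells only, and a plain observed-mean fill stands in for the imputer. It does not implement BootMI-REM-PCA or BayesMI-REM-PCA, and the article may define its metrics differently.

```python
import math
import random


def imputation_metrics(true_values, imputed_values):
    """Return (RMSE, MAE, R²) for imputed values against held-out true values."""
    n = len(true_values)
    errors = [p - t for t, p in zip(true_values, imputed_values)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(true_values) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


def evaluate_at_rate(column, rate, rng):
    """Mask a share of a complete column at random, impute, and score."""
    n_missing = max(2, round(rate * len(column)))
    missing_idx = rng.sample(range(len(column)), n_missing)
    missing = set(missing_idx)
    observed = [v for i, v in enumerate(column) if i not in missing]
    # Stand-in imputer (assumption): fill every masked cell with the observed mean.
    fill = sum(observed) / len(observed)
    truth = [column[i] for i in missing_idx]
    imputed = [fill] * n_missing
    return imputation_metrics(truth, imputed)


rng = random.Random(0)
# Hypothetical complete variable; the real study used Department of Agriculture Sarawak data.
column = [rng.gauss(50.0, 10.0) for _ in range(200)]
for rate in (0.05, 0.10, 0.20):
    rmse, mae, r2 = evaluate_at_rate(column, rate, rng)
    print(f"{rate:.0%} missing: RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```

Because a mean fill ignores the relationships between variables, its R² on the masked cells stays close to zero; a multiple-imputation method would replace the single `fill` value with pooled draws per masked cell, and a simulation study would repeat the masking over many datasets and average the metrics.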
