Missing values treatment in agronomy dataset using PCA-Based Multiple Imputation (Bootstrap versus Bayesian)

Missing values are prevalent in agronomy datasets and must be handled carefully so that statistical methods remain applicable and the treatment itself does not introduce bias. Previous studies indicate that multiple imputation is more effective than single imputation, with Principal Component Analysis (PCA)-based meth...

Bibliographic Details
Main Authors: Rahimah Sallehuddin, Norshahida Shaadan
Format: Article
Language: en
Published: Penerbit Universiti Kebangsaan Malaysia 2025
Online Access: http://journalarticle.ukm.my/26407/1/Paper_1%20-.pdf
http://journalarticle.ukm.my/26407/
https://www.ukm.my/jqma/
building Tun Sri Lanang Library
collection Institutional Repository
content_provider Universiti Kebangsaan Malaysia
content_source UKM Journal Article Repository
continent Asia
country Malaysia
description Missing values are prevalent in agronomy datasets and must be handled carefully so that statistical methods remain applicable and the treatment itself does not introduce bias. Previous studies indicate that multiple imputation is more effective than single imputation, and that Principal Component Analysis (PCA)-based methods handle multicollinearity in multivariate data effectively. However, such approaches are rarely applied to agronomy data, so their performance in this setting needs to be assessed. This study evaluates two PCA-based multiple imputation approaches on multivariate agronomy data with missing values: multiple imputation using regularised PCA through a bootstrap procedure (BootMI-REM-PCA) and multiple imputation using regularised PCA through a Bayesian procedure (BayesMI-REM-PCA). The data were obtained from the Department of Agriculture Sarawak. A simulation study was conducted using 500 simulated datasets at 5%, 10%, and 20% missingness. At 5% missingness the two methods performed comparably, with equal coefficient of determination (R²) values of 0.998, while BootMI-REM-PCA had a slightly lower root mean squared error (RMSE) of 1.527 and mean absolute error (MAE) of 0.160. At higher missing rates BayesMI-REM-PCA performed better than BootMI-REM-PCA, achieving the lowest RMSE (2.238 at 10% and 3.051 at 20%) and MAE (0.315 at 10% and 0.601 at 20%), along with the highest R² values of 0.996 and 0.993, respectively. Although imputation accuracy declines as the proportion of missing data increases, BayesMI-REM-PCA preserves the characteristics of the real data. The findings are expected to help agricultural scientists and researchers prepare high-quality data for accurate analysis.
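The evaluation loop summarised in the description (mask a known share of values, impute, then score with RMSE, MAE and R²) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the record does not state how the authors generated missingness or defined the metrics, so the sketch masks cells completely at random and scores only the masked cells. Column-mean imputation is used purely as a stand-in imputer; it is not BootMI-REM-PCA or BayesMI-REM-PCA, and the synthetic matrix is not the Department of Agriculture Sarawak data. The function names are hypothetical.

```python
import numpy as np


def mask_mcar(X, rate, rng):
    """Copy X and set roughly `rate` of its cells to NaN, completely at random."""
    mask = rng.random(X.shape) < rate
    X_miss = X.copy()
    X_miss[mask] = np.nan
    return X_miss, mask


def impute_column_mean(X_miss):
    """Stand-in imputer (NOT the paper's method): fill NaNs with column means."""
    col_means = np.nanmean(X_miss, axis=0)
    return np.where(np.isnan(X_miss), col_means, X_miss)


def imputation_scores(X_true, X_imp, mask):
    """RMSE, MAE and R^2 between true and imputed values on the masked cells."""
    y = X_true[mask]
    err = y - X_imp[mask]
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Synthetic, correlated data used only to exercise the functions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X_full = latent @ np.ones((1, 5)) + 0.3 * rng.normal(size=(200, 5))

results = {}
for rate in (0.05, 0.10, 0.20):  # missingness levels named in the description
    X_miss, mask = mask_mcar(X_full, rate, rng)
    X_imp = impute_column_mean(X_miss)
    results[rate] = imputation_scores(X_full, X_imp, mask)

for rate, scores in results.items():
    print(rate, scores)
```

In a study like the one described, the stand-in imputer would be replaced by each multiple imputation method, the loop would be repeated over many simulated datasets, and the scores would be averaged per missingness level.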
id my-ukm.journal.26407
institution Universiti Kebangsaan Malaysia
record_format eprints
spelling my-ukm.journal.26407 2026-01-19T02:54:41Z http://journalarticle.ukm.my/26407/
Penerbit Universiti Kebangsaan Malaysia 2025-09 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/26407/1/Paper_1%20-.pdf Rahimah Sallehuddin and Norshahida Shaadan (2025). Missing values treatment in agronomy dataset using PCA-Based Multiple Imputation (Bootstrap versus Bayesian). Journal of Quality Measurement and Analysis, 21(3), pp. 1-19. ISSN 2600-8602 https://www.ukm.my/jqma/
url_provider http://journalarticle.ukm.my/