Comparative analysis of imputation methods for missing environmental data: A case study on ozone concentrations

Bibliographic Details
Main Authors: Nurliyana Juhan, Siti Noradiah Jamaludin, Yong Zulina Zubairi, Dg Siti Nurisya Sahirah Ag Isha, Nur Idayu Ah Khaliludin
Format: Article
Language: English
Published: Penerbit Akademi Baru 2025
Online Access: https://eprints.ums.edu.my/id/eprint/45417/1/FULLTEXT.pdf
https://eprints.ums.edu.my/id/eprint/45417/
https://doi.org/10.37934/ard.134.1.6376
Description
Summary: Handling missing values is crucial in environmental data analysis because missing data can lead to biased results. Using Weibull distributions, this study compared six single-imputation methods (mean, median, mean-before-after (MBA), cubic interpolation, linear interpolation, and last observation carried forward (LOCF)) for estimating missing ozone concentration data in Petaling Jaya, Selangor. The study simulated data for sample sizes of 50 and 150 with varying percentages of missing values (5%, 10%, 15%, 20%, and 25%). The performance of each imputation method was evaluated using prediction accuracy, root mean square error (RMSE), and mean absolute error (MAE). The findings suggested that the MBA approach performed best in all examined cases, followed by linear interpolation and LOCF. Conversely, cubic interpolation, mean substitution, and median substitution performed poorly, especially as the proportion of missing data increased. This study emphasizes the critical role of selecting appropriate imputation methods to enable accurate and trustworthy environmental data analysis. The findings can help researchers select efficient approaches for addressing missing values in air quality datasets, thus improving the reliability of environmental studies.
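
Illustrative sketch (not the authors' code): the comparison described in the summary can be outlined in Python with NumPy as below. Several details are assumptions made only for illustration. The Weibull shape and scale values are hypothetical, missing values are removed completely at random from interior points, and MBA is taken to mean the average of the nearest observed value before and after each gap. Cubic interpolation is left out to keep the sketch NumPy-only. Because the simulated values here are independent draws, the error figures it prints show only the mechanics of the evaluation and should not be expected to reproduce the rankings reported in the article.

```python
import numpy as np


def make_missing(x, frac, rng):
    """Return a copy of x with a fraction of interior points set to NaN."""
    out = x.copy()
    n_missing = int(round(frac * len(x)))
    # Keep the first and last points observed so every gap has neighbours.
    idx = rng.choice(np.arange(1, len(x) - 1), size=n_missing, replace=False)
    out[idx] = np.nan
    return out


def impute_mean(x):
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out


def impute_median(x):
    out = x.copy()
    out[np.isnan(out)] = np.nanmedian(x)
    return out


def _neighbour_indices(x):
    """Index of the nearest observed value at or before / at or after each point."""
    observed = ~np.isnan(x)
    pos = np.arange(len(x))
    prev_idx = np.maximum.accumulate(np.where(observed, pos, 0))
    next_idx = np.minimum.accumulate(
        np.where(observed, pos, len(x) - 1)[::-1])[::-1]
    return prev_idx, next_idx


def impute_locf(x):
    """Last observation carried forward (assumes x[0] is observed)."""
    prev_idx, _ = _neighbour_indices(x)
    return x[prev_idx]


def impute_mba(x):
    """Mean of the nearest observed values before and after each gap
    (assumed definition; assumes x[0] and x[-1] are observed)."""
    prev_idx, next_idx = _neighbour_indices(x)
    out = x.copy()
    gap = np.isnan(x)
    out[gap] = 0.5 * (x[prev_idx[gap]] + x[next_idx[gap]])
    return out


def impute_linear(x):
    out = x.copy()
    gap = np.isnan(x)
    pos = np.arange(len(x))
    out[gap] = np.interp(pos[gap], pos[~gap], x[~gap])
    return out


def rmse(true, pred, mask):
    return float(np.sqrt(np.mean((true[mask] - pred[mask]) ** 2)))


def mae(true, pred, mask):
    return float(np.mean(np.abs(true[mask] - pred[mask])))


METHODS = {"mean": impute_mean, "median": impute_median, "MBA": impute_mba,
           "linear": impute_linear, "LOCF": impute_locf}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (50, 150):
        # Hypothetical Weibull parameters: shape 2.0, scale 40.0.
        complete = 40.0 * rng.weibull(2.0, size=n)
        for frac in (0.05, 0.10, 0.15, 0.20, 0.25):
            gappy = make_missing(complete, frac, rng)
            mask = np.isnan(gappy)
            for name, fn in METHODS.items():
                filled = fn(gappy)
                print(f"n={n} missing={frac:.0%} {name:>6}: "
                      f"RMSE={rmse(complete, filled, mask):.3f} "
                      f"MAE={mae(complete, filled, mask):.3f}")
```

Errors are computed only at the positions that were artificially removed, so each method is scored against the known true values. A closer replication would need the fitted Weibull parameters and the exact MBA and accuracy definitions from the full text.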