A comparative analysis of missing data imputation techniques on sedimentation data

Sediment data pertains to various hydrological variables with complex sediment hydrodynamics such as sedimentation rates which are often incompletely presented. Thus, the availability of sedimentation data is of utmost necessity for data accessibility. A comparative analysis on the missing fine sedi...

Bibliographic Details
Main Authors: Loh, Wing Son, Lloyd, Ling, Chin, Ren Jie, Lai, Sai Hin, Loo, Kar Kuan, Seah, Choon Sen
Format: Article
Language: English
Published: Elsevier Ltd. 2024
Subjects: TA Engineering (General). Civil engineering (General)
Online Access: http://ir.unimas.my/id/eprint/44864/2/A%20comparative%20analysi.pdf
http://ir.unimas.my/id/eprint/44864/
https://www.sciencedirect.com/science/article/pii/S2090447924000923#:~:text=A%20comparative%20analysis%20on%20the,imputation%20(SI)%20and%20multiple%20imputation
https://doi.org/10.1016/j.asej.2024.102717
id my.unimas.ir.44864
record_format eprints
citation Loh, Wing Son and Lloyd, Ling and Chin, Ren Jie and Lai, Sai Hin and Loo, Kar Kuan and Seah, Choon Sen (2024) A comparative analysis of missing data imputation techniques on sedimentation data. Ain Shams Engineering Journal, 15 (6). pp. 1-20. ISSN 2090-4495. Peer reviewed.
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic TA Engineering (General). Civil engineering (General)
description Sediment data pertains to various hydrological variables with complex sediment hydrodynamics such as sedimentation rates which are often incompletely presented. Thus, the availability of sedimentation data is of utmost necessity for data accessibility. A comparative analysis on the missing fine sediment data imputation performance was made based on four different techniques, namely the k-Nearest Neighbourhood (k-NN), Support Vector Regression (SVR), Multiple Regression (MR), and Artificial Neural Network (ANN), under the single imputation (SI) and multiple imputation (MI) regimes. Across different missing data proportions (10%-50%), the ANN demonstrated optimal results with consistent performance metrics recorded over both SI and MI regimes. For the highest missing data proportion (50%), the ANN presented the best imputation performance with a reported root mean squared error (RMSE) 0.000882, mean absolute error (MAE) 0.000595, coefficient of determination (R²) 71%, and Kling-Gupta Efficiency (KGE) 72%. The imputation performance ranking is as follows: ANN, SVR, MR, and k-NN.
format Article
author Loh, Wing Son
Lloyd, Ling
Chin, Ren Jie
Lai, Sai Hin
Loo, Kar Kuan
Seah, Choon Sen
title A comparative analysis of missing data imputation techniques on sedimentation data
publisher Elsevier Ltd.
publishDate 2024
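
The description in this record reports imputation quality as RMSE, MAE, R² and KGE. As a rough illustration of how such scores can be computed for imputed values against held-out true values, the sketch below uses textbook definitions: R² as 1 - SS_res/SS_tot and KGE in its common three-component form (correlation, variability ratio, bias ratio). These definitions, the function name `imputation_metrics`, and the sample numbers are assumptions for illustration only; the article may define or compute its metrics differently.

```python
import math


def imputation_metrics(observed, imputed):
    """Score imputed values against the true values that were masked out.

    Hypothetical helper, not taken from the article.
    Returns a dict with RMSE, MAE, R^2 and KGE.
    """
    n = len(observed)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    errors = [p - o for o, p in zip(observed, imputed)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    mean_o = sum(observed) / n
    mean_p = sum(imputed) / n
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in imputed)
    if ss_tot == 0 or ss_p == 0 or mean_o == 0:
        raise ValueError("metrics undefined for constant or zero-mean series")

    # Assumed definition: coefficient of determination as 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot

    # Components of the assumed KGE form (population statistics).
    sd_o = math.sqrt(ss_tot / n)
    sd_p = math.sqrt(ss_p / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, imputed)) / n
    r = cov / (sd_o * sd_p)      # linear correlation
    alpha = sd_p / sd_o          # variability ratio
    beta = mean_p / mean_o       # bias ratio
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {"rmse": rmse, "mae": mae, "r2": r2, "kge": kge}


if __name__ == "__main__":
    # Made-up values, not data from the study.
    truth = [0.0021, 0.0034, 0.0018, 0.0040, 0.0027]
    filled = [0.0023, 0.0030, 0.0020, 0.0037, 0.0029]
    for name, value in imputation_metrics(truth, filled).items():
        print(f"{name}: {value:.6f}")
```

In a comparison like the one described, a helper of this kind would be applied to each technique at each missing-data proportion, with lower RMSE and MAE and higher R² and KGE indicating better imputation.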