Imputation methods on daily PM10 data (2010-15)
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Innovative Scientific Information & Services Network, 2019 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/76209/1/Prof%20K-2.pdf http://irep.iium.edu.my/76209/ https://www.isisn.org/BR16(SI-1)2019/306-310-16(SI)2019BR19-SI-05.pdf |
Summary: | Air pollution monitoring, especially of PM10, is very important, yet the data from continuous ambient air quality stations (CAAQS) usually contain missing values due to machine failure, routine maintenance and human error. This study therefore compared three PM10 imputation methods: mean substitution, nearest neighbour and an expectation-maximization-based algorithm (EMB). The coefficient of determination (R²) and the root mean square error (RMSE) were used to describe the goodness of fit of each method. For 5%, 10%, 15%, 25% and 40% missing data, nearest neighbour imputation gave R² values of 0.9318, 0.8126, 0.6546, 0.5458 and 0.3946 and RMSE values of 7.47, 12.27, 16.68, 19.13 and 21.76, respectively. For the same proportions, mean imputation gave R² values of 0.9274, 0.8117, 0.6484, 0.5400 and 0.3910 and RMSE values of 7.47, 12.36, 16.90, 19.13 and 22.07, respectively. EMB imputation gave R² values of 0.9084, 0.8468, 0.7530, 0.5791 and 0.5004 and RMSE values of 8.58, 11.18, 14.20, 18.53 and 20.48, respectively. For every imputation method, R² decreased and RMSE increased as the percentage of simulated missing data increased. |
---|
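The evaluation described in the summary (simulate missing values at a given proportion, impute them, then score the result with the coefficient of determination and RMSE) can be illustrated with a minimal Python sketch. This is not the authors' code. It rests on several assumptions that the record does not confirm: values are removed completely at random, "nearest neighbour" means the nearest observed day in time, both scores are computed over the whole series rather than only the removed days, and the input is a synthetic stand-in rather than the 2010-15 PM10 data. All function names are hypothetical, and EMB is omitted because it needs a dedicated statistical package.

```python
import math
import random


def mask_at_random(series, proportion, seed=0):
    """Return a copy of `series` with round(proportion * n) entries set to None.

    Assumption: values go missing completely at random.
    """
    rng = random.Random(seed)
    n_missing = round(proportion * len(series))
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]


def impute_mean(series):
    """Mean substitution: fill every gap with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def impute_nearest(series):
    """Fill each gap with the nearest observed value in time.

    Assumption: "nearest" is measured in days; ties go to the earlier day.
    """
    observed_idx = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is None:
            j = min(observed_idx, key=lambda k: (abs(k - i), k))
            filled[i] = series[j]
    return filled


def rmse(actual, estimate):
    """Root mean square error between the true and the imputed series."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimate)) / len(actual))


def r_squared(actual, estimate):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimate))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def evaluate(series, proportions=(0.05, 0.10, 0.15, 0.25, 0.40), seed=0):
    """Score each method at each missing-data proportion as (R2, RMSE)."""
    results = {}
    for p in proportions:
        masked = mask_at_random(series, p, seed)
        for name, method in (("mean", impute_mean), ("nearest", impute_nearest)):
            filled = method(masked)
            results[(name, p)] = (r_squared(series, filled), rmse(series, filled))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic seasonal series standing in for daily PM10; NOT the study's data.
    synthetic = [50 + 20 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 5)
                 for d in range(730)]
    for (name, p), (r2, err) in sorted(evaluate(synthetic).items()):
        print(f"{name:8s} {p:4.0%}  R2={r2:.4f}  RMSE={err:.2f}")
```

Because the scores here cover the full series, days that were never removed contribute zero error, so the fit degrades as more values are removed. The numbers produced on the synthetic series are illustrative only and should not be compared with the values reported in the summary.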