A comparison of various imputation methods for missing values in air quality data

Bibliographic Details
Main Authors: Nuryazmin Ahmat Zainuri, Abdul Aziz Jemain, Nora Muda
Format: Article
Language: English
Published: Universiti Kebangsaan Malaysia 2015
Online Access: http://journalarticle.ukm.my/8488/1/17_NuryAzmin.pdf
http://journalarticle.ukm.my/8488/
http://www.ukm.my/jsm/
Description
Summary: This paper compares imputation methods for air quality data in Malaysia. The main objectives were to select the best imputation method and to compare whether the methods performed differently between stations in Peninsular Malaysia. Missing data were simulated at random for six cases: 5, 10, 15, 20, 25 and 30% missing. The six methods used were mean substitution, median substitution, the expectation-maximization (EM) method, singular value decomposition (SVD), the K-nearest neighbour (KNN) method and the sequential K-nearest neighbour (SKNN) method. Imputation performance was compared using three performance indicators: the correlation coefficient (R), the index of agreement (d) and the mean absolute error (MAE). Based on the results obtained, EM, KNN and SKNN were the three best methods. The same result was obtained for all eight monitoring stations used in this study.
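The evaluation procedure described in the summary can be illustrated with a minimal Python sketch. It is not the authors' code. It covers only the two simplest methods (mean and median substitution) and makes several assumptions that the summary does not confirm: the index of agreement is taken in Willmott's form, the indicators are scored over the whole completed series rather than only the imputed positions, and the input series is synthetic. All function names are hypothetical.

```python
import math
import random
import statistics


def mask_at_random(values, fraction, seed=0):
    """Copy `values`, setting a random `fraction` of entries to None."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_constant(masked, stat=statistics.mean):
    """Fill every None with one statistic (mean or median) of the observed values."""
    fill = stat([v for v in masked if v is not None])
    return [fill if v is None else v for v in masked]


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def correlation(obs, pred):
    """Pearson correlation coefficient R."""
    mo, mp = statistics.mean(obs), statistics.mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def index_of_agreement(obs, pred):
    """Index of agreement d (assumed here to be Willmott's form)."""
    mo = statistics.mean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


def score_methods(series, fraction, seed=0):
    """Mask the series, impute with each method, and score against the original."""
    masked = mask_at_random(series, fraction, seed)
    methods = {"mean": statistics.mean, "median": statistics.median}
    scores = {}
    for name, stat in methods.items():
        filled = impute_constant(masked, stat)
        scores[name] = {
            "R": correlation(series, filled),
            "d": index_of_agreement(series, filled),
            "MAE": mae(series, filled),
        }
    return scores


# Synthetic stand-in series (made up for illustration, not real air quality data).
series = [50 + 20 * math.sin(i / 5) for i in range(200)]
for fraction in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
    print(fraction, score_methods(series, fraction))
```

Higher R and d and lower MAE indicate a better imputation. The EM, SVD, KNN and SKNN methods named in the summary would slot into the same loop as additional entries in `methods`, each needing its own implementation.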