The enhancement of normal ratio method through multiple imputation approach in estimating missing data with outliers for Peninsular Malaysian rainfall dataset / Siti Nur Zahrah Amin Burhanuddin
Main Author: | Siti Nur Zahrah Amin Burhanuddin |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2020 |
Online Access: | https://ir.uitm.edu.my/id/eprint/60924/1/60924.pdf, https://ir.uitm.edu.my/id/eprint/60924/ |
Summary: A complete rainfall dataset is essential for representing climatological characteristics accurately, especially in hydrological and meteorological studies, and it also supports effective and efficient environmental management. However, rainfall data are highly prone to missing values because of the dynamic nature of this climatic variable. Furthermore, the data are affected by seasonal activity that can cause uncertain and irregular variations in rainfall amounts, which give rise to outliers in the dataset. Both problems degrade the quality of the rainfall dataset and can give users inaccurate information. To address this, this study develops a practical and reliable approach for treating missing values so that a good-quality dataset can be provided to the public domain. A spatial estimation method, the normal ratio method, was used to estimate the missing rainfall data. Various modifications have been proposed to improve the performance of this method; however, little work has been done on making it robust so that it performs well on datasets containing outliers. This study therefore proposes enhanced normal ratio methods for imputing missing values in daily rainfall datasets with outliers. Robust statistics (the trimmed mean, median, and geometric median) were adopted in the proposed methods to reduce the influence of outliers. The normal ratio method is commonly implemented through a single imputation approach, which has the limitation of not accounting for the uncertainty in the missing values. This study therefore proposes a multiple imputation approach based on the block bootstrap, both to overcome this limitation and to improve on the performance of the existing multiple imputation approach implemented in the Amelia package.
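In its textbook form, the normal ratio method estimates a missing value P_x at a target station from the same-day values P_i at n neighbouring stations, weighting each by the ratio of the target's "normal" rainfall N_x to that neighbour's normal N_i: P_x = (1/n) Σ (N_x / N_i) P_i. The Python sketch below shows that classic form with a pluggable statistic for the normals, so the arithmetic mean can be swapped for a median or trimmed mean. This is an illustrative assumption, not the thesis's exact formulation: this summary does not state where the robust statistics enter the proposed methods, the geometric median is not shown, and the names `normal_ratio` and `trimmed_mean` and the array layout are hypothetical.

```python
import numpy as np


def trimmed_mean(x, prop=0.1):
    """Mean after discarding a fraction `prop` of the sorted values from each tail."""
    if not 0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    x = np.sort(np.asarray(x, dtype=float))
    k = int(prop * x.size)  # number of values cut from each end
    return float(x[k:x.size - k].mean())


def normal_ratio(target_hist, neighbour_hist, neighbour_today, normal=np.mean):
    """Classic normal ratio estimate of one missing value at a target station.

    target_hist     : 1-D history of the target station (observed days only)
    neighbour_hist  : 2-D array, shape (n_days, n_neighbours)
    neighbour_today : 1-D array, neighbour values on the day being filled
    normal          : statistic used for the "normal" rainfall N
                      (hypothetical choice: mean by default; median or
                      trimmed mean as robust alternatives)
    """
    neighbour_hist = np.asarray(neighbour_hist, dtype=float)
    neighbour_today = np.asarray(neighbour_today, dtype=float)
    n_x = normal(np.asarray(target_hist, dtype=float))
    n_i = np.array([normal(col) for col in neighbour_hist.T])  # one normal per neighbour
    if np.any(n_i == 0):
        # e.g. a median of 0 when most days are dry; the ratio is undefined
        raise ValueError("a neighbour normal is zero; ratio undefined")
    # P_x = (1/n) * sum_i (N_x / N_i) * P_i
    return float(np.mean((n_x / n_i) * neighbour_today))


# Usage: one neighbour's history contains a single extreme value (100).
target = [10, 12, 8, 10, 10]
neighbours = [[5, 20], [5, 20], [5, 20], [5, 20], [100, 20]]
today = [6, 40]
print(normal_ratio(target, neighbours, today))  # mean-based normals
print(normal_ratio(target, neighbours, today, normal=np.median))
print(normal_ratio(target, neighbours, today, normal=lambda v: trimmed_mean(v, 0.2)))
```

Here the outlier pulls the first neighbour's mean normal from 5 to 24, so the mean-based estimate drops to 11.25, while the median and the 20% trimmed mean both keep the normal at 5 and give 16.0.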
The block bootstrap was first introduced in the proposed multiple imputation approach (named NRMI-Bboot) to improve performance on rainfall time series. The performance of each estimation method was evaluated using five performance criteria at six levels of missing data (5%, 10%, 15%, 20%, 25%, and 30%) and three levels of outlying data (5%, 10%, and 15%) introduced into the dataset. Complete 40-year daily rainfall records from 22 meteorological stations were used in the analysis, and four target stations were selected to represent the main regions of Peninsular Malaysia (northwest, east, west, and southwest). The capability of the estimation methods was further verified through distribution fitting. Combining robust statistics in the proposed estimation methods with the NRMI-Bboot approach improved the estimation results, particularly for datasets containing extreme outliers. The block bootstrap preserved the original structure of the rainfall time series within each monsoon block and consequently produced more accurate estimates. These results indicate the advantages of the proposed estimation methods and multiple imputation approach in providing accurate imputed values for missing data in the Peninsular Malaysian daily rainfall dataset.
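Multiple imputation replaces each missing value with m plausible draws instead of a single one. The Python sketch below shows one way a block bootstrap can generate those draws, assuming the historical record is already split into predefined, non-overlapping blocks (for example, one label per monsoon season). Whole blocks are resampled with replacement so the day-to-day order inside each block is kept, the normal ratio estimate is recomputed from each resample, and the m draws are pooled by their mean (the point-estimate part of Rubin's rules), with their spread giving the between-imputation variance. This is not the NRMI-Bboot procedure itself, whose details are not given in this summary; `block_ids`, `multiple_impute`, and the other names are hypothetical.

```python
import numpy as np


def block_bootstrap_indices(block_ids, rng):
    """Resample whole blocks with replacement; days inside a block stay in order."""
    block_ids = np.asarray(block_ids)
    labels = np.unique(block_ids)
    chosen = rng.choice(labels, size=labels.size, replace=True)
    return np.concatenate([np.flatnonzero(block_ids == b) for b in chosen])


def nr_estimate(target_hist, neighbour_hist, neighbour_today, normal=np.mean):
    """Classic normal ratio estimate: mean of (N_x / N_i) * P_i over neighbours."""
    n_x = normal(target_hist)
    n_i = np.array([normal(col) for col in neighbour_hist.T])
    return float(np.mean((n_x / n_i) * neighbour_today))


def multiple_impute(target_hist, neighbour_hist, neighbour_today, block_ids,
                    m=5, normal=np.mean, seed=0):
    """Return (pooled estimate, between-imputation variance, all m draws)."""
    if m < 2:
        raise ValueError("need at least two imputations")
    rng = np.random.default_rng(seed)
    target_hist = np.asarray(target_hist, dtype=float)
    neighbour_hist = np.asarray(neighbour_hist, dtype=float)
    neighbour_today = np.asarray(neighbour_today, dtype=float)
    draws = []
    for _ in range(m):
        # Resample the history block-wise, then re-estimate the normals from it.
        idx = block_bootstrap_indices(block_ids, rng)
        draws.append(nr_estimate(target_hist[idx], neighbour_hist[idx],
                                 neighbour_today, normal))
    draws = np.array(draws)
    # Pooled point estimate = mean of the m draws; sample variance across
    # draws = between-imputation variance (one component of Rubin's rules).
    return float(draws.mean()), float(draws.var(ddof=1)), draws
```

Resampling whole blocks rather than single days is what keeps any within-season dependence intact in each bootstrap sample; a plain day-by-day bootstrap would scramble that structure.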