Robust spatial diagnostic method and parameter estimation for spatial big data regression model

Bibliographic Details
Main Author: Ali, Mohammed Baba
Format: Thesis
Language: English
Published: 2022
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/104720/1/MOHAMMED%20BABA%20ALI%20-%20IR.pdf
http://psasir.upm.edu.my/id/eprint/104720/
Description
Summary: The existing spatial data compression method, Adaptive Spatial Compression Clustering (ASDC), is a very effective method for compressing big data. However, global outliers in the spatial data distort the spatial dispersion function, which in turn affects the outcome of the spectral clustering and, ultimately, spatial contiguity. Hence, a new robust spatial compression technique, which we call Outlier Resistant Adaptive Spatial Clustering (ORASDC), is proposed. Simulation results on synthetic spatial fields and a real data application show that the proposed method is effective in treating the effect of outliers, with over 99% region of similarity retained and over 90% of data similarity maintained. Further research may be carried out to improve the processing speed of ORASDC and to determine the optimum number of clusters for a given data size.

The score statistic (Sci) was formulated to identify spatial outliers in big data. However, the method not only suffers from masking and swamping effects but also has a long computational running time. To rectify this problem, new diagnostic measures that use location adjacency to construct spatial weights, namely the metric distance reciprocal (MDR) and the exponential weight (EW), are developed. Differences between spatial residuals are calibrated to incorporate the adjacency effect into the spatial outlier residual. Simulation results for large sample sizes show remarkable performance of the proposed methods: both diagnostic measures successfully detect spatial outliers with minimal swamping. Applications of our methods to real data also show good performance.

This thesis is also concerned with establishing diagnostic measures for identifying spatial influential observations (IOs), which are outliers in the x and y directions of spatial regression models.
Some of the classical techniques for identifying IOs have been adapted to spatial models. Nonetheless, these adapted methods fail to identify the IOs correctly and show high swamping and masking effects. Thus, we propose a new measure of spatial studentized prediction residuals that incorporates spatial information on the dependent variable and the residuals. To the best of our knowledge, no research has been done on classifying spatial observations into regular observations, vertical outliers, and good and bad leverage points. Hence, the ISRs−Posi and ESRs−Posi plots are established to close this gap in the literature. The results indicate that the ESRs−Posi plot, followed by the ISRs−Posi plot, was very successful in classifying observations into the correct groups. The numerical examples and simulation study show that the proposed methods achieve almost 100% accurate detection and 0% swamping, whereas their competitors have lower detection rates and higher swamping rates.

Outliers in spatial applications often carry vital information about the model, which calls for a method that accommodates spatial outliers rather than discarding them. The Variance Shift Outlier Model (VSOM) in classical regression is promising in this respect because it keeps such observations in the model while downweighting their effect. To date, no research has been done to obtain a spatial representation of the VSOM. To fill this gap, we formulate the VSOM in the spatial regression model, which we call the Spatial Variance Shift Outlier Model (SVSOM), using Residual Maximum Likelihood (REML). Weights based on the detected outliers are used to accommodate the spatial outliers via a revised model with the help of the SVSOM. The results of a simulation study and a real data set indicate that our proposed method significantly improves parameter estimation and outlier accommodation.
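
The summary mentions spatial weights built from location adjacency (a metric distance reciprocal and an exponential weight) and differences between a value and its neighbours. This record does not give the thesis's own definitions, so the following is only a generic sketch of distance-based weights, not the author's MDR, EW, or diagnostic measures. The function names, the `bandwidth` parameter, the use of Euclidean distance, and the row standardisation are all assumptions made for illustration.

```python
import numpy as np


def distance_weights(coords, kind="reciprocal", bandwidth=1.0):
    """Row-standardised spatial weights from pairwise Euclidean distances.

    Hypothetical helper: "reciprocal" uses 1/d and "exponential" uses
    exp(-d / bandwidth). Both forms are assumptions, not the thesis's
    definitions. Locations are assumed to be distinct (no zero distances
    between different points).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    weights = np.zeros_like(dist)
    off_diag = ~np.eye(len(coords), dtype=bool)  # a point is not its own neighbour
    if kind == "reciprocal":
        weights[off_diag] = 1.0 / dist[off_diag]
    elif kind == "exponential":
        weights[off_diag] = np.exp(-dist[off_diag] / bandwidth)
    else:
        raise ValueError(f"unknown weight kind: {kind!r}")

    # Scale each row to sum to one so that weights @ values is a weighted mean.
    return weights / weights.sum(axis=1, keepdims=True)


def neighbour_differences(values, weights):
    """Difference between each value and the weighted mean of the other values."""
    values = np.asarray(values, dtype=float)
    return values - weights @ values


# Illustrative data: a smooth surface on a 5 x 5 grid with one contaminated cell.
grid = [(x, y) for x in range(5) for y in range(5)]
field = np.array([x + y for x, y in grid], dtype=float)
field[12] = 100.0  # artificial spike at the grid centre

for kind in ("reciprocal", "exponential"):
    w = distance_weights(grid, kind=kind)
    d = neighbour_differences(field, w)
    print(kind, "largest absolute difference at index", int(np.argmax(np.abs(d))))
```

Large absolute differences point to locations that disagree with their surroundings. A practical diagnostic would still need a scale estimate and a cut-off, which the summary does not specify.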