A modified reweighted fast consistent and high-breakdown estimator for high-dimensional datasets
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2024 |
Online Access: | http://psasir.upm.edu.my/id/eprint/112070/1/1-s2.0-S2772662224000286-main.pdf http://psasir.upm.edu.my/id/eprint/112070/ https://www.sciencedirect.com/science/article/pii/S2772662224000286?via%3Dihub |
Summary: | Outlier detection and classification algorithms play a critical role in statistical analysis. The reweighted fast consistent and high breakdown point (RFCH) estimator is an outlier-resistant estimator of multivariate location and dispersion. However, the RFCH is difficult to apply in high-dimensional settings, chiefly because it cannot be computed when the dimension exceeds the sample size. We propose a modified reweighted fast consistent and high breakdown point (MRFCH) estimator that is applicable to high-dimensional settings. The basic idea is to modify the Mahalanobis distance so that the RFCH algorithm uses only the diagonal elements of the scatter matrix in its computation. The proposed method preserves the robustness properties of the RFCH estimator. As a result, we obtain a robust and efficient high-dimensional procedure for computing location and scatter matrix estimates, together with a powerful outlier detection method. One of the main advantages of the proposed procedure over the existing RFCH is that it can be applied to both low- and high-dimensional datasets. In real-life datasets and a simulation study, the proposed method showed promising results irrespective of sample size, dimension, amount of contamination, computational time, and distance of the contamination. Thus, the newly proposed algorithm can be applied to the problem of regression outliers in high-dimensional data (HDD) and can serve as a better alternative to the minimum regularized covariance determinant (MRCD) estimator. © 2024 The Author(s) |
---|
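
The summary's central idea, a Mahalanobis-type distance that uses only the diagonal elements of the scatter matrix, can be illustrated with a minimal Python sketch. This is not the authors' MRFCH algorithm: the function name `diagonal_mahalanobis_sq`, the toy data, and the coordinatewise median and MAD used as stand-in robust estimates are all assumptions made for illustration. The sketch only shows why a diagonal-only distance remains computable when the dimension exceeds the sample size, where the full scatter matrix would be singular.

```python
import numpy as np


def diagonal_mahalanobis_sq(X, center, scatter):
    """Squared distance of each row of X from `center`, using only the
    diagonal of `scatter` (off-diagonal entries are ignored)."""
    X = np.asarray(X, dtype=float)
    variances = np.diag(np.asarray(scatter, dtype=float))
    if np.any(variances <= 0):
        raise ValueError("diagonal entries of the scatter matrix must be positive")
    # Sum over coordinates of (x_j - mu_j)^2 / s_jj; no matrix inverse needed.
    return np.sum((X - center) ** 2 / variances, axis=1)


# Hypothetical toy data with more variables than observations (p > n).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
X[0] += 10.0  # plant one gross outlier in row 0

# Stand-in robust estimates (an assumption, not the RFCH steps):
# coordinatewise median for location, scaled MAD for the diagonal scatter.
center = np.median(X, axis=0)
mad = 1.4826 * np.median(np.abs(X - center), axis=0)
scatter = np.diag(mad ** 2)

d2 = diagonal_mahalanobis_sq(X, center, scatter)
outlier_index = int(np.argmax(d2))  # row with the largest distance
```

Because each coordinate is scaled separately, the distance needs only `p` positive variances rather than an invertible `p x p` matrix. The price is that correlations between variables are ignored, which is the trade-off implied by a diagonal-only distance.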