A modified reweighted fast consistent and high-breakdown estimator for high-dimensional datasets


Bibliographic Details
Main Authors: A. Baba, Ishaq; Midi, Habshah; June, Leong W.; Ibragimov, Gafurjan
Format: Article
Language: English
Published: Elsevier 2024
Online Access: http://psasir.upm.edu.my/id/eprint/112070/1/1-s2.0-S2772662224000286-main.pdf
http://psasir.upm.edu.my/id/eprint/112070/
https://www.sciencedirect.com/science/article/pii/S2772662224000286?via%3Dihub
Description
Summary: Outlier detection and classification algorithms play a critical role in statistical analysis. The reweighted fast consistent and high breakdown point (RFCH) estimator is an outlier-resistant estimator of multivariate location and dispersion. However, several difficulties hamper its application in high-dimensional settings, the main one being that the RFCH cannot be applied when the dimension exceeds the sample size. We propose a modified reweighted fast consistent and high breakdown point (MRFCH) estimator that is applicable to high-dimensional settings. The basic idea is to modify the Mahalanobis distance so that the RFCH algorithm uses only the diagonal elements of the scatter matrix in its computation. The proposed method preserves the robustness properties of the RFCH estimator. As a result, we obtain a robust and efficient high-dimensional procedure for computing location and scatter matrix estimates, together with a powerful outlier detection method. A main advantage of the proposed procedure over the existing RFCH is that it can be applied to both low- and high-dimensional datasets. In real-life datasets and a simulation study, the proposed method showed promising results in terms of computational time and across sample sizes, dimensions, amounts of contamination, and distances of the contamination. Thus, the proposed algorithm can be applied to the problem of regression outliers in high-dimensional data (HDD) and can serve as a better alternative to the minimum regularized covariance determinant (MRCD) estimator. © 2024 The Author(s)
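
The diagonal-only distance mentioned in the summary can be illustrated with a minimal Python sketch. This is not the authors' MRFCH algorithm: the function name `diagonal_mahalanobis`, the simulated data, and the use of the coordinate-wise median and scaled MAD as stand-in robust estimates are all assumptions made for illustration. It only shows why a distance built from the diagonal of a scatter matrix remains computable when the dimension exceeds the sample size, whereas the full sample covariance matrix is singular.

```python
import numpy as np

def diagonal_mahalanobis(X, center, scatter_diag):
    """Squared distances that use only the diagonal of a scatter matrix.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    center = np.asarray(center, dtype=float)
    scatter_diag = np.asarray(scatter_diag, dtype=float)
    # Each coordinate is scaled by its own variance; no matrix inverse is needed.
    return ((X - center) ** 2 / scatter_diag).sum(axis=1)

# Simulated data with more variables than observations (p > n).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
X[0] += 10.0  # plant one gross outlier in the first row

# The full sample covariance has rank at most n - 1 < p, so it cannot be inverted.
cov_rank = np.linalg.matrix_rank(np.cov(X, rowvar=False))

# Assumed stand-in robust estimates: coordinate-wise median and scaled MAD.
center = np.median(X, axis=0)
mad = 1.4826 * np.median(np.abs(X - center), axis=0)

d2 = diagonal_mahalanobis(X, center, mad ** 2)
flagged = int(np.argmax(d2))  # index of the most outlying observation
```

In this sketch the planted row receives by far the largest distance. Ignoring the off-diagonal elements discards correlation information, which is the price paid for a distance that stays well defined in high dimensions.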