Modified boxplot and stairboxplot for generalized extreme value distribution
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2017 |
Online Access: | http://psasir.upm.edu.my/id/eprint/69421/1/IPM%202018%203%20IR.pdf http://psasir.upm.edu.my/id/eprint/69421/ |
Summary: | A boxplot is an exploratory data analysis tool that gives a compact distributional summary of
a univariate dataset. It is designed to recognise all typical observations and to display
the location, spread, skewness and tail of the data. When the dataset is skewed,
as extreme data are, the boxplot becomes less reliable and less accurate.
Many observations from extreme data are erroneously marked as outliers by
the classical boxplot methods.
Tukey’s classical and Hubert’s adjusted boxplots were used in the study and assessed by the
outside rate per sample and a proposed measure, the fence sensitivity ratio, to judge the
suitability of each method in a simulation from the Generalized Extreme
Value (GEV) distribution. The adjusted method improves on the classical method in capturing
extreme data, but it is not sufficiently optimised to meet the benchmark requirement in the
literature.
A modified boxplot is proposed that adjusts the fences of the existing boxplot
method using the Bowley coefficient. The fence position was treated as a response
to skewness in extreme data simulated from the GEV distribution and then fitted
with a resistant-fit linear regression model. The proposed fence adjustment enables the
boxplot to detect all atypical observations without any parametric assumption about the
extreme data. The new boxplot displays some features beyond those of the classical
one, such as a quantile region for the parameters of the Generalized Extreme Value
distribution when fitting extreme data.
A modification of the entire boxplot display is also proposed: the stairboxplot, which
combines features of the boxplot, the histogram and the dot plot. The stairboxplot divides the
data points of a sample into four portions according to the range of the dataset, such that the individual points are inscribed in their respective range levels. The stairboxplot also
displays each observation according to a newly introduced measure of the outlyingness
of a point, called stairboxplot outlyingness.
The main findings and contributions of both the modified boxplot and the stairboxplot
lie in improving the quality of a dataset by highlighting observations that are inconsistent
with the GEV modelling framework, and in diagnostic
visualisation of extreme data that gives immediate information such as skewness, a quantile
region for the GEV parameters, and a display of data points according to outlyingness. |
---|
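
The quantities named in the summary (Tukey's fences, the Bowley coefficient, and the outside rate per sample on data simulated from a GEV distribution) can be illustrated with a minimal Python sketch. This is not the thesis's method: the function names, the default parameter values, the inverse-CDF sampler and the choice of the "inclusive" quartile rule are all assumptions made for illustration, and neither Hubert's adjusted fences nor the proposed Bowley-based fence adjustment is reproduced here.

```python
import math
import random
import statistics


def gev_sample(n, loc=0.0, scale=1.0, shape=0.2, rng=None):
    """Draw n GEV variates by inverting the CDF (shape = 0 gives the Gumbel case).

    Parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        e = -math.log(u)  # standard exponential variate
        if shape == 0.0:
            draws.append(loc - scale * math.log(e))
        else:
            draws.append(loc + scale * (e ** (-shape) - 1.0) / shape)
    return draws


def quartiles(data):
    """Return (Q1, Q2, Q3); the 'inclusive' rule is one of several conventions."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3


def bowley(data):
    """Bowley (quartile) skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = quartiles(data)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def tukey_fences(data, k=1.5):
    """Classical Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outside_rate(data, fences):
    """Fraction of observations falling outside the given (lower, upper) fences."""
    lower, upper = fences
    return sum(1 for x in data if x < lower or x > upper) / len(data)


if __name__ == "__main__":
    sample = gev_sample(5000, shape=0.2, rng=random.Random(1))
    print("Bowley coefficient:", bowley(sample))
    print("Outside rate (Tukey):", outside_rate(sample, tukey_fences(sample)))
```

On right-skewed GEV samples the classical fences tend to flag a noticeable share of legitimate upper-tail observations, which is the behaviour the summary describes as erroneous outlier marking.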