A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2016 |
Online Access: | http://psasir.upm.edu.my/id/eprint/69313/1/FSKTM%202016%204%20IR.pdf http://psasir.upm.edu.my/id/eprint/69313/ |
Summary: | In supervised learning, a class-imbalanced data set is one in which the class
distribution is not uniform. Many classifiers fail to identify patterns that belong to
the minority class because most of them are built to minimize the error rate. Hence, a
biased classification model is to be expected, since high accuracy can be reached by
classifying the majority class correctly alone.
There are two kinds of methods for dealing with the imbalanced classification problem:
data-level and algorithmic-level methods. Data-level methods address the problem by
making both classes equal in number. However, changing the class distribution violates
the original distribution that the data follow. Algorithmic-level methods instead
introduce a new optimization task to improve the minority class classification rate
without changing the data characteristics. Nevertheless, the optimization task requires
care to prevent the classification model from overfitting.
Therefore, this thesis proposes a new classifier based on genetic programming (GP) and
support vector machine (SVM) to solve the imbalanced classification problem without
changing the data properties. The idea is to use GP to optimize the SVM decision
function so that the minority class classification rate increases without sacrificing
the accuracy rate for both classes. In addition, the classifier is optimized to
generalize well. The key components of the new classifier are a new kernel method, a new
learning metric and a new optimization algorithm for optimizing the SVM decision
function. The proposed classifier is called the Support Vector Genetic Programming
Machine (SVGPM).
To evaluate the performance of SVGPM against current methods for imbalanced
classification, three experiments are conducted: on selected standard class-imbalanced
benchmark data sets, on an intrusion detection system (IDS) data set and on a remote
sensing data set. In the first experiment, SVGPM is compared against SVM and
cost-sensitive SVM because of the superiority of SVM in dealing with the imbalanced
classification problem. The second experiment evaluates SVGPM on detecting anomalous
rare attacks in a network intrusion data set, where it is compared against current
methods for developing a prediction model for IDS. In the third experiment, SVGPM is
evaluated on a wilt disease data set from a remote sensing study, to identify
wilt-diseased trees in high-resolution imagery. Its performance is compared against
previously proposed methods for mapping the regions covered by wilt-diseased trees in
Japan.
The experiments show that SVGPM gives a very good classification rate for the minority
class without sacrificing the accuracy rate for both classes. This is because, in the
training stage, the optimization task introduced in SVGPM ensures that each minority
class example is generalized into one learning concept and that the classification
rates for the majority and minority classes are similar. |
---|
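
The summary's point that overall accuracy can be dominated by the majority class can be illustrated with a minimal Python sketch. The data, the 95:5 class ratio, the two sets of predictions and the helper `class_rates` are all hypothetical and chosen only for illustration; they are not taken from the thesis and do not reproduce SVGPM or its learning metric.

```python
def class_rates(y_true, y_pred, minority=1):
    """Return (accuracy, minority-class rate, majority-class rate)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    min_total = sum(t == minority for t, _ in pairs)
    maj_total = len(pairs) - min_total
    min_correct = sum(t == minority and p == minority for t, p in pairs)
    maj_correct = correct - min_correct
    return correct / len(pairs), min_correct / min_total, maj_correct / maj_total


# Hypothetical data: 95 majority (label 0) and 5 minority (label 1) examples.
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
always_majority = [0] * 100

# A hypothetical model that makes 5 majority errors but finds 4 of 5 minority examples.
more_balanced = [0] * 90 + [1] * 5 + [1] * 4 + [0] * 1

acc_a, min_a, maj_a = class_rates(y_true, always_majority)
acc_b, min_b, maj_b = class_rates(y_true, more_balanced)

print(f"always-majority: accuracy={acc_a:.2f} minority={min_a:.2f} majority={maj_a:.2f}")
print(f"more balanced:   accuracy={acc_b:.2f} minority={min_b:.2f} majority={maj_b:.2f}")
```

In this toy setting the always-majority model has the higher overall accuracy while identifying no minority examples at all, which is why the per-class rates, rather than accuracy alone, are the quantities to compare.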