A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem

In supervised learning, a class-imbalanced data set is one in which the class distribution is not uniform. Many classifiers fail to identify patterns that belong to the minority class because most of them are built to minimize the error rate. Hence, a biased...

Bibliographic Details
Main Author: Mohd Pozi, Muhammad Syafiq
Format: Thesis
Language: English
Published: 2016
Online Access: http://psasir.upm.edu.my/id/eprint/69313/1/FSKTM%202016%204%20IR.pdf
http://psasir.upm.edu.my/id/eprint/69313/
id my.upm.eprints.69313
record_format eprints
spelling my.upm.eprints.69313 2019-06-28T08:14:21Z http://psasir.upm.edu.my/id/eprint/69313/ A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem Mohd Pozi, Muhammad Syafiq 2016-02 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/69313/1/FSKTM%202016%204%20IR.pdf Mohd Pozi, Muhammad Syafiq (2016) A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem. PhD thesis, Universiti Putra Malaysia.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description In supervised learning, a class-imbalanced data set is one in which the class distribution is not uniform. Many classifiers fail to identify patterns that belong to the minority class because most of them are built to minimize the error rate. A biased classification model is therefore to be expected, since high accuracy can always be achieved by favoring the majority class. Methods for dealing with the imbalanced classification problem fall into two groups: data-level and algorithm-level. Data-level methods aim to make both classes equal in number. However, changing the distribution of the classes violates the original class distribution of the data. Algorithm-level methods instead introduce a new optimization task to improve the minority-class classification rate without changing the data characteristics. Nevertheless, the optimization task requires particular care to prevent the classification model from overfitting. Therefore, this thesis proposes a new classifier based on genetic programming (GP) and support vector machine (SVM) to solve the imbalanced classification problem without changing the data properties. The idea is to use GP to optimize the SVM decision function so that the minority-class classification rate increases without sacrificing the accuracy rate for both classes. In addition, the classifier is optimized to have good generalization properties. The key components of the new classifier are a new kernel method, a new learning metric, and a new optimization algorithm for optimizing the SVM decision function. The proposed classifier is called Support Vector Genetic Programming Machine (SVGPM).
To evaluate the performance of SVGPM against current methods for imbalanced classification, three experiments are conducted: on selected standard class-imbalanced benchmark data sets, on an intrusion detection system (IDS) data set, and on a remote sensing data set. In the first experiment, SVGPM is compared against SVM and cost-sensitive SVM, because of the superiority of SVM in dealing with the imbalanced classification problem. The second experiment evaluates SVGPM on detecting rare anomalous attacks in a network intrusion data set, where its performance is compared against current methods for developing IDS prediction models. In the third experiment, SVGPM is evaluated on a wilt disease data set from a remote sensing study, with the aim of identifying wilt-diseased trees in high-resolution images. Its performance is compared against previously proposed methods for mapping the regions covered by wilt-diseased trees in Japan. The experiments show that SVGPM achieves a very good classification rate on the minority class without sacrificing the accuracy rate for both classes. This is because, in the training stage, the optimization task introduced in SVGPM ensures that each minority-class example is generalized into one learning concept and that the classification rates for the majority and minority classes are similar.
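The abstract's point that an error-rate-minimizing classifier can look accurate while ignoring the minority class can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are hypothetical, the helper names are invented, and the geometric mean of the per-class rates is used only as a commonly used balanced metric, not as the "new learning metric" proposed in the thesis, which this record does not specify.

```python
from math import sqrt

def per_class_rates(y_true, y_pred, minority=1):
    """Return (accuracy, minority_rate, majority_rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Correctly classified minority and majority examples.
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    n_min = sum(1 for t in y_true if t == minority)
    n_maj = len(y_true) - n_min
    accuracy = (tp + tn) / len(y_true)
    return accuracy, tp / n_min, tn / n_maj

def g_mean(minority_rate, majority_rate):
    """Geometric mean of the two per-class classification rates."""
    return sqrt(minority_rate * majority_rate)

# Hypothetical data set: 95 majority examples (0) and 5 minority examples (1).
y_true = [0] * 95 + [1] * 5

# Model A always predicts the majority class.
always_majority = [0] * 100
# Model B misclassifies 10 majority examples and 1 minority example.
balanced = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

acc_a, min_a, maj_a = per_class_rates(y_true, always_majority)
acc_b, min_b, maj_b = per_class_rates(y_true, balanced)

print(f"A: accuracy={acc_a:.2f} minority={min_a:.2f} g-mean={g_mean(min_a, maj_a):.2f}")
print(f"B: accuracy={acc_b:.2f} minority={min_b:.2f} g-mean={g_mean(min_b, maj_b):.2f}")
```

In this toy case model A has the higher accuracy but never identifies a minority example, while model B gives up some accuracy in exchange for similar classification rates on both classes, which is the kind of balance the abstract describes as the goal.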
format Thesis
author Mohd Pozi, Muhammad Syafiq
spellingShingle Mohd Pozi, Muhammad Syafiq
A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem
author_facet Mohd Pozi, Muhammad Syafiq
author_sort Mohd Pozi, Muhammad Syafiq
title A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem
title_sort new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem
publishDate 2016
url http://psasir.upm.edu.my/id/eprint/69313/1/FSKTM%202016%204%20IR.pdf
http://psasir.upm.edu.my/id/eprint/69313/