Handling highly imbalanced output class label: a case study on Fantasy Premier League (FPL) virtual player price changes prediction using machine learning / Muhammad Muhaimin Khamsan and Ruhaila Maskat

In practice, a balanced target class is rare. However, an imbalanced target class can be handled by resampling the original dataset, either by oversampling/upsampling or undersampling/downsampling. A popular upsampling technique is Synthetic Minority Over-sampling Technique (SMOTE). This technique i...

Bibliographic Details
Main Authors:	Khamsan, Muhammad Muhaimin, Maskat, Ruhaila
Format:	Article
Language:	en
Published:	Penerbit UiTM 2019
Subjects:	Data processing
Online Access:	https://ir.uitm.edu.my/id/eprint/61448/1/61448.pdf https://ir.uitm.edu.my/id/eprint/61448/ https://mjoc.uitm.edu.my/

author	Khamsan, Muhammad Muhaimin Maskat, Ruhaila
author_facet	Khamsan, Muhammad Muhaimin Maskat, Ruhaila
author_sort	Khamsan, Muhammad Muhaimin
building	Tun Abdul Razak Library
collection	Institutional Repository
content_provider	Universiti Teknologi Mara
content_source	UiTM Institutional Repository
continent	Asia
country	Malaysia
description	In practice, a balanced target class is rare. However, an imbalanced target class can be handled by resampling the original dataset, either by oversampling (upsampling) or undersampling (downsampling). A popular upsampling technique is the Synthetic Minority Over-sampling Technique (SMOTE). This technique enlarges the minority class by generating synthetic samples and assigning their class based on the K-Nearest Neighbours (K-NN). SMOTE upsampling can upsample at most one minority class at a time, which means that a multiclass dataset needs to undergo multilayer SMOTE to balance the class label distribution. This paper aims to find a suitable method for handling imbalanced classes, using a dataset of Fantasy Premier League (FPL) virtual players to predict price changes. The cleaned dataset has a highly imbalanced class distribution, where the frequency of “Price Remain Unchanged (PRU)” is higher than that of “Price Fall (PF)” and “Price Rise (PR)”. This paper compares the baseline (original) dataset, the SMOTE-applied dataset, and shuffled, linear and stratified sampling in the train-test split, based on a deep learning algorithm. This paper also proposes a low standard deviation (distribution of true positives for each class label on accuracy) as a criterion for finding the best method of handling imbalanced class labels. As a result, multilayer SMOTE applied until all classes have the same distribution, combined with stratified sampling in the training and testing split, gives a lower standard deviation (5.7873), high accuracy (80.06%) and a shorter execution runtime (1 minute 41 seconds) compared to the original highly imbalanced dataset.
format	Article
id	my.uitm.ir-61448
institution	Universiti Teknologi Mara
language	en
publishDate	2019
publisher	Penerbit UiTM
record_format	eprints
spelling	my.uitm.ir-61448 2022-06-14T03:04:43Z https://ir.uitm.edu.my/id/eprint/61448/ mjoc Penerbit UiTM 2019-12 Article PeerReviewed text en Handling highly imbalanced output class label: a case study on Fantasy Premier League (FPL) virtual player price changes prediction using machine learning / Muhammad Muhaimin Khamsan and Ruhaila Maskat. (2019) Malaysian Journal of Computing (MJoC) <https://ir.uitm.edu.my/view/publication/Malaysian_Journal_of_Computing_=28MJoC=29/>, 4 (2): 4. pp. 304-316. ISSN 2600-8238
title	Handling highly imbalanced output class label: a case study on Fantasy Premier League (FPL) virtual player price changes prediction using machine learning / Muhammad Muhaimin Khamsan and Ruhaila Maskat
topic	Data processing
url	https://ir.uitm.edu.my/id/eprint/61448/1/61448.pdf https://ir.uitm.edu.my/id/eprint/61448/ https://mjoc.uitm.edu.my/
url_provider	http://ir.uitm.edu.my/
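
The description in this record says that SMOTE enlarges a minority class by generating synthetic samples from nearest neighbours. As a rough illustration only, the sketch below interpolates new points between a minority sample and one of its k nearest minority neighbours. It is not the authors' implementation: the function name `smote_sketch`, the parameter values and the sample points are all hypothetical, and picking the base sample at random is a simplification of the usual per-sample loop.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Simplified illustration of the SMOTE idea, not a reference implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point (a simplification).
        base = rng.choice(minority)
        # Find its k nearest minority neighbours by Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment
        # from the base point to the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority samples (not data from the paper).
minority_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_sketch(minority_points, n_new=6)
```

Every synthetic point inherits the minority label and lies on a line segment between two existing minority samples, so it stays within the range the minority class already covers. For a dataset with several minority classes, such as the three price-change classes described above, a step like this would be repeated once per minority class.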


Similar Items