Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling

Classification (of information); Learning algorithms; Students; Class imbalance; Data level; Over sampling; Performance prediction; SMOTE; Spread subsampling; Student performance; Student performance prediction; Under-sampling; Machine learning

Bibliographic Details
Main Authors: Khan I., Ahmad A.R., Jabeur N., Mahdi M.N.
Other Authors: 58061521900
Format: Conference Paper
Published: Springer Science and Business Media Deutschland GmbH 2023
id my.uniten.dspace-26465
record_format dspace
spelling my.uniten.dspace-26465 2023-05-29T17:10:49Z Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling Khan I. Ahmad A.R. Jabeur N. Mahdi M.N. 58061521900 35589598800 6505727698 56727803900 Classification (of information); Learning algorithms; Students; Class imbalance; Data level; Over sampling; Performance prediction; SMOTE; Spread subsampling; Student performance; Student performance prediction; Under-sampling; Machine learning Classification, a significant application of machine learning, assigns each instance of a dataset to one of a set of predefined classes. Problems occur when the number of instances is not uniform across the classes. An exceptionally uneven class distribution gives rise to class imbalance, which tends to degrade the overall performance of the classifier. A set of data-level algorithms is available to adjust the class distribution. Class imbalance emerges frequently in datasets from educational domains, where the number of students with unsatisfactory performance is generally low compared to the number of students with satisfactory outcomes. This paper applies a set of data-level sampling algorithms to a dataset taken from an educational domain. It underlines the consequences arising from classification with an imbalanced dataset. This research confirms that a classification model achieving higher accuracy may not be effective at correctly identifying instances of the minority class. Classification with an imbalanced dataset may produce low recall, precision and F-Measure for classes with fewer instances. The performance of the classification model improves with the application of data-level algorithms. The results also highlight the superiority of oversampling algorithms over undersampling algorithms. © 2021, Springer Nature Switzerland AG.
Final 2023-05-29T09:10:49Z 2023-05-29T09:10:49Z 2021 Conference Paper 10.1007/978-3-030-90235-3_38 2-s2.0-85120523452 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85120523452&doi=10.1007%2f978-3-030-90235-3_38&partnerID=40&md5=9f40e54fe9a37bbd13aa3f30e8eadb05 https://irepository.uniten.edu.my/handle/123456789/26465 13051 LNCS 435 446 Springer Science and Business Media Deutschland GmbH Scopus
institution Universiti Tenaga Nasional
building UNITEN Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tenaga Nasional
content_source UNITEN Institutional Repository
url_provider http://dspace.uniten.edu.my/
description Classification (of information); Learning algorithms; Students; Class imbalance; Data level; Over sampling; Performance prediction; SMOTE; Spread subsampling; Student performance; Student performance prediction; Under-sampling; Machine learning
author2 58061521900
author_facet 58061521900
Khan I.
Ahmad A.R.
Jabeur N.
Mahdi M.N.
format Conference Paper
author Khan I.
Ahmad A.R.
Jabeur N.
Mahdi M.N.
spellingShingle Khan I.
Ahmad A.R.
Jabeur N.
Mahdi M.N.
Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling
author_sort Khan I.
title Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling
title_short Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling
title_full Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling
title_fullStr Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling
title_full_unstemmed Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling
title_sort minimizing classification errors in imbalanced dataset using means of sampling
publisher Springer Science and Business Media Deutschland GmbH
publishDate 2023