Alternative Relative Discrimination Criterion Feature Ranking Technique for Text Classification

High-dimensional text data can degrade classifier performance. Therefore, efficient feature selection (FS) is necessary to reduce dimensionality. In text classification, ranking-based FS algorithms are used to improve classification performance. To ran...

Bibliographic Details
Main Authors: ABDULKAREM ALSHALIF, SARAH, SENAN, NORHALINA, SAEED, FAISAL, WAD GHABAN, WAD GHABAN, IBRAHIM, NORAINI, MUHAMMAD AAMIR, MUHAMMAD AAMIR, WAREESA SHARIF, WAREESA SHARIF
Format: Article
Language: en
Published: IEEE Access 2023
Subjects: T Technology (General)
Online Access: http://eprints.uthm.edu.my/10746/1/J16418_6a3a4efa584c2a61e9a08cd61b82225d.pdf
http://eprints.uthm.edu.my/10746/
https://doi.org/10.1109/ACCESS.2023.3294563
author ABDULKAREM ALSHALIF, SARAH
SENAN, NORHALINA
SAEED, FAISAL
WAD GHABAN, WAD GHABAN
IBRAHIM, NORAINI
MUHAMMAD AAMIR, MUHAMMAD AAMIR
WAREESA SHARIF, WAREESA SHARIF
building UTHM Library
collection Institutional Repository
content_provider Universiti Tun Hussein Onn Malaysia
content_source UTHM Institutional Repository
continent Asia
country Malaysia
description High-dimensional text data can degrade classifier performance. Therefore, efficient feature selection (FS) is necessary to reduce dimensionality. In text classification, ranking-based FS algorithms are used to improve classification performance. To rank terms, most feature ranking algorithms, such as the Relative Discrimination Criterion (RDC) and the Improved Relative Discrimination Criterion (IRDC), use document frequency (DF) and term frequency (TF). In existing feature ranking algorithms, TF takes the actual count of each term, whether the term occurs frequently or rarely. However, these algorithms focus on term counts within a document rather than within a category. This research proposes an alternative to RDC, the Alternative Relative Discrimination Criterion (ARDC), which aims to improve the accuracy and effectiveness of RDC feature ranking. Specifically, ARDC is designed to identify terms that commonly occur in the positive class. The results were compared with the existing RDC-based methods (RDC and IRDC) and with standard benchmark methods such as Information Gain (IG), Pearson Correlation Coefficient (PCC), and ReliefF. The experimental results show that ARDC performs better on the Reuters21578, 20newsgroup, and TDT2 datasets in terms of precision, recall, F-measure, and accuracy with well-known classifiers such as multinomial naïve Bayes (MNB), support vector machine (SVM), multilayer perceptron (MLP), k-nearest neighbor (KNN), and decision tree (DT). A further experiment was performed to validate the proposed technique and to showcase the novelty of the ARDC approach. It used the 20newsgroup dataset and the Relevant-Based Feature Ranking (RBFR) technique, with Naïve Bayes (NB), Random Forest (RF), and Logistic Regression (LR) classifiers, to demonstrate the effectiveness of ARDC.
format Article
id my.uthm.eprints-10746
institution Universiti Tun Hussein Onn Malaysia
language en
publishDate 2023
publisher IEEE Access
record_format eprints
spelling my.uthm.eprints-10746 2024-01-17T01:51:22Z http://eprints.uthm.edu.my/10746/ Article PeerReviewed text en ABDULKAREM ALSHALIF, SARAH and SENAN, NORHALINA and SAEED, FAISAL and WAD GHABAN, WAD GHABAN and IBRAHIM, NORAINI and MUHAMMAD AAMIR, MUHAMMAD AAMIR and WAREESA SHARIF, WAREESA SHARIF (2023) Alternative Relative Discrimination Criterion Feature Ranking Technique for Text Classification. IEEE Access, 11. pp. 71739-71755. https://doi.org/10.1109/ACCESS.2023.3294563
title Alternative Relative Discrimination Criterion Feature Ranking Technique for Text Classification
topic T Technology (General)
url http://eprints.uthm.edu.my/10746/1/J16418_6a3a4efa584c2a61e9a08cd61b82225d.pdf
http://eprints.uthm.edu.my/10746/
https://doi.org/10.1109/ACCESS.2023.3294563
url_provider http://eprints.uthm.edu.my/