An enhanced feature selection technique for classification of group-based holy quran verses

This thesis proposes an enhanced feature selection technique for text classification applications. Text classification is primarily applied in document labeling. However, the major setbacks of the existing feature selection techniques are the high computational runtime associated wit...

Bibliographic Details
Main Author: Oyekunle, Adeleke Abdullahi
Format: Thesis
Language: English
Published: 2018
Subjects:
Online Access: http://eprints.uthm.edu.my/7549/2/24p%20ADELEKE%20ABDULLAHI%20OYEKUNLE.pdf
http://eprints.uthm.edu.my/7549/1/ADELEKE%20ABDULLAHI%20OYEKUNLE%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/7549/3/ADELEKE%20ABDULLAHI%20OYEKUNLE%20WATERMARK.pdf
http://eprints.uthm.edu.my/7549/
id my.uthm.eprints.7549
record_format eprints
spelling my.uthm.eprints.7549 2022-08-21T01:44:33Z http://eprints.uthm.edu.my/7549/ An enhanced feature selection technique for classification of group-based holy quran verses Oyekunle, Adeleke Abdullahi HB Economic Theory HB135-147 Mathematical economics. Quantitative methods. Including econometrics, input-output analysis, game theory 2018-01 Thesis NonPeerReviewed text en http://eprints.uthm.edu.my/7549/2/24p%20ADELEKE%20ABDULLAHI%20OYEKUNLE.pdf text en http://eprints.uthm.edu.my/7549/1/ADELEKE%20ABDULLAHI%20OYEKUNLE%20COPYRIGHT%20DECLARATION.pdf text en http://eprints.uthm.edu.my/7549/3/ADELEKE%20ABDULLAHI%20OYEKUNLE%20WATERMARK.pdf Oyekunle, Adeleke Abdullahi (2018) An enhanced feature selection technique for classification of group-based holy quran verses. Masters thesis, Universiti Tun Hussein Onn Malaysia.
institution Universiti Tun Hussein Onn Malaysia
building UTHM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tun Hussein Onn Malaysia
content_source UTHM Institutional Repository
url_provider http://eprints.uthm.edu.my/
language English
topic HB Economic Theory
HB135-147 Mathematical economics. Quantitative methods. Including econometrics, input-output analysis, game theory
description This thesis proposes an enhanced feature selection (FS) technique for text classification applications. Text classification is primarily applied in document labeling. However, the major setbacks of the existing feature selection techniques are the high computational runtime associated with wrapper-based FS techniques and the low classification accuracy associated with filter-based FS techniques. Therefore, in this study, a hybrid feature selection technique is proposed. The proposed FS technique is a combination of the filter-based Information Gain (IG) and wrapper-based CFS algorithms. The purpose of combining these two FS algorithms is to achieve both high classification accuracy (wrapper) and low computational runtime (filter). The study also developed a group-based Quran dataset to improve the understanding and analysis of the textual data (Quranic verses). The group-based dataset is a combination of Holy Quran translation and commentary (tafsir). The Quranic verses were selected from two chapters, Surah Al-Baqarah and Surah Al-Anaam. The verses are classified into three categories: Faith, Worship, and Etiquette. In the experiment, six feature selection algorithms were applied: Information Gain (IG), Chi-square (CH), Pearson Correlation Coefficient (PCC), ReliefF, Correlation-based (CFS), and the proposed IG-CFS algorithms. The textual data (Quranic verses) were preprocessed using StringToWordVector with weighted Term Frequency-Inverse Document Frequency (TF-IDF). Meanwhile, the classification phase involved four algorithms: Naïve Bayes (NB), k-Nearest Neighbor (k-NN), Support Vector Machine (LibSVM), and Decision Trees (J48). The experiment results were evaluated based on two established performance metrics in text classification: Accuracy and Area under the Receiver Operating Characteristics (ROC) curve (AUC). The proposed hybrid feature selection technique has shown promising results on both metrics, achieving an Accuracy of 94.5% and an AUC of 0.944 at a lower computational runtime (3.89 seconds) with the group-based Quran dataset.
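The filter stage named in the description ranks terms by Information Gain (IG) with respect to the class label. The following is a minimal Python sketch of that ranking only, under stated assumptions: features are binary (a term is present in a document or not), the six-sentence corpus is a hypothetical toy example and not the thesis's Quran dataset, and the function names (`entropy`, `information_gain`, `rank_terms`) are illustrative. The CFS stage and the TF-IDF weighting mentioned in the description are not reproduced here.

```python
import math
from collections import Counter

# Toy labeled corpus (hypothetical; NOT the thesis's group-based Quran dataset).
DOCS = [
    ("believe in god and the last day", "Faith"),
    ("those who believe shall have no fear", "Faith"),
    ("establish prayer and give charity", "Worship"),
    ("fasting is prescribed and prayer is due", "Worship"),
    ("speak kindly with people and parents", "Etiquette"),
    ("be good to parents and relatives", "Etiquette"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(term, docs):
    """IG of the binary feature 'term occurs in the document'."""
    labels = [label for _, label in docs]
    present = [label for text, label in docs if term in text.split()]
    absent = [label for text, label in docs if term not in text.split()]
    n = len(docs)
    # Expected entropy of the label after splitting on term presence.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def rank_terms(docs, top_k):
    """Return the top_k (term, IG) pairs, highest IG first, ties alphabetical."""
    vocabulary = sorted({word for text, _ in docs for word in text.split()})
    scored = [(term, information_gain(term, docs)) for term in vocabulary]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    for term, gain in rank_terms(DOCS, top_k=5):
        print(f"{term}: {gain:.3f}")
```

In a hybrid scheme of the kind the description outlines, a cheap ranking like this would be used to shrink the vocabulary first, so that a more expensive subset-evaluation step only has to search among the surviving terms.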
format Thesis
author Oyekunle, Adeleke Abdullahi
title An enhanced feature selection technique for classification of group-based holy quran verses
title_sort enhanced feature selection technique for classification of group-based holy quran verses
publishDate 2018
url http://eprints.uthm.edu.my/7549/2/24p%20ADELEKE%20ABDULLAHI%20OYEKUNLE.pdf
http://eprints.uthm.edu.my/7549/1/ADELEKE%20ABDULLAHI%20OYEKUNLE%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/7549/3/ADELEKE%20ABDULLAHI%20OYEKUNLE%20WATERMARK.pdf
http://eprints.uthm.edu.my/7549/