Performance of TF-IDF for text classification reviews on Google Play Store: Shopee / Najwa Umaira Che Mohd Safawi and Nur Amalina Shafie

TF-IDF is a technique used to extract features in text classification. The TF-IDF approach extracts features by considering the frequencies of terms and their inverse document frequencies. The performance of various feature extraction methods varies, and it is necessary to determine the...
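The weighting described above can be illustrated with a minimal sketch. It assumes the textbook formulation, term frequency multiplied by log(N / document frequency); the record does not state which TF-IDF variant the study used, and libraries often apply smoothed or normalized versions. The function name `tf_idf` and the toy reviews are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the plain textbook weighting: tf(t, d) * log(N / df(t)).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy reviews (illustration only).
reviews = [
    "fast delivery good app".split(),
    "app keeps crashing".split(),
    "good price fast checkout".split(),
]
weights = tf_idf(reviews)
```

In this sketch a term that occurs in every document receives a weight of zero, while a term confined to a single review is weighted most heavily, which is the intuition behind combining term frequency with inverse document frequency.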

Bibliographic Details
Main Authors: Che Mohd Safawi, Najwa Umaira; Shafie, Nur Amalina
Format: Article
Language: English
Published: UiTM Cawangan Perlis 2024
Subjects: Mathematical statistics. Probabilities
Online Access: https://ir.uitm.edu.my/id/eprint/102603/1/102603.pdf
https://ir.uitm.edu.my/id/eprint/102603/
https://jcrinn.com/index.php/jcrinn
id my.uitm.ir.102603
record_format eprints
spelling my.uitm.ir.102603 2024-10-18T08:52:05Z https://ir.uitm.edu.my/id/eprint/102603/ UiTM Cawangan Perlis 2024-09 Article PeerReviewed text en
Che Mohd Safawi, Najwa Umaira and Shafie, Nur Amalina (2024) Performance of TF-IDF for text classification reviews on Google Play Store: Shopee. Journal of Computing Research and Innovation (JCRINN) <https://ir.uitm.edu.my/view/publication/Journal_of_Computing_Research_and_Innovation_=28JCRINN=29/>, 9 (2): 2. pp. 13-22. ISSN 2600-8793 https://jcrinn.com/index.php/jcrinn
institution Universiti Teknologi Mara
building Tun Abdul Razak Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Mara
content_source UiTM Institutional Repository
url_provider http://ir.uitm.edu.my/
language English
topic Mathematical statistics. Probabilities
description TF-IDF is a technique used to extract features in text classification. The TF-IDF approach extracts features by considering the frequencies of terms and their inverse document frequencies. Feature extraction methods differ in performance, so it is necessary to determine the most appropriate approach for accurately classifying user reviews of the Shopee application in order to enhance the user experience in Malaysia. This study aims to assess the efficacy of TF-IDF in text classification tasks, analyze its advantages and disadvantages, and identify the specific conditions that apply to TF-IDF. The study employs a dataset of Shopee customer reviews acquired from the Google Play Store as the main data source. The methodology entails pre-processing the text data with a text normalization procedure that includes removing stop words and Unicode characters and lemmatizing. Logistic Regression, Support Vector Machine, and Decision Tree classifiers are trained and evaluated using both feature extraction approaches. The research notes that the efficacy of feature extraction approaches may differ depending on the specific dataset and task. Future studies might examine alternative feature extraction methods and assess their efficacy across various domains and datasets.
format Article
author Che Mohd Safawi, Najwa Umaira
Shafie, Nur Amalina
title Performance of TF-IDF for text classification reviews on Google Play Store: Shopee / Najwa Umaira Che Mohd Safawi and Nur Amalina Shafie
publisher UiTM Cawangan Perlis
publishDate 2024