Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis
During catastrophes such as natural or man-made disasters, social media services have evolved into a crucial tool utilised by communities to disseminate information. Because a vast amount of social media data is being used for many applications, including sentiment analysis, sentiment analysis has b...
Main Authors: | Hair Zaki, Ummu Hani’; Ibrahim, Roliana; Abd. Halim, Shahliza; Kamsani, Izyan Izzati |
---|---|
Format: | Book Section |
Published: | Springer Science and Business Media Deutschland GmbH, 2022 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/99745/ http://dx.doi.org/10.1007/978-3-030-98741-1_59 |
id |
my.utm.99745 |
---|---|
record_format |
eprints |
spelling |
my.utm.99745 2023-04-04T06:47:56Z http://eprints.utm.my/id/eprint/99745/ 
Springer Science and Business Media Deutschland GmbH 2022 Book Section PeerReviewed Hair Zaki, Ummu Hani’ and Ibrahim, Roliana and Abd. Halim, Shahliza and Kamsani, Izyan Izzati (2022) Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis. In: Advances on Intelligent Informatics and Computing Health Informatics, Intelligent Systems, Data Science and Smart Computing. Lecture Notes on Data Engineering and Communications Technologies, 127 (NA). Springer Science and Business Media Deutschland GmbH, Cham, Switzerland, pp. 50-61. ISBN 978-3-030-98740-4 http://dx.doi.org/10.1007/978-3-030-98741-1_59 DOI : 10.1007/978-3-030-98741-1_59 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
topic |
QA75 Electronic computers. Computer science |
description |
During catastrophes such as natural or man-made disasters, social media services have evolved into a crucial tool utilised by communities to disseminate information. Because a vast amount of social media data is being used for many applications, including sentiment analysis, sentiment analysis has become a very useful and demanding problem. Social media data cannot be used directly because it is raw and either unstructured or semi-structured. Consequently, text pre-processing becomes one of the most important tasks because the process is strongly constrained by its dependable workflow. This creates a complex pattern in pre-processing workflows. For this purpose, different text pre-processing techniques have been applied to Twitter, Facebook, and YouTube datasets to study their impact on the accuracy of machine learning algorithms. This paper applies the techniques in a specific sequence based on significance testing and examines their influence on sentiment classification accuracy using a machine learning classifier, Support Vector Machines (SVM). Results showed that applying all 14 techniques systematically can achieve an accuracy of up to 82.57% with the SVM classifier using unigram representations. With Text Detergent, the YouTube dataset achieves the highest accuracy compared with the Facebook and Twitter datasets. This will potentially improve the quality of the text and lead to better feature extraction, which in turn helps the sentiment analyst produce a better classifier. |
format |
Book Section |
author |
Hair Zaki, Ummu Hani’; Ibrahim, Roliana; Abd. Halim, Shahliza; Kamsani, Izyan Izzati |
author_sort |
Hair Zaki, Ummu Hani’ |
title |
Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis |
title_sort |
text detergent: the systematic combination of text pre-processing techniques for social media sentiment analysis |
publisher |
Springer Science and Business Media Deutschland GmbH |
publishDate |
2022 |
url |
http://eprints.utm.my/id/eprint/99745/ http://dx.doi.org/10.1007/978-3-030-98741-1_59 |
_version_ |
1762837428040630272 |
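
The description in this record says that the pre-processing techniques are applied in a specific sequence and that the cleaned text is represented as unigrams. The record does not list the 14 techniques or their order, so the following is only a minimal sketch of the general idea. The step functions, their order, and the names `PIPELINE`, `clean` and `unigram_counts` are all assumptions chosen for illustration, and the SVM classifier itself is not included.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical cleaning steps. The record does not name the 14 techniques,
# so these six are illustrative stand-ins only.
def remove_urls(text: str) -> str:
    return re.sub(r"https?://\S+|www\.\S+", " ", text)

def remove_mentions(text: str) -> str:
    return re.sub(r"@\w+", " ", text)

def strip_hashtag_symbol(text: str) -> str:
    # Keep the hashtag word, drop only the '#' character.
    return text.replace("#", " ")

def lowercase(text: str) -> str:
    return text.lower()

def remove_punctuation_and_digits(text: str) -> str:
    # Assumes the text is already lowercased.
    return re.sub(r"[^a-z\s]", " ", text)

def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

# Assumed order: URL and mention removal must run before punctuation
# removal, otherwise fragments such as "https" and "example" survive.
PIPELINE: List[Callable[[str], str]] = [
    remove_urls,
    remove_mentions,
    strip_hashtag_symbol,
    lowercase,
    remove_punctuation_and_digits,
    collapse_whitespace,
]

def clean(text: str, steps: Iterable[Callable[[str], str]] = PIPELINE) -> str:
    """Apply each pre-processing step in the given order."""
    for step in steps:
        text = step(text)
    return text

def unigram_counts(text: str) -> Counter:
    """Bag-of-unigrams representation of the cleaned text."""
    return Counter(clean(text).split())

if __name__ == "__main__":
    post = "Flood near @city_alert!! Stay SAFE #Flood2022 https://example.com/x1"
    print(clean(post))
    print(unigram_counts(post))
```

Passing a different `steps` list to `clean` shows why the order matters: if punctuation is removed before URLs, the URL pattern no longer matches and its fragments remain as tokens. The resulting unigram counts could then be fed to any classifier, such as an SVM from a machine learning library.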