Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis

During catastrophes such as natural or man-made disasters, social media services have evolved into a crucial tool utilised by communities to disseminate information. Because a vast amount of social media data is being used for many applications, including sentiment analysis, sentiment analysis has b...

Bibliographic Details
Main Authors: Hair Zaki, Ummu Hani’, Ibrahim, Roliana, Abd. Halim, Shahliza, Kamsani, Izyan Izzati
Format: Book Section
Published: Springer Science and Business Media Deutschland GmbH 2022
Online Access: http://eprints.utm.my/id/eprint/99745/
http://dx.doi.org/10.1007/978-3-030-98741-1_59
id my.utm.99745
record_format eprints
spelling my.utm.99745 2023-04-04T06:47:56Z http://eprints.utm.my/id/eprint/99745/
Springer Science and Business Media Deutschland GmbH 2022 Book Section PeerReviewed Hair Zaki, Ummu Hani’ and Ibrahim, Roliana and Abd. Halim, Shahliza and Kamsani, Izyan Izzati (2022) Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis. In: Advances on Intelligent Informatics and Computing: Health Informatics, Intelligent Systems, Data Science and Smart Computing. Lecture Notes on Data Engineering and Communications Technologies, 127 (NA). Springer Science and Business Media Deutschland GmbH, Cham, Switzerland, pp. 50-61. ISBN 978-3-030-98740-4. DOI: 10.1007/978-3-030-98741-1_59
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description During catastrophes such as natural or man-made disasters, social media services have evolved into a crucial tool utilised by communities to disseminate information. Because a vast amount of social media data is being used for many applications, including sentiment analysis, sentiment analysis has become a very useful and demanding problem. Social media data cannot be used directly because it is raw and either unstructured or semi-structured. Consequently, text pre-processing becomes one of the most important tasks because the process is strongly constrained by its dependable workflow. This creates complex patterns in pre-processing workflows. To this end, different text pre-processing techniques were applied to Twitter, Facebook, and YouTube datasets to study their impact on the accuracy of machine learning algorithms. This paper applies different text pre-processing techniques in a specific sequence based on significance testing and examines their influence on sentiment classification accuracy using a machine learning classifier, Support Vector Machines (SVM). The results show that applying all 14 techniques systematically achieves an accuracy of up to 82.57% with the SVM classifier using unigram representations. With Text Detergent, the YouTube dataset achieves the highest accuracy compared to the Facebook and Twitter datasets. This can potentially improve the quality of the text and lead to better feature extraction, which in turn helps the sentiment analyst produce a better classifier.
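The idea of applying pre-processing techniques in a fixed sequence before extracting unigram features can be illustrated with a minimal Python sketch. This record does not list the paper's 14 techniques or their order, so the five steps below, their names, and their ordering are assumptions chosen for illustration only; they are not the Text Detergent method itself, and the SVM training stage is omitted.

```python
import re
from collections import Counter

# Hypothetical cleaning steps: the record does not name the paper's
# 14 techniques, so these five are illustrative stand-ins.
def remove_urls(text):
    return re.sub(r"https?://\S+", " ", text)

def remove_mentions(text):
    return re.sub(r"@\w+", " ", text)

def lowercase(text):
    return text.lower()

def remove_punctuation(text):
    # Replace anything that is not a word character or whitespace.
    return re.sub(r"[^\w\s]", " ", text)

def collapse_whitespace(text):
    return re.sub(r"\s+", " ", text).strip()

# Assumed order: URLs and mentions are removed before punctuation,
# because stripping punctuation first would break the URL pattern.
PIPELINE = [remove_urls, remove_mentions, lowercase,
            remove_punctuation, collapse_whitespace]

def preprocess(text, steps=PIPELINE):
    """Apply each cleaning step in the given order."""
    for step in steps:
        text = step(text)
    return text

def unigram_counts(text):
    """Bag-of-unigrams representation of the cleaned text."""
    return Counter(preprocess(text).split())
```

Swapping the order, for example running `remove_punctuation` before `remove_urls`, leaves URL fragments such as `http t co abc` in the text, which is one way the sequence of steps can affect the features a classifier sees.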
format Book Section
author Hair Zaki, Ummu Hani’
Ibrahim, Roliana
Abd. Halim, Shahliza
Kamsani, Izyan Izzati
title Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis
publisher Springer Science and Business Media Deutschland GmbH
publishDate 2022
url http://eprints.utm.my/id/eprint/99745/
http://dx.doi.org/10.1007/978-3-030-98741-1_59