Adoption of machine learning algorithm for analysing supporters and non supporters feedback on political posts / Ogunfolajin Maruff Tunde

Bibliographic Details
Main Author: Ogunfolajin Maruff, Tunde
Format: Thesis
Published: 2022
Online Access: http://studentsrepo.um.edu.my/15304/1/Ogunfolajin.pdf
http://studentsrepo.um.edu.my/15304/2/Ogunfolajin.pdf
http://studentsrepo.um.edu.my/15304/
Description
Summary: Sentiment analysis is the field concerned with identifying and extracting sentiment (or opinion) from data, particularly textual data. Studies have shown that user perception can strongly influence policies and decision-making processes in a place, society, or nation. This thesis applies sentiment classification algorithms to tweet data, with the goal of classifying messages by the polarity of their sentiment towards a particular topic (or subject matter). Political analysts often communicate with the public and exchange information through social media platforms. Their activities (otherwise termed cyber-trooping) can draw positive, negative, or neutral feedback (perceptions) in the public space. Thus, there is a need to automate the process of identifying these cyber-trooping data and predicting their class (positive, negative, or neutral). This work uses a machine learning approach. Four conventional classification algorithms, naïve Bayes (NB), support vector machines (SVM), k-nearest neighbor (k-NN), and decision trees (J48), are used to identify and categorize the tweet data of three political figures in Malaysia, Dato Seri Anwar, Dato Hadi Awang, and Lim Guang Eng, as positive, negative, or neutral perceptions. The method was implemented in Java, and the results of the simulation were evaluated using five standard performance metrics: accuracy, AUC, precision, recall, and F-measure. The SVM algorithm obtained the best overall results, with 94.5% accuracy, 91.8% precision, 91.7% recall, and 91.1% F-measure, while the NB algorithm obtained the best AUC score of 0.944 with the tweet data of Dato Seri Anwar.
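As an illustration of four of the five metrics named in the summary, the following is a minimal Java sketch that derives accuracy, per-class precision, recall, and F-measure from a three-class confusion matrix. It is not the thesis's code: the class name SentimentMetrics, its method names, and the counts in main are hypothetical, and AUC is left out because it requires ranked classifier scores rather than a confusion matrix.

```java
import java.util.Locale;

/** Hypothetical helper (not from the thesis): metrics from a confusion matrix. */
final class SentimentMetrics {

    /** Accuracy = correctly classified / all instances. Rows = actual, columns = predicted. */
    static double accuracy(int[][] cm) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < cm.length; i++) {
            for (int j = 0; j < cm[i].length; j++) {
                total += cm[i][j];
                if (i == j) {
                    correct += cm[i][j];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Precision for class k = true positives / everything predicted as k (column sum). */
    static double precision(int[][] cm, int k) {
        long predicted = 0;
        for (int[] row : cm) {
            predicted += row[k];
        }
        return predicted == 0 ? 0.0 : (double) cm[k][k] / predicted;
    }

    /** Recall for class k = true positives / everything actually in k (row sum). */
    static double recall(int[][] cm, int k) {
        long actual = 0;
        for (int v : cm[k]) {
            actual += v;
        }
        return actual == 0 ? 0.0 : (double) cm[k][k] / actual;
    }

    /** F-measure = harmonic mean of precision and recall. */
    static double fMeasure(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        String[] labels = {"positive", "negative", "neutral"};
        // Made-up counts for illustration only; these are not results from the thesis.
        int[][] cm = {
            {50, 5, 5},
            {4, 40, 6},
            {6, 5, 29}
        };
        System.out.println(String.format(Locale.ROOT, "accuracy = %.3f", accuracy(cm)));
        for (int k = 0; k < labels.length; k++) {
            double p = precision(cm, k);
            double r = recall(cm, k);
            System.out.println(String.format(Locale.ROOT,
                "%-8s precision = %.3f, recall = %.3f, F-measure = %.3f",
                labels[k], p, r, fMeasure(p, r)));
        }
    }
}
```

The summary does not say how the per-class scores were combined into the single reported figures, so the sketch stops at per-class values rather than assuming a particular averaging scheme.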