Sentiment analysis and visualization of PADU from Malaysian X users using BERT

Bibliographic Details
Main Authors: Mohd Hosni, Ahmad Ishraf Imran; Jasmis, Jamaluddin
Format: Article
Language: English
Published: College of Computing, Informatics, and Mathematics 2025
Online Access: https://ir.uitm.edu.my/id/eprint/127573/1/127573.pdf
https://ir.uitm.edu.my/id/eprint/127573/
https://fskmjebat.uitm.edu.my/pcmj/
Description
Summary: On the surface, the introduction of PADU may be met with varying degrees of acceptance among Malaysians, but gauging the actual sentiment without bias is hard. Sentiment analysis of a topic, in this study PADU, is a complex task that involves scraping data and classifying it accurately; doing so manually would inevitably introduce bias into the results. This project addresses the problem by developing a sentiment analysis model and visualising the data and results. The dataset was scraped from X using Tweet Harvest and consists of 88 data points, which were augmented to 440 data points. The model is built on Bidirectional Encoder Representations from Transformers (BERT) and trained on the gathered dataset. Development follows the waterfall software development methodology, and the model is released on a web platform. The model trained on the combined collected and augmented datasets achieved 87% accuracy, 87% precision, 87% recall, and an F1-score of 87%, compared with the model trained using only the collected dataset. Future improvements include broader language support for the model and data collection from a wider variety of social media platforms.
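The four metrics quoted in the summary can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name, the example labels, and the choice of macro averaging across classes are all assumptions made for illustration, since the record does not state the label set or the averaging scheme.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper: macro averaging is an assumption, not a
    detail taken from the catalogued article.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(num, den):
        # Treat an undefined ratio (empty denominator) as 0.0.
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Made-up labels, purely illustrative; not data from the study.
    y_true = ["positive", "negative", "neutral", "positive"]
    y_pred = ["positive", "negative", "positive", "positive"]
    print(classification_metrics(y_true, y_pred))
```

Accuracy counts all correct predictions, while the macro-averaged scores weight every sentiment class equally, which matters when a small dataset has uneven class sizes.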