Mixed-language sentiment analysis on Malaysian social media using translated Vader and normalization heuristics

Bibliographic Details
Main Authors: James Mountstephens, Mathieson Tan Zui Quen
Format: Proceedings
Language: English
Published: Springer 2023
Online Access: https://eprints.ums.edu.my/id/eprint/44801/1/FULLTEXT.pdf
https://eprints.ums.edu.my/id/eprint/44801/
https://link.springer.com/chapter/10.1007/978-981-19-9379-4_15
https://doi.org/10.1007/978-981-19-9379-4_15
Description
Summary: Most work in Sentiment Analysis has so far been in a single language context, primarily English. This work addresses the neglected issue of Sentiment Analysis in a mixed-language environment: Malaysian social media, which freely combines both Malay and English. The highly cited and effective English Sentiment Analysis system VADER was converted to Malay for the first time and used in combination with English VADER to create a Multilanguage Sentiment Analysis system. Significant patterns in noisy Malaysian Social Media text were identified and heuristics for normalizing them were devised. Mixed-language VADER with normalization heuristics was able to achieve a 12% improvement in accuracy as compared to Malay VADER alone. In absolute terms, performance must be improved, but the results obtained here are encouraging for the future continuation of this approach.
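
Illustrative sketch (not taken from the paper): the general idea in the summary, a lexicon-based scorer that consults both an English and a Malay word list after normalizing noisy text, might look roughly like the Python below. The lexicon entries, their scores, the shorthand map, the negation rule, and all function names are hypothetical placeholders chosen for illustration; the authors' translated VADER lexicon and their actual normalization heuristics are not reproduced here, and real VADER applies additional rules that this sketch omits.

```python
import re

# Toy lexicons. Entries and scores are hypothetical, for illustration only;
# they are not the VADER lexicon or the authors' Malay translation of it.
EN_LEXICON = {"good": 1.9, "love": 3.2, "bad": -2.5, "hate": -2.7}
MS_LEXICON = {"bagus": 1.9, "suka": 1.5, "teruk": -2.5, "benci": -2.7}

# Hypothetical shorthand-to-standard-form map (an assumed heuristic).
SHORTHAND = {"x": "tidak", "tak": "tidak", "sgt": "sangat", "bgs": "bagus"}

# Simple negators in both languages (assumed, greatly simplified).
NEGATORS = {"not", "tidak"}


def normalize(text):
    """Lowercase, tokenize, collapse elongated letters, expand shorthand."""
    tokens = re.findall(r"[a-z]+", text.lower())
    normalized = []
    for tok in tokens:
        # Collapse any run of 3 or more identical letters to a single letter,
        # e.g. "bagusss" -> "bagus". This is a crude rule of thumb.
        tok = re.sub(r"(.)\1{2,}", r"\1", tok)
        normalized.append(SHORTHAND.get(tok, tok))
    return normalized


def score(tokens):
    """Sum lexicon valences, flipping the sign of the next sentiment word
    that follows a negator."""
    total = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        # Look the token up in the English list first, then the Malay list.
        valence = EN_LEXICON.get(tok, MS_LEXICON.get(tok, 0.0))
        if valence:
            total += -valence if negate else valence
            negate = False
    return total


def classify(text):
    """Map the summed score to a coarse sentiment label."""
    s = score(normalize(text))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("This phone is bagusss, I loooove it"))
    print(classify("servis teruk, x bagus"))
```

Looking up each token in both word lists is what lets a single sentence mix Malay and English freely, and normalizing first is what lets noisy spellings reach the lexicon at all.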