Mixed-language sentiment analysis on Malaysian social media using translated VADER and normalization heuristics
| Main Authors: | , |
|---|---|
| Format: | Proceedings |
| Language: | en |
| Published: | Springer 2023 |
| Subjects: | |
| Online Access: | https://eprints.ums.edu.my/id/eprint/44801/1/FULLTEXT.pdf https://eprints.ums.edu.my/id/eprint/44801/ https://link.springer.com/chapter/10.1007/978-981-19-9379-4_15 https://doi.org/10.1007/978-981-19-9379-4_15 |
| Summary: | Most work in sentiment analysis has so far been in a single-language context, primarily English. This work addresses the neglected issue of sentiment analysis in a mixed-language environment: Malaysian social media, which freely combines Malay and English. The highly cited and effective English sentiment analysis system VADER was converted to Malay for the first time and used in combination with English VADER to create a multi-language sentiment analysis system. Significant patterns in noisy Malaysian social media text were identified, and heuristics for normalizing them were devised. Mixed-language VADER with normalization heuristics achieved a 12% improvement in accuracy compared with Malay VADER alone. In absolute terms, performance still needs to improve, but the results obtained here are encouraging for continuing this approach. |
|---|
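
The approach described in the summary (normalize noisy text, then score it against both an English and a Malay valence lexicon) might be sketched roughly as follows. This is a toy illustration only, not the authors' system and not the VADER library: the lexicon entries, the short-form table, the multiplier and the threshold are all hypothetical values chosen for the example, and the scoring rule is a simplified stand-in for VADER's actual heuristics.

```python
import re

# Toy valence lexicons. All entries and scores are hypothetical, chosen
# for illustration; they are not taken from VADER or from the paper.
ENGLISH_LEXICON = {"good": 1.9, "love": 3.2, "bad": -2.5, "hate": -2.7}
MALAY_LEXICON = {"bagus": 1.9, "suka": 1.5, "teruk": -2.5, "benci": -2.7}

# Hypothetical normalization table for social-media short forms.
SHORT_FORMS = {"x": "tidak", "tak": "tidak", "sgt": "sangat", "bgs": "bagus"}

NEGATORS = {"not", "tidak"}
INTENSIFIERS = {"very", "sangat"}


def normalize(text):
    """Lowercase, collapse elongated characters, expand short forms."""
    text = text.lower()
    # Collapse any character repeated 3+ times: "bagussss" -> "bagus".
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    tokens = re.findall(r"[a-z]+", text)
    return [SHORT_FORMS.get(tok, tok) for tok in tokens]


def score(text):
    """Sum valences from both lexicons, adjusting for the preceding word."""
    tokens = normalize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        # Look the token up in the English lexicon first, then the Malay one.
        valence = ENGLISH_LEXICON.get(tok, MALAY_LEXICON.get(tok))
        if valence is None:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        if prev in INTENSIFIERS:
            valence *= 1.5      # assumed boost factor
        elif prev in NEGATORS:
            valence = -valence  # simple sign flip on negation
        total += valence
    return total


def label(text, threshold=0.5):
    """Map the summed score to a coarse sentiment label."""
    s = score(text)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"
```

For instance, `label("service x bagus")` expands `x` to `tidak`, flips the valence of `bagus`, and returns `"negative"`, while a code-switched post such as `"Makanan ni bgs sgt, I love it"` draws on both lexicons and is labelled `"positive"`.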
