Annotated dataset for sentiment analysis and sarcasm detection: bilingual code-mixed English-Malay social media data in the public security domain

Bibliographic Details
Main Authors: Mohd Suhairi Md Suhaimin, Mohd Hanafi Ahmad Hijazi, Ervin Gubin Moung
Format: Article
Language: English
Published: Elsevier 2024
Online Access: https://eprints.ums.edu.my/id/eprint/43511/1/FULL%20TEXT.pdf
https://eprints.ums.edu.my/id/eprint/43511/
https://doi.org/10.1016/j.dib.2024.110663
Description
Summary: Sentiment analysis in the public security domain involves analysing public sentiment, emotions, opinions, and attitudes toward events, phenomena, and crises. However, the complexity of sarcasm, which tends to alter the intended meaning, combined with the use of bilingual code-mixed content, hampers sentiment analysis systems. Currently, limited datasets are available that focus on these issues. This paper introduces a comprehensive dataset constructed through a systematic data acquisition and annotation process. The acquisition process includes collecting data from social media platforms, starting with keyword searching, querying, and scraping, resulting in an acquired dataset. The subsequent annotation process involves refining and labelling, starting with data merging, selection, and annotation, ending in an annotated dataset. Three expert annotators from different fields were appointed for the labelling tasks, which produced