Constructing and analysing the MalaySarc dataset: a resource for detecting and understanding sarcasm in Malay language
Social media platforms provide users with an efficient and effective way to interact with content without requiring lengthy or complex textual expressions. However, sarcasm in social media discourse has become a serious problem for researchers. Compared to English and several other main languages, t...
Main Authors: | Suhaimi, Suziane Haslinda; Abu Bakar, Nur Azaliah; Mohd. Azmi, Nurulhuda Firdaus |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
Subjects: | Q Science (General) |
Online Access: | http://eprints.utm.my/108427/1/SuzianeHaslinda2023_ConstructingandAnalysingtheMalaySarcDataset.pdf http://eprints.utm.my/108427/ http://dx.doi.org/10.11159/cist23.126 |
id |
my.utm.108427 |
---|---|
record_format |
eprints |
spelling |
my.utm.108427 2024-11-01T02:52:08Z http://eprints.utm.my/108427/ Conference or Workshop Item, PeerReviewed, application/pdf, en. Suhaimi, Suziane Haslinda and Abu Bakar, Nur Azaliah and Mohd. Azmi, Nurulhuda Firdaus (2023) Constructing and analysing the MalaySarc dataset: a resource for detecting and understanding sarcasm in Malay language. In: 9th World Congress on Electrical Engineering and Computer Systems and Sciences, EECSS 2023, 3 August 2023 - 5 August 2023, London, England. http://dx.doi.org/10.11159/cist23.126 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
Q Science (General) |
description |
Social media platforms give users an efficient way to interact with content without lengthy or complex textual expressions. However, sarcasm in social media discourse remains a serious challenge for researchers. Compared with English and several other major languages, research on sarcasm in Malay, and the availability of reference materials for it, still lags significantly. This study therefore develops a new dataset for Malay sarcasm detection and details each step of the process, from data collection through filtering to annotation. The dataset consists of two types of data, Facebook comments and their emotion reaction buttons, and comprises 6,325 non-sarcastic texts and 1,380 sarcastic texts. A descriptive analysis of the dataset was also conducted to determine the usage patterns of the main features of Malay sarcasm. The analysis shows that emoji play an essential role in identifying sarcastic comments. There are also pattern-based features derived from high-frequency terms in the text. The resulting dataset provides diverse examples of sarcasm that reflect the linguistic and cultural nuances of the language, which should improve the accuracy and reliability of identifying sarcasm on social media. The findings will aid future research on automatic Malay sarcasm detection models using machine learning. |
format |
Conference or Workshop Item |
author |
Suhaimi, Suziane Haslinda; Abu Bakar, Nur Azaliah; Mohd. Azmi, Nurulhuda Firdaus |
author_sort |
Suhaimi, Suziane Haslinda |
title |
Constructing and analysing the MalaySarc dataset: a resource for detecting and understanding sarcasm in Malay language |
publishDate |
2023 |
url |
http://eprints.utm.my/108427/1/SuzianeHaslinda2023_ConstructingandAnalysingtheMalaySarcDataset.pdf http://eprints.utm.my/108427/ http://dx.doi.org/10.11159/cist23.126 |