Data augmentation approach for language identification in imbalanced bilingual code-mixed social media datasets

Addressing the problem of language identification in code-mixed datasets poses notable challenges due to data scarcity and high confusability in bilingual contexts. These challenges are further amplified by the associated imbalance and noise characteristic of social media data, complicating efforts...

Bibliographic Details
Main Authors: Mohd Suhairi, Md Suhaimin; Mohd Hanafi, Ahmad Hijazi; Moung, Ervin Gubin; Mohd Azwan, Mohamad Hamza
Format: Conference or Workshop Item
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Online Access: http://umpir.ump.edu.my/id/eprint/40378/1/Data%20augmentation%20approach%20for%20language%20identification.pdf
http://umpir.ump.edu.my/id/eprint/40378/2/Data%20augmentation%20approach%20for%20language%20identification%20in%20imbalanced%20bilingual%20code-mixed%20social%20media%20datasets_ABS.pdf
http://umpir.ump.edu.my/id/eprint/40378/
https://doi.org/10.1109/IICAIET59451.2023.10292108