Subjectivity analysis of an enhanced feature set for code-switching text

Bibliographic Details
Main Authors: Basiron, Halizah; Kasmuri, Emaliana
Format: Article
Language: English
Published: Science and Information Organization 2024
Online Access: http://eprints.utem.edu.my/id/eprint/29200/2/00979231020241027201215.pdf
http://eprints.utem.edu.my/id/eprint/29200/
https://thesai.org/Downloads/Volume15No9/Paper_45-Subjectivity_Analysis_of_an_Enhanced_Feature_Set.pdf
Description
Summary: Code-switching poses a new challenge for the field of linguistic computing. Conventionally, computers process monolingual or multilingual text, but code-switching text differs from both: two or more languages are used to construct a single piece of text, and in particular a single sentence. Processing text in which several languages occur simultaneously is difficult, and the difficulty is greater in subjectivity analysis, where the computer must distinguish subjective from objective code-switching text. This paper proposes three feature sets for subjectivity analysis of Malay-English code-switching text: Embedded Code-Switching Feature Sets, Unified Code-Switching Feature Sets, and Stylistic Feature Sets. These feature sets were enhanced from the monolingual feature set of subjectivity analysis. Experiments were conducted on data harvested from Malay-English blogs, which were labelled as either subjective or objective. Two machine learning classifiers, the Support Vector Machine (SVM) and Naive Bayes, were used to evaluate the classification performance of the proposed feature sets, both individually and in combination. The results show that combining the unified and stylistic feature sets outperformed the other proposed feature sets, reaching 59% accuracy. The paper therefore concludes that the combination of unified and stylistic feature sets is necessary for subjectivity analysis of Malay-English code-switching text.