Subjectivity analysis of an enhanced feature set for code-switching text
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | en |
| Published: | Science and Information Organization, 2024 |
| Online Access: | http://eprints.utem.edu.my/id/eprint/29200/2/00979231020241027201215.pdf http://eprints.utem.edu.my/id/eprint/29200/ https://thesai.org/Downloads/Volume15No9/Paper_45-Subjectivity_Analysis_of_an_Enhanced_Feature_Set.pdf |
| Summary: | The phenomenon of code-switching has posed a new challenge to the linguistic computing area. Conventionally, computers process monolingual or multilingual text. Code-switching text differs from both: two or more languages are used to construct a single piece of text, often a single sentence. Processing text in which several languages occur simultaneously is difficult for a computer. The difficulty is greater in subjectivity analysis, where the computer must distinguish subjective from objective code-switching text. This paper proposes three feature sets for subjectivity analysis of Malay-English code-switching text: Embedded Code-Switching Feature Sets, Unified Code-Switching Feature Sets, and Stylistic Feature Sets. These feature sets were enhanced from the monolingual feature set of subjectivity analysis. Experiments were conducted using data harvested from Malay-English blogs, with each item labelled as either subjective or objective. Two machine learning classifiers, the Support Vector Machine (SVM) and Naive Bayes, were used to evaluate the classification performance of the proposed feature sets. The experiments were carried out on the individual feature sets and on combinations of them. The results show that combining the unified and stylistic feature sets achieved 59% accuracy, surpassing the other proposed feature sets. It is therefore concluded that the combination of unified and stylistic feature sets is necessary for the subjectivity analysis of Malay-English code-switching text. |
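The summary describes combining a unified feature set with a stylistic feature set and scoring the result by accuracy. The record does not define the individual features, so the following is only a minimal sketch under stated assumptions: the lexicon entries, the three stylistic cues, and all function names are hypothetical and are not taken from the paper. It may help illustrate what concatenating two feature sets into one vector looks like before the vector is passed to a classifier such as an SVM or Naive Bayes.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon mixing Malay and English cue words.
# Illustrative only; these entries are NOT taken from the paper.
UNIFIED_LEXICON = ["best", "suka", "love", "teruk", "bad", "sangat"]


def unified_features(text: str) -> list[int]:
    """Count lexicon hits, treating Malay and English tokens as one vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in UNIFIED_LEXICON]


def stylistic_features(text: str) -> list[int]:
    """Assumed surface cues that do not depend on a token's language."""
    tokens = text.split()
    return [
        text.count("!"),  # exclamation marks
        text.count("?"),  # question marks
        sum(1 for t in tokens if len(t) > 1 and t.isupper()),  # all-caps tokens
    ]


def combined_features(text: str) -> list[int]:
    """Concatenate the two feature sets into a single vector."""
    return unified_features(text) + stylistic_features(text)


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For a sentence such as `"Saya suka this movie, sangat BEST!"`, `combined_features` returns a nine-element vector: six lexicon counts followed by three stylistic counts. The `accuracy` helper corresponds to the metric the summary reports as a percentage.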
