Exceeding manual labeling: VADER Lexicon as an accurate alternative to automatic sentiment classification
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Published: | Zarka Private University, 2025 |
| Subjects: | |
| Online Access: | https://umpir.ump.edu.my/id/eprint/43569/ https://doi.org/10.34028/iajit/22/2/2 |
| Summary: | The number of internet users worldwide has increased dramatically, resulting in a surge of content uploaded over the Internet, particularly in text form. Global Internet users now exceed 5.16 billion, a penetration rate of 64.4 percent of the world’s total population. While only a small fraction of individuals actively express their opinions online, sentiment analysis aims to categorize textual information into positive, negative, or neutral states of mind. When dealing with unlabeled datasets, the Valence Aware Dictionary and sEntiment Reasoner (VADER) Lexicon proves to be an effective tool for extracting feature sentiment. This facilitates the direct application of machine learning techniques such as Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN) to classify datasets. Fuzzy Matching (FM) serves as a dimensionality reduction technique. Experimental results on three datasets from diverse sources show that the combination of FM and SVM yields the highest accuracy. Model validation through K-Fold cross-validation shows notable accuracy rates across the datasets. For dataset A, the accuracy is 94.69% with manual labeling and improves slightly to 95.92% with VADER labeling. Similarly, for dataset B, the accuracy shows a marginal increase from 96.94% with manual labeling to 97.01% with VADER labeling. Dataset C also shows an improvement, from 95.51% with manual labeling to 96.73% with VADER labeling. These results underscore the effectiveness of both manual and automated labeling techniques in enhancing model performance across diverse datasets. |
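The summary describes using VADER to label otherwise unlabeled text before training a classifier. This record does not say how the authors converted VADER scores into class labels, so the following is only a minimal sketch under an assumption: it applies the ±0.05 cutoffs on the compound score that the VADER documentation commonly recommends. The function name `label_from_compound` and the sample scores are hypothetical and are not taken from the article.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumed default (the cutoff suggested in the
    VADER documentation), not a value confirmed by the article.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores. In practice each score would come from a
# VADER implementation, for example the "compound" entry returned by
# SentimentIntensityAnalyzer().polarity_scores(text) in vaderSentiment.
scores = [0.62, -0.48, 0.0, 0.05]
labels = [label_from_compound(s) for s in scores]
print(labels)
```

Labels produced this way could then stand in for manual annotations when training a classifier such as the SVM, NB, or KNN models named in the summary.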
