A performance comparison of feature extraction methods for sentiment analysis

Sentiment analysis is the task of classifying documents according to their sentiment polarity. Before classification of sentiment documents, plain text documents need to be transformed into workable data for the system. This step is known as feature extraction. Feature extraction produces text repre...

Bibliographic Details
Main Authors: Lai, Po Hung, Rayner Alfred
Format: Book Chapter
Language: English
Published: Springer Verlag 2017
Subjects:
Online Access: https://eprints.ums.edu.my/id/eprint/20036/1/A%20performance%20comparison%20of%20feature%20extraction%20methods%20for%20sentiment%20analysis.pdf
https://eprints.ums.edu.my/id/eprint/20036/
id my.ums.eprints.20036
record_format eprints
spelling my.ums.eprints.20036 2018-05-08T05:10:56Z NonPeerReviewed Lai, Po Hung and Rayner Alfred (2017) A performance comparison of feature extraction methods for sentiment analysis. Studies in Computational Intelligence, 710. pp. 379-390. ISSN 1860-949X
institution Universiti Malaysia Sabah
building UMS Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sabah
content_source UMS Institutional Repository
url_provider http://eprints.ums.edu.my/
language English
topic TA Engineering (General). Civil engineering (General)
description Sentiment analysis is the task of classifying documents according to their sentiment polarity. Before sentiment documents can be classified, plain text documents need to be transformed into data the system can work with. This step is known as feature extraction. Feature extraction produces text representations that are enriched with information in order to improve classification results. The experiment in this work investigates the effects of applying different sets of extracted features and discusses the behavior of those features in sentiment analysis. The feature extraction methods include unigrams, bigrams, trigrams, Part-Of-Speech (POS) and Sentiwordnet methods. The unigram, part-of-speech and Sentiwordnet features are word-based features, whereas bigrams and trigrams are phrase-based features. The experimental results show that phrase-based features are more effective for sentiment analysis, as the accuracies they produce are much higher than those of word-based features. This might be because word-based features disregard the sentence structure and word sequence of the original text and thus distort its original meaning. Bigram and trigram features retain some of the sequence of the sentences and thus contribute to better representations of the text.
title A performance comparison of feature extraction methods for sentiment analysis
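
The description above contrasts word-based features (such as unigrams) with phrase-based features (bigrams and trigrams) and suggests that the latter retain some word order. The following is a minimal sketch of that distinction, not the chapter's implementation: the function names `ngrams` and `extract_features`, the lowercase-and-split tokenization, and the two example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token tuples of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, n):
    """Count n-gram occurrences in `text`.

    Tokenization here is a simple lowercase + whitespace split, which is
    an illustrative assumption rather than the preprocessing used in the
    chapter.
    """
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


# Two hypothetical sentences with the same words in a different order.
first = "good not bad"
second = "bad not good"

# Unigram counts ignore word order, so both sentences look identical.
same_unigrams = extract_features(first, 1) == extract_features(second, 1)

# Bigram counts keep adjacent-word order, so the sentences differ.
same_bigrams = extract_features(first, 2) == extract_features(second, 2)

print("unigram features identical:", same_unigrams)
print("bigram features identical:", same_bigrams)
```

In this toy case the unigram representation cannot tell the two sentences apart, while the bigram representation can, which is consistent with the explanation the description offers for the higher accuracy of phrase-based features.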