Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam
Sentiment polarity calculation is a method to gauge the strength of a sentiment extracted from a text. Many tools have been developed, each with its own scoring mechanism, to produce an effective sentiment score. Semantic Orientation Calculator (SO-CAL) is one of the lexicon-based tools tha...
Main Author: | Rayvendran, Visvalingam |
---|---|
Format: | Thesis |
Published: | 2017 |
Subjects: | HD Industries. Land use. Labor; QA75 Electronic computers. Computer science |
Online Access: | http://studentsrepo.um.edu.my/10808/2/Rayvendran.pdf http://studentsrepo.um.edu.my/10808/1/Rayvendran.pdf http://studentsrepo.um.edu.my/10808/ |
id |
my.um.stud.10808 |
---|---|
record_format |
eprints |
spelling |
my.um.stud.10808 2020-01-18T02:01:26Z Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam Rayvendran, Visvalingam HD Industries. Land use. Labor QA75 Electronic computers. Computer science Sentiment polarity calculation is a method to gauge the strength of a sentiment extracted from a text. Many tools have been developed, each with its own scoring mechanism, to produce an effective sentiment score. Semantic Orientation Calculator (SO-CAL) is a lexicon-based tool that incorporates important features (such as intensifiers and negation) to calculate the sentiment polarity of a text. However, this tool is limited in processing misspelled words, especially words with repeated letters or characters, which may lead to inaccurate sentiment scores. The accuracy of SO-CAL is low when processing social media text, which mostly contains misspelled words. Thus, an enhanced scoring mechanism (LexiPro-SM) was developed to improve sentiment scoring by taking misspelled words into account, especially words that contain repeated letters. LexiPro-SM was tested on posts collected from the official Facebook pages of two major airlines in Malaysia, referred to as Airline A and Airline B respectively. The development of LexiPro-SM involved three important phases: data collection, data cleaning and data analysis. Data collection was performed with the aid of the Facebook Graph API to collect three months of posts from both airlines. Data cleaning was performed by removing noise, leaving only text that contains alphabetic characters and exclamation marks. The scoring mechanism was improved and incorporated in LexiPro-SM, with features that can process misspelled words as well as improved handling of negation and exclamation marks. The clean data of the airlines was then analyzed with both LexiPro-SM and SO-CAL. 
A web-based portal was developed to visualize the LexiPro-SM results for the two airlines, where each airline has its own page with an overall score chart, a polarity group chart and a sub-services chart. The sub-services chart is a new idea implemented in this research to categorize the overall services into sub-services such as customer service, price, preflight and facility. This would help airline management improve their service by narrowing their attention to a particular service. The airline pages are also linked in order to show the comparison results between Airline A and Airline B. Based on these results, a case study was conducted between the two airlines, in which Airline A was observed to achieve a higher positive score than Airline B. Moreover, to assess the effectiveness of LexiPro-SM, the results of LexiPro-SM and SO-CAL were compared using evaluation metrics (such as accuracy, recall, precision and F1-score), with human expert results as the reference. The evaluation shows that LexiPro-SM achieved higher accuracy (90.7%) than SO-CAL (58.33%). Overall, the improvements made in LexiPro-SM increased the accuracy of sentiment detection and produced better results than SO-CAL. This leads to the conclusion that processing misspelled words is an important step in social media sentiment analysis. This is further supported by the case study, from which it was concluded that Airline A provides a better service than Airline B. 2017-11 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/10808/2/Rayvendran.pdf application/pdf http://studentsrepo.um.edu.my/10808/1/Rayvendran.pdf Rayvendran, Visvalingam (2017) Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam. Masters thesis, University of Malaya. http://studentsrepo.um.edu.my/10808/ |
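The abstract mentions two text-handling steps: cleaning posts so that only letters and exclamation marks remain, and scoring misspelled words that contain repeated letters. The Python sketch below illustrates one way such steps could work. It is not the LexiPro-SM implementation, which this record does not describe in detail. The names `TOY_LEXICON`, `clean_post`, `candidate_spellings` and `lookup`, the lexicon scores, and the rule of collapsing repeated letters to two and then to one are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon; the thesis's real lexicon and scores are not given in this record.
TOY_LEXICON: Dict[str, int] = {"good": 3, "bad": -3, "love": 4, "slow": -2}


def clean_post(text: str) -> str:
    """Keep only ASCII letters, exclamation marks and spaces (an assumed reading of the cleaning step)."""
    text = re.sub(r"[^A-Za-z! ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def candidate_spellings(word: str) -> List[str]:
    """Return the word itself, then versions with repeated letters collapsed.

    Runs of three or more identical characters are first reduced to two
    ("goooood" -> "good"); then every run is reduced to a single character
    ("loooove" -> "love"). Duplicates are dropped while keeping the order.
    """
    two = re.sub(r"(.)\1{2,}", r"\1\1", word)
    one = re.sub(r"(.)\1+", r"\1", word)
    return list(dict.fromkeys([word, two, one]))


def lookup(word: str, lexicon: Dict[str, int] = TOY_LEXICON) -> Optional[Tuple[str, int]]:
    """Return (matched lexicon entry, score) for the first candidate found, else None."""
    for candidate in candidate_spellings(word):
        if candidate in lexicon:
            return candidate, lexicon[candidate]
    return None


if __name__ == "__main__":
    post = clean_post("Sooo goooood!!! #1 service :)")
    for token in post.split():
        # Exclamation marks are stripped here only for the lexicon lookup.
        print(token, lookup(token.strip("!")))
```

Trying the unmodified word first keeps correctly spelled words such as "good" from being collapsed into a different word. A real system would also need to decide how elongation and exclamation marks change the score, which this sketch leaves out.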
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Student Repository |
url_provider |
http://studentsrepo.um.edu.my/ |
topic |
HD Industries. Land use. Labor QA75 Electronic computers. Computer science |
spellingShingle |
HD Industries. Land use. Labor QA75 Electronic computers. Computer science Rayvendran, Visvalingam Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam |
description |
Sentiment polarity calculation is a method to gauge the strength of a sentiment extracted from a text. Many tools have been developed, each with its own scoring mechanism, to produce an effective sentiment score. Semantic Orientation Calculator (SO-CAL) is a lexicon-based tool that incorporates important features (such as intensifiers and negation) to calculate the sentiment polarity of a text. However, this tool is limited in processing misspelled words, especially words with repeated letters or characters, which may lead to inaccurate sentiment scores. The accuracy of SO-CAL is low when processing social media text, which mostly contains misspelled words. Thus, an enhanced scoring mechanism (LexiPro-SM) was developed to improve sentiment scoring by taking misspelled words into account, especially words that contain repeated letters. LexiPro-SM was tested on posts collected from the official Facebook pages of two major airlines in Malaysia, referred to as Airline A and Airline B respectively. The development of LexiPro-SM involved three important phases: data collection, data cleaning and data analysis. Data collection was performed with the aid of the Facebook Graph API to collect three months of posts from both airlines. Data cleaning was performed by removing noise, leaving only text that contains alphabetic characters and exclamation marks. The scoring mechanism was improved and incorporated in LexiPro-SM, with features that can process misspelled words as well as improved handling of negation and exclamation marks. The clean data of the airlines was then analyzed with both LexiPro-SM and SO-CAL. A web-based portal was developed to visualize the LexiPro-SM results for the two airlines, where each airline has its own page with an overall score chart, a polarity group chart and a sub-services chart. 
The sub-services chart is a new idea implemented in this research to categorize the overall services into sub-services such as customer service, price, preflight and facility. This would help airline management improve their service by narrowing their attention to a particular service. The airline pages are also linked in order to show the comparison results between Airline A and Airline B. Based on these results, a case study was conducted between the two airlines, in which Airline A was observed to achieve a higher positive score than Airline B. Moreover, to assess the effectiveness of LexiPro-SM, the results of LexiPro-SM and SO-CAL were compared using evaluation metrics (such as accuracy, recall, precision and F1-score), with human expert results as the reference. The evaluation shows that LexiPro-SM achieved higher accuracy (90.7%) than SO-CAL (58.33%). Overall, the improvements made in LexiPro-SM increased the accuracy of sentiment detection and produced better results than SO-CAL. This leads to the conclusion that processing misspelled words is an important step in social media sentiment analysis. This is further supported by the case study, from which it was concluded that Airline A provides a better service than Airline B. |
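The evaluation described above compares tool output with human expert labels using accuracy, recall, precision and F1-score. As a rough illustration of how these metrics are computed for one class, here is a minimal Python sketch. The function name `evaluate`, the label strings and the sample lists are hypothetical. They are not the thesis data and do not reproduce the reported 90.7% and 58.33% figures.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[str], predicted: Sequence[str],
             positive: str = "positive") -> Dict[str, float]:
    """Compute accuracy plus precision, recall and F1 for the `positive` class.

    `gold` holds the reference (human expert) labels and `predicted` holds the
    labels produced by a tool. Both must be non-empty and of equal length.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")

    tp = fp = fn = correct = 0
    for g, p in zip(gold, predicted):
        if g == p:
            correct += 1
        if p == positive and g == positive:
            tp += 1   # tool and expert both say positive
        elif p == positive:
            fp += 1   # tool says positive, expert disagrees
        elif g == positive:
            fn += 1   # expert says positive, tool missed it

    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative labels only, not data from the thesis.
    expert = ["positive", "positive", "negative", "negative", "positive"]
    tool = ["positive", "negative", "negative", "positive", "positive"]
    print(evaluate(expert, tool))
```

Accuracy counts every agreement with the expert labels, while precision, recall and F1 here describe a single class. A study with several polarity groups would typically compute them per class and then average.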
format |
Thesis |
author |
Rayvendran, Visvalingam |
author_facet |
Rayvendran, Visvalingam |
author_sort |
Rayvendran, Visvalingam |
title |
Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam |
title_short |
Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam |
title_full |
Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam |
title_fullStr |
Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam |
title_full_unstemmed |
Enhancement of sentiment analysis scoring mechanism – A case study on Malaysian Airline industry / Rayvendran Visvalingam |
title_sort |
enhancement of sentiment analysis scoring mechanism – a case study on malaysian airline industry / rayvendran visvalingam |
publishDate |
2017 |
url |
http://studentsrepo.um.edu.my/10808/2/Rayvendran.pdf http://studentsrepo.um.edu.my/10808/1/Rayvendran.pdf http://studentsrepo.um.edu.my/10808/ |
_version_ |
1738506413408256000 |