A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity

Bibliographic Details
Main Author: Mohammed Salem , Abdullah Kaity
Format: Thesis
Published: 2020
Subjects: QA75 Electronic computers. Computer science; QA76 Computer software
Online Access: http://studentsrepo.um.edu.my/14485/1/Mohammed_Salem.pdf
http://studentsrepo.um.edu.my/14485/2/Mohammed_Salem.pdf
http://studentsrepo.um.edu.my/14485/
id my.um.stud.14485
record_format eprints
spelling my.um.stud.14485 2023-06-21T22:54:00Z 2020-04 Thesis NonPeerReviewed Mohammed Salem , Abdullah Kaity (2020) A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14485/
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA75 Electronic computers. Computer science
QA76 Computer software
description Social media networks have grown significantly in the last few years, and posting opinions and messages on social networking websites has become a popular activity on the Internet. These data sources are valuable for business intelligence and market analytics, because human opinions are a major indicator of human desires and behaviour. This has led to the development of a field of study called sentiment analysis, which covers the analysis, evaluation and interpretation of opinions using text mining and Natural Language Processing (NLP) techniques to identify text polarity as positive, neutral or negative. Sentiment analysis resources must be built before sentiment analysis models can be developed. Sentiment lexicons are a major resource of this kind: each contains a list of opinion words and phrases along with their sentiment orientation. The literature review revealed that although texts are available in many languages, the majority of sentiment analysis studies have focused on English. As a result, non-English languages suffer from a shortage of lexicons and resources. In addition, the techniques used to build sentiment lexicons for non-English languages have several disadvantages, such as an inability to handle a particular domain, informal language expressions, and the vocabulary used in social media feeds. Furthermore, some non-English sentiment lexicons suffer from translation issues and cultural differences when they are translated from other languages.
To overcome these issues, this work proposes a language-independent integrated framework that semi-automatically builds non-English sentiment lexicons from the available English lexicons and an unannotated corpus in the target language. The framework has three layers: corpus-based, lexicon-based and human-based. The first two layers automatically recognise and extract novel polarity words from a large unannotated corpus with the help of initial seed lexicons. The major advantage of the framework is that it needs only an initial seed lexicon and an unannotated corpus to start the extraction. The framework is considered semi-supervised because it relies on seed lexicons. Experiments were carried out on three languages, and the lexicons produced by the proposed framework performed better than existing lexicons. The F-measure values for the Arabic, French and Malay lexicons were 0.778, 0.838 and 0.686, respectively.
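The idea of extracting new polarity words from an unannotated corpus with a seed lexicon can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes a simple co-occurrence heuristic: a word that repeatedly appears near positive (or negative) seed words becomes a candidate with that polarity. The function name `expand_lexicon`, its parameters `window`, `min_count` and `threshold`, and the toy English corpus are all hypothetical.

```python
from collections import Counter


def expand_lexicon(corpus, seed, window=2, min_count=2, threshold=0.5):
    """Propose polarity labels for non-seed words (illustrative heuristic only).

    corpus: iterable of plain-text sentences (unannotated).
    seed:   dict mapping a word to "positive" or "negative".
    """
    pos_hits = Counter()  # how often a word occurs near a positive seed word
    neg_hits = Counter()  # how often a word occurs near a negative seed word

    for sentence in corpus:
        tokens = sentence.lower().split()  # naive whitespace tokenisation
        for i, token in enumerate(tokens):
            if token in seed:
                continue  # seed words are already labelled
            # Up to `window` tokens on each side of the current token.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for neighbour in context:
                polarity = seed.get(neighbour)
                if polarity == "positive":
                    pos_hits[token] += 1
                elif polarity == "negative":
                    neg_hits[token] += 1

    candidates = {}
    for word in set(pos_hits) | set(neg_hits):
        p, n = pos_hits[word], neg_hits[word]
        if p + n < min_count:
            continue  # too little evidence to propose a label
        score = (p - n) / (p + n)  # ranges from -1 (negative) to +1 (positive)
        if score >= threshold:
            candidates[word] = "positive"
        elif score <= -threshold:
            candidates[word] = "negative"
    return candidates


# Toy data, invented for illustration.
seed = {"good": "positive", "great": "positive",
        "bad": "negative", "awful": "negative"}
corpus = ["good tasty food", "great tasty meal",
          "bad stale bread", "awful stale soup"]
print(expand_lexicon(corpus, seed))
```

In a pipeline like the one the abstract outlines, candidates produced this way would presumably be passed to human reviewers (the human-based layer) before being added to the lexicon.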
format Thesis
author Mohammed Salem , Abdullah Kaity
title A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity
publishDate 2020
url http://studentsrepo.um.edu.my/14485/1/Mohammed_Salem.pdf
http://studentsrepo.um.edu.my/14485/2/Mohammed_Salem.pdf
http://studentsrepo.um.edu.my/14485/