A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity
There has been significant growth in social media networks in the last few years. Posting opinions and messages on social networking websites has become a popular activity on the Internet. The data sources are necessary for business intelligence and market analytics, as human opinions form a major i...
Main Author: | Mohammed Salem, Abdullah Kaity |
---|---|
Format: | Thesis |
Published: | 2020 |
Subjects: | QA75 Electronic computers. Computer science QA76 Computer software |
Online Access: | http://studentsrepo.um.edu.my/14485/1/Mohammed_Salem.pdf http://studentsrepo.um.edu.my/14485/2/Mohammed_Salem.pdf http://studentsrepo.um.edu.my/14485/ |
id | my.um.stud.14485 |
---|---|
record_format | eprints |
spelling | my.um.stud.14485 2023-06-21T22:54:00Z A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity Mohammed Salem , Abdullah Kaity QA75 Electronic computers. Computer science QA76 Computer software 2020-04 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/14485/1/Mohammed_Salem.pdf application/pdf http://studentsrepo.um.edu.my/14485/2/Mohammed_Salem.pdf Mohammed Salem , Abdullah Kaity (2020) A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14485/ |
institution | Universiti Malaya |
building | UM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Malaya |
content_source | UM Student Repository |
url_provider | http://studentsrepo.um.edu.my/ |
topic | QA75 Electronic computers. Computer science QA76 Computer software |
spellingShingle | QA75 Electronic computers. Computer science QA76 Computer software Mohammed Salem , Abdullah Kaity A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity |
description | Social media networks have grown significantly in the last few years, and posting opinions and messages on social networking websites has become a popular activity on the Internet. These data sources are valuable for business intelligence and market analytics, as human opinions are a major indicator of human desires and behaviour. This has led to the development of a field of study called sentiment analysis, which analyses, evaluates and interprets opinions using text mining and Natural Language Processing (NLP) techniques to identify the polarity of a text as positive, neutral or negative. Sentiment analysis resources must be built before sentiment analysis models can be developed. Sentiment lexicons are a major resource of this kind: lists of opinion words and phrases together with their sentiment orientation. The literature review revealed that, although texts are available in many languages, the majority of sentiment analysis studies have focused on English. Non-English languages therefore suffer from a shortage of lexicons and resources. In addition, the techniques used to build sentiment lexicons for non-English languages have several disadvantages, such as their inability to handle a particular domain or the informal expressions and vocabulary used in social media feeds. Furthermore, some non-English sentiment lexicons suffer from translation issues and cultural differences when they are translated from other languages. To overcome these issues, this work proposes a language-independent integrated framework that semi-automatically builds non-English sentiment lexicons from available English lexicons and an unannotated corpus in the target language. The framework has three layers: corpus-based, lexicon-based and human-based. The first two layers automatically recognise and extract novel polarity words from a large unannotated corpus with the help of initial seed lexicons. The major advantage of the framework is that it needs only an initial seed lexicon and an unannotated corpus to start the extraction. The framework is semi-supervised owing to its use of seed lexicons. Experiments were carried out on three languages, and the lexicons produced by the proposed framework performed better than the existing lexicons. The F-measure values for the Arabic, French and Malay lexicons were 0.778, 0.838 and 0.686, respectively. |
format | Thesis |
author | Mohammed Salem , Abdullah Kaity |
author_facet | Mohammed Salem , Abdullah Kaity |
author_sort | Mohammed Salem , Abdullah Kaity |
title | A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity |
title_short | A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity |
title_full | A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity |
title_fullStr | A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity |
title_full_unstemmed | A semi-automatic integrated framework for non-English sentiment lexicons / Mohammed Salem Abdullah Kaity |
title_sort | semi-automatic integrated framework for non-english sentiment lexicons / mohammed salem abdullah kaity |
publishDate | 2020 |
url | http://studentsrepo.um.edu.my/14485/1/Mohammed_Salem.pdf http://studentsrepo.um.edu.my/14485/2/Mohammed_Salem.pdf http://studentsrepo.um.edu.my/14485/ |
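The abstract in this record describes extracting new polarity words from an unannotated corpus with the help of a seed lexicon, followed by a human-based layer. The sketch below is only a rough illustration of that general idea, not the method used in the thesis: it assumes a simple sentence-level co-occurrence score, and the function name `expand_lexicon`, the parameters `min_count` and `min_margin`, and the toy English tokens are all hypothetical.

```python
from collections import Counter


def expand_lexicon(seed, sentences, min_count=2, min_margin=0.5):
    """Propose new polarity words from an unannotated corpus.

    seed      -- dict mapping a known word to +1 (positive) or -1 (negative)
    sentences -- list of token lists (the unannotated corpus)
    Returns a dict of candidate words with a proposed polarity.
    This scoring rule is an assumption made for illustration only.
    """
    pos_hits = Counter()  # co-occurrences with positive seed words
    neg_hits = Counter()  # co-occurrences with negative seed words

    for tokens in sentences:
        words = set(tokens)
        n_pos = sum(1 for w in words if seed.get(w) == 1)
        n_neg = sum(1 for w in words if seed.get(w) == -1)
        for w in words:
            if w in seed:
                continue  # only score words not already in the lexicon
            pos_hits[w] += n_pos
            neg_hits[w] += n_neg

    candidates = {}
    for w in set(pos_hits) | set(neg_hits):
        p, n = pos_hits[w], neg_hits[w]
        total = p + n
        if total < min_count:
            continue  # too little evidence
        score = (p - n) / total  # ranges from -1 to +1
        if abs(score) >= min_margin:
            candidates[w] = 1 if score > 0 else -1
    return candidates


# Toy data: English tokens stand in for a target-language corpus.
seed = {"good": 1, "bad": -1}
sentences = [
    ["good", "tasty", "food"],
    ["tasty", "and", "good"],
    ["bad", "bland", "food"],
    ["bland", "bad", "service"],
]
new_words = expand_lexicon(seed, sentences)
print(sorted(new_words.items()))  # [('bland', -1), ('tasty', 1)]
```

In this toy run, "food" is rejected because it co-occurs equally with both polarities, and "and" and "service" are rejected for lack of evidence. In a semi-automatic setting, the returned candidates would presumably be passed to a human reviewer before being added to the lexicon.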