An expandable Arabic lexicon and valence shifter rules for sentiment analysis on twitter

Sentiment analysis (SA) refers to the computational and natural language processing techniques used to extract subjective information expressed in a text. This study addresses three main problems: a) the absence of resources for the Palestinian Arabic dialect (PAL), b) the emergence of new sentiment words...

Bibliographic Details
Main Author: Ihnaini, Baha' Najim Salman
Format: Thesis
Language: en
Published: 2019
Subjects: T Technology (General)
Online Access: https://etd.uum.edu.my/8699/1/s900147_01.pdf
https://etd.uum.edu.my/8699/2/s900147_02.pdf
https://etd.uum.edu.my/8699/3/s900147_references.docx
https://etd.uum.edu.my/8699/
_version_ 1833437058012545024
author Ihnaini, Baha' Najim Salman
building UUM Library
collection Institutional Repository
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
continent Asia
country Malaysia
description Sentiment analysis (SA) refers to the computational and natural language processing techniques used to extract subjective information expressed in a text. This study addresses three main problems: a) the absence of resources for the Palestinian Arabic dialect (PAL); b) the emergence of new sentiment words, which decreases the performance of sentiment analysis models when they are applied to collected tweets; and c) the handling of valence shifter words, which has not been thoroughly addressed in Arabic sentiment analysis. The study therefore aims to construct a PAL lexicon for Palestinian tweets and to design an Expandable and Up-to-date Lexicon for Arabic (EULA). A new set of valence shifter rules is also constructed to enhance the performance of lexicon-based sentiment analysis on Arabic tweets. The PAL lexicon is built using a phonology matching algorithm, while EULA is constructed by applying a general lexicon to a tweets dataset to find new terms and predict their polarity through linguistic rules. Furthermore, a set of rules is proposed to handle valence shifter words by determining the scope of these words and the shifting value they produce. Palestinian and Arabic tweet datasets collected from March to May 2018 are used to evaluate the proposed approach. Experimental results indicate that the proposed PAL lexicon produced better results than other lexicons when tested on the Palestinian dataset. Meanwhile, EULA enhanced the performance of the lexicon-based approach, making it competitive with the machine learning approach. Moreover, applying the proposed valence shifter rules increased overall performance by 5% on average. The proposed PAL sentiment lexicon is able to handle Palestinian dialects. Furthermore, EULA overcomes the emergence of new slang words in social media, and the constructed valence shifter rules can handle negation, intensifiers and contrasts, enhancing the performance of Arabic sentiment analysis.
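The abstract does not spell out the thesis's actual rules, so the following is only a minimal sketch of how lexicon-based scoring with valence shifters is commonly set up. Everything in it is an assumption made for illustration: the English placeholder tokens (standing in for Arabic lexicon entries), the polarity values, the multipliers, the two-token scope window, and the damping applied to the clause before a contrast word. None of these values come from the thesis.

```python
# Toy tables: all entries and weights are illustrative assumptions,
# not values from the PAL or EULA lexicons.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
CONTRASTS = {"but"}
SCOPE = 2               # a shifter reaches across at most SCOPE - 1 neutral tokens
CONTRAST_DAMPING = 0.5  # weight of clauses that precede a contrast word


def score(tokens):
    """Return a sentiment score for a list of lowercase tokens."""
    clause_scores = [0.0]
    pending = []  # active shifters as [multiplier, remaining_scope]
    for tok in tokens:
        if tok in CONTRASTS:
            # Start a new clause; shifters do not cross the contrast word.
            clause_scores.append(0.0)
            pending = []
        elif tok in NEGATORS:
            pending.append([-1.0, SCOPE])
        elif tok in INTENSIFIERS:
            pending.append([INTENSIFIERS[tok], SCOPE])
        elif tok in LEXICON:
            value = LEXICON[tok]
            for multiplier, _ in pending:
                value *= multiplier
            clause_scores[-1] += value
            pending = []  # shifters are consumed by the word they modify
        else:
            # Neutral token: shrink the remaining scope of each active shifter.
            pending = [[m, r - 1] for m, r in pending if r - 1 > 0]
    # Clauses before the final contrast word count less than the last clause.
    earlier = sum(c * CONTRAST_DAMPING for c in clause_scores[:-1])
    return earlier + clause_scores[-1]


if __name__ == "__main__":
    for text in ["not very good", "good but awful"]:
        print(text, "->", score(text.split()))
```

In this sketch, negation flips the sign of the next sentiment word within scope, an intensifier scales it, and a contrast word shifts weight toward the clause that follows it. A rule set for Arabic tweets would also have to account for shifters that follow the word they modify.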
format Thesis
id my.uum.etd-8699
institution Universiti Utara Malaysia
language en
publishDate 2019
record_format eprints
spelling my.uum.etd-8699 2022-02-16T02:08:35Z https://etd.uum.edu.my/8699/ T Technology (General) 2019 Thesis NonPeerReviewed Ihnaini, Baha' Najim Salman (2019) An expandable Arabic lexicon and valence shifter rules for sentiment analysis on twitter. Doctoral thesis, Universiti Utara Malaysia.
title An expandable Arabic lexicon and valence shifter rules for sentiment analysis on twitter
topic T Technology (General)
url_provider http://etd.uum.edu.my/