Multi-tier classification based on sentiment, type, emotion and purpose for online diabetes community / Wandeep Kaur Ratan Singh

Bibliographic Details
Main Author: Wandeep Kaur, Ratan Singh
Format: Thesis
Published: 2020
Subjects: QA75 Electronic computers. Computer science; ZA4050 Electronic information resources
Online Access:http://studentsrepo.um.edu.my/14824/1/Wandeep_Kaur.pdf
http://studentsrepo.um.edu.my/14824/2/Wandeep_Kaur.pdf
http://studentsrepo.um.edu.my/14824/
author Wandeep Kaur, Ratan Singh
building UM Library
collection Institutional Repository
content_provider Universiti Malaya
content_source UM Student Repository
continent Asia
country Malaysia
description The evolution of social media platforms has created a niche in which users increasingly turn to such sites to share and exchange health-related information. Facebook, one of the largest social networking sites, has further encouraged such exchange, producing a sheer volume of data hidden within unstructured text. The aim of this research is to propose a multi-tier classification based on sentiment, type, emotion and purpose (STEP) to classify data collected from a diabetes community on Facebook. The proposed STEP framework has three tiers, namely type, purpose and sentiment (with emotion handled in the same tier). The first tier classifies posts by type of diabetes. Here, a manual type lexicon dictionary covering all three forms of diabetes (type 1, type 2 and gestational diabetes) was created. Naïve Bayes with n-gram features was used for classification, and the proposed STEP framework produced an F1-score of 77% when evaluated against benchmark models. Posts that could not be classified into any one type were grouped under Other, while correctly classified posts from this tier moved down to the next tier for purpose classification. In the second tier, posts were classified according to symptoms, lifestyle and treatment. A weighted information gain feature selection technique was adopted, in which weights were redistributed for features that had been wrongly classified during the training phase. Co-training multinomial Naïve Bayes was used, with the two base classifiers applied to both label and feature classification. The novelty lies in a dimensionality reduction technique that converts numeric vectors to string vectors using Word2Vec, which improved the F1-score to 61% compared with only 48%. The last tier of the proposed STEP framework addresses sentiment and emotion classification.
Here, a mathematical equation was proposed to calculate sentiment intensity from the Facebook behaviors of like, comment, share and reaction. Past studies have analyzed these behaviors and how they affect sales; this research instead converts those counts into an intensity value that can be used to classify sentiment more accurately. Results show that the proposed sentiment classifier produced better classification, with an F1-score of 84%. Emotion classification was also conducted within the same tier, adopting the Word2Vec continuous bag-of-words (CBOW) model with a bootstrapping methodology. A similarity check between the annotated corpus and EmoLex determined the dominant emotion, and each post was classified accordingly. This improved the classification process by moving from detecting multiple emotions per post to classifying the single most dominant emotion extracted from each post. The proposed framework improved overall classification accuracy within each of its tiers, and its multi-tier design removed posts that do not contribute towards classification in the upper tiers, yielding a more refined dataset for classification in the lower tiers.
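The first tier pairs a type lexicon with n-gram Naïve Bayes. The sketch below shows one plausible arrangement, assumed here rather than taken from the thesis: a post with no lexicon hit is routed to Other and stops, and any other post is labelled by a multinomial Naïve Bayes model over unigram and bigram counts with add-one smoothing. The lexicon terms, the toy training posts, and the names TYPE_LEXICON, NGramNaiveBayes and classify_type are hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini type lexicon; the thesis's manually built lexicon is not reproduced here.
TYPE_LEXICON = {
    "type1": ["type 1", "t1d", "insulin pump"],
    "type2": ["type 2", "t2d", "metformin"],
    "gestational": ["gestational", "pregnancy"],
}

def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max from lowercased whitespace tokens."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats

class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        feats = [f for f in ngrams(doc) if f in self.vocab]  # ignore unseen n-grams

        def log_posterior(c):
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for f in feats:
                score += math.log((self.feat_counts[c][f] + 1) / (self.totals[c] + v))
            return score

        return max(self.class_counts, key=log_posterior)

def classify_type(post, model):
    """Tier-1 routing (assumed): no lexicon hit means 'Other', which is not passed on."""
    text = post.lower()
    if not any(term in text for terms in TYPE_LEXICON.values() for term in terms):
        return "Other"
    return model.predict(post)

# Toy training posts (made up for illustration).
train_docs = [
    "my son has type 1 and uses an insulin pump",
    "type 1 diagnosis at age five insulin pump",
    "type 2 diet and metformin",
    "metformin for type 2 after weight gain",
    "gestational diabetes during pregnancy",
    "pregnancy glucose test gestational",
]
train_labels = ["type1", "type1", "type2", "type2", "gestational", "gestational"]
model = NGramNaiveBayes().fit(train_docs, train_labels)
```

Only posts that receive a type label here would move on to the purpose tier; the Other bucket is where the cascade trims the dataset.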
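Information gain scores a feature by how much knowing its presence reduces class entropy, IG(t) = H(C) - H(C | t). The sketch below computes plain information gain for word-presence features and ranks the vocabulary; the optional per-term weights dict is a hypothetical hook standing in for the thesis's weight-redistribution step, whose exact rule this record does not give. The toy posts and the names information_gain and select_top_k are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels; 0.0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the post': H(C) - H(C | term)."""
    present = [y for d, y in zip(docs, labels) if term in d.lower().split()]
    absent = [y for d, y in zip(docs, labels) if term not in d.lower().split()]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

def select_top_k(docs, labels, k, weights=None):
    """Rank vocabulary terms by (optionally weighted) IG; ties broken alphabetically."""
    weights = weights or {}  # hypothetical per-term weights; missing terms default to 1.0
    vocab = sorted({t for d in docs for t in d.lower().split()})
    scored = [(information_gain(docs, labels, t) * weights.get(t, 1.0), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]

# Toy purpose-tier posts (made up for illustration).
posts = [
    "blurred vision and thirst",
    "constant thirst at night",
    "walking daily after dinner",
    "daily yoga routine",
]
purposes = ["symptoms", "symptoms", "lifestyle", "lifestyle"]
```

Here "thirst" and "daily" each split the classes perfectly and score 1.0 bit, so they rank first; lowering a term's weight pushes it down the ranking, which is the kind of lever a redistribution scheme would pull.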
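The dominant-emotion step can be pictured as a nearest-centroid similarity check. In this sketch, assumed rather than taken from the thesis, a post vector (the mean of its known word vectors) is compared by cosine similarity with the mean vector of each emotion's seed words, and the single highest-scoring emotion is returned. The three-dimensional EMBEDDINGS are made-up stand-ins for trained Word2Vec CBOW vectors, EMOTION_SEEDS is a hypothetical EmoLex-style word list, and the bootstrapping step is not shown.

```python
import math

# Made-up 3-d vectors standing in for trained Word2Vec CBOW embeddings.
EMBEDDINGS = {
    "scared": [0.9, 0.1, 0.0],
    "worried": [0.8, 0.2, 0.1],
    "afraid": [0.85, 0.05, 0.05],
    "happy": [0.0, 0.9, 0.1],
    "glad": [0.1, 0.85, 0.0],
    "grateful": [0.05, 0.8, 0.2],
    "angry": [0.1, 0.0, 0.9],
    "furious": [0.0, 0.1, 0.95],
    "results": [0.3, 0.3, 0.3],
}

# Hypothetical seed words per emotion, in the spirit of an EmoLex-style lexicon.
EMOTION_SEEDS = {
    "fear": ["scared", "afraid"],
    "joy": ["happy", "glad"],
    "anger": ["angry", "furious"],
}

def mean_vector(words):
    """Average the vectors of known words; None if no word has a vector."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dominant_emotion(post):
    """Return the emotion whose seed centroid is most similar to the post, or None."""
    post_vec = mean_vector(post.lower().split())
    if post_vec is None:
        return None
    scores = {emo: cosine(post_vec, mean_vector(seeds)) for emo, seeds in EMOTION_SEEDS.items()}
    return max(scores, key=scores.get)
```

Returning one argmax label per post mirrors the shift the abstract describes, from listing every emotion a post touches to naming only the dominant one.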
format Thesis
id my.um.stud-14824
institution Universiti Malaya
publishDate 2020
record_format eprints
citation Wandeep Kaur, Ratan Singh (2020) Multi-tier classification based on sentiment, type, emotion and purpose for online diabetes community / Wandeep Kaur Ratan Singh. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14824/ (dated 2020-04; NonPeerReviewed)
title Multi-tier classification based on sentiment, type, emotion and purpose for online diabetes community / Wandeep Kaur Ratan Singh
topic QA75 Electronic computers. Computer science
ZA4050 Electronic information resources
url_provider http://studentsrepo.um.edu.my/