A hybrid word embedding model based on admixture of poisson-gamma latent dirichlet allocation model and distributed word-document- topic representation

Bibliographic Details
Main Authors: Bala, Ibrahim Bakari, Saringat, Mohd Zainuri, Mustapha, Aida
Format: Article
Language: English
Published: Little Lion Scientific 2020
Subjects: TP Chemical technology
Online Access: http://eprints.uthm.edu.my/6132/1/AJ%202020%20%28203%29.pdf
http://eprints.uthm.edu.my/6132/
id my.uthm.eprints.6132
record_format eprints
spelling my.uthm.eprints.6132 2022-01-26T08:41:58Z PeerReviewed Bala, Ibrahim Bakari and Saringat, Mohd Zainuri and Mustapha, Aida (2020) A hybrid word embedding model based on admixture of poisson-gamma latent dirichlet allocation model and distributed word-document- topic representation. Journal of Theoretical and Applied Information Technology, 98 (9). pp. 1446-1456. ISSN 1992-8645
institution Universiti Tun Hussein Onn Malaysia
building UTHM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tun Hussein Onn Malaysia
content_source UTHM Institutional Repository
url_provider http://eprints.uthm.edu.my/
language English
topic TP Chemical technology
description This paper proposes a hybrid Poisson-Gamma Latent Dirichlet Allocation (PGLDA) model for modelling word dependencies to support the semantic representation of words. The model addresses the problem of complexity by using LDA as the baseline model while adequately capturing the contextual correlation between words. The Poisson document-length distribution was replaced with a Poisson-Gamma admixture to model word correlation when a hub word connects words and topics. The distributed representations of documents (Doc2Vec) and topics (Topic2Vec) are then averaged to form new word-representation vectors, which are combined with the topics of largest likelihood from PGLDA. Model estimation combines the Laplace approximation of the PGLDA log-likelihood with the Feed-Forward Neural Network (FFN) approaches of Doc2Vec and Topic2Vec. The proposed hybrid method was evaluated for precision, recall, and F1 score on the 20 Newsgroups and AG's News datasets. A comparison of F1 scores showed that the proposed hybrid model outperformed the other methods evaluated.
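The abstract names three ingredients that can be illustrated generically. The Python sketch below is not the authors' implementation: it assumes a Gamma(shape, scale) prior on the Poisson rate (so the marginal document-length distribution is negative binomial), treats "averaging" the Doc2Vec and Topic2Vec vectors as an element-wise mean, and scores one class with standard precision, recall and F1. All function names and parameter values are hypothetical, and the paper's Laplace-approximation estimation and FFN training are not reproduced.

```python
import math


def poisson_gamma_pmf(k: int, shape: float, scale: float) -> float:
    """P(N = k) for N ~ Poisson(lam) with lam ~ Gamma(shape, scale).

    Integrating lam out yields a negative binomial distribution:
    Gamma(k + shape) / (k! * Gamma(shape)) * p**k * (1 - p)**shape,
    where p = scale / (1 + scale).
    """
    p = scale / (1.0 + scale)
    # Work in log space so large k does not overflow the factorial terms.
    log_pmf = (
        math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
        + k * math.log(p) + shape * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def average_vectors(doc_vec: list[float], topic_vec: list[float]) -> list[float]:
    """Element-wise mean of a document vector and a topic vector (hypothetical helper)."""
    if len(doc_vec) != len(topic_vec):
        raise ValueError("document and topic vectors must have the same dimension")
    return [(d + t) / 2.0 for d, t in zip(doc_vec, topic_vec)]


def most_likely_topic(topic_probs: list[float]) -> int:
    """Index of the topic with the largest probability (hypothetical helper)."""
    return max(range(len(topic_probs)), key=lambda i: topic_probs[i])


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical prior: shape 2, scale 5, so the expected document length is 10.
    mean_len = sum(k * poisson_gamma_pmf(k, 2.0, 5.0) for k in range(500))
    print(round(mean_len, 6))
    print(average_vectors([1.0, 2.0], [3.0, 4.0]))
    print(most_likely_topic([0.1, 0.7, 0.2]))
    print(precision_recall_f1(["sci", "sport", "sci", "sci"],
                              ["sci", "sci", "sport", "sci"], "sci"))
```

With shape 2 and scale 5 the mixture has mean 10 but variance 60, whereas a plain Poisson with the same mean has variance 10; this extra spread (overdispersion) is what a Poisson-Gamma admixture adds over a single Poisson length distribution.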
format Article
author Bala, Ibrahim Bakari
Saringat, Mohd Zainuri
Mustapha, Aida
title A hybrid word embedding model based on admixture of poisson-gamma latent dirichlet allocation model and distributed word-document- topic representation
publisher Little Lion Scientific
publishDate 2020