Document clustering using hybrid LDA-Kmeans

This paper presents a Hybrid Latent Dirichlet Allocation-Kmeans (HLDA-Kmeans) algorithm for document clustering. Information overload has become a challenge for users because of the abundance of information and the heterogeneous nature of the Web. Researchers such as academicians as well as...

Bibliographic Details
Main Authors: Foong, O.-M., Ismail, A.N.
Format: Article
Published: Springer 2020
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85089621371&doi=10.1007%2f978-3-030-51974-2_12&partnerID=40&md5=5e0b63bc12c5aa95703ac2dc96dfea7f
http://eprints.utp.edu.my/24729/
id my.utp.eprints.24729
record_format eprints
spelling my.utp.eprints.24729 2021-08-27T05:50:45Z Foong, O.-M. and Ismail, A.N. (2020) Document clustering using hybrid LDA-Kmeans. Advances in Intelligent Systems and Computing, 1226 A, pp. 137-146. Article NonPeerReviewed © Springer Nature Switzerland AG 2020.
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Institutional Repository
url_provider http://eprints.utp.edu.my/
description This paper presents a Hybrid Latent Dirichlet Allocation-Kmeans (HLDA-Kmeans) algorithm for document clustering. Information overload has become a challenge for users because of the abundance of information and the heterogeneous nature of the Web. Researchers such as academicians, as well as others involved in text analytics, have found documents difficult to analyze because of ambiguity in keywords and keyphrases. Hence, the objective is to perform document clustering analysis using the HLDA-Kmeans algorithm to discover clusters in unlabelled text data, classify keyphrases by topic, and visualize the clustering results. Online oil and gas news is used as the dataset, with a 70-30 split for training and testing. The performance of the proposed HLDA-Kmeans algorithm was assessed using precision, recall and F-score. Experimental results show that the proposed HLDA-Kmeans algorithm achieved satisfactory clustering results. © Springer Nature Switzerland AG 2020.
format Article
author Foong, O.-M.
Ismail, A.N.
title Document clustering using hybrid LDA-Kmeans
publisher Springer
publishDate 2020
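
The abstract in this record states that the algorithm was assessed using precision, recall and F-score, but the record does not reproduce the formulas. The sketch below uses the standard definitions of these measures, which may differ in detail from the paper's own formulation. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-score) from raw counts.

    Standard definitions (assumed here, not quoted from the paper):
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      F-score   = 2 * precision * recall / (precision + recall)
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical counts for a single cluster, for illustration only.
p, r, f = precision_recall_f1(true_positives=8, false_positives=2, false_negatives=4)
print(f"precision={p:.3f} recall={r:.3f} f_score={f:.3f}")
```

For a clustering result, counts like these would typically be obtained by matching each cluster to a reference topic label and counting agreements and disagreements on the held-out portion of the data.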