GF-CLUST: A nature-inspired algorithm for automatic text clustering

Bibliographic Details
Main Authors: Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
Format: Article
Language: English
Published: Universiti Utara Malaysia, 2016
Subjects: QA75 Electronic computers. Computer science
Online Access: http://repo.uum.edu.my/18484/1/JICT%2015%201%20%202016%2057%E2%80%9381.pdf
http://repo.uum.edu.my/18484/
http://www.jict.uum.edu.my/images/pdf3/vol15no1/31jict1512016.pdf
id my.uum.repo.18484
record_format eprints
citation Mohammed, Athraa Jasim and Yusof, Yuhanis and Husni, Husniza (2016) GF-CLUST: A nature-inspired algorithm for automatic text clustering. Journal of Information and Communication Technology (JICT), 15 (1). pp. 57-81. ISSN 1675-414X. Peer reviewed; published 2016-06.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
description Text clustering is the task of grouping similar documents into the same cluster while assigning dissimilar ones to other clusters. A well-known clustering method, the K-means algorithm, is extensively employed in many disciplines. However, determining the number of clusters is a major challenge when using K-means. This paper presents a new clustering algorithm, termed Gravity Firefly Clustering (GF-CLUST), that utilizes the Firefly Algorithm for dynamic document clustering. GF-CLUST is able to identify the appropriate number of clusters for a given text collection, which is a challenging problem in document clustering. It selects documents with strong force as cluster centers and creates clusters based on cosine similarity. It then selects potential clusters and merges small clusters into them. Experiments on various document datasets, such as 20 Newsgroups, Reuters-21578 and the TREC collection, are conducted to evaluate the performance of the proposed GF-CLUST. In terms of purity, F-measure and entropy, GF-CLUST outperforms existing clustering techniques such as K-means, Particle Swarm Optimization (PSO) and the Practical General Stochastic Clustering Method (pGSCM). Furthermore, the number of clusters obtained by GF-CLUST is closer to the actual number of clusters than that obtained by pGSCM.
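The clustering step described above (strong-force documents become centers, other documents join a center by cosine similarity, and small clusters are merged into potential clusters) can be illustrated with a minimal Python sketch. It is not the authors' GF-CLUST: it omits the Firefly Algorithm entirely and assumes that a document's "force" is simply its summed cosine similarity to all other documents. The function names and the `sim_threshold` and `min_size` parameters are hypothetical. Purity, one of the reported evaluation measures, uses its standard definition: the fraction of documents that belong to the majority class of their cluster.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two dense term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def force_scores(docs):
    """ASSUMPTION: stand-in for GF-CLUST's 'force', the summed similarity to every other document."""
    n = len(docs)
    return [sum(cosine(docs[i], docs[j]) for j in range(n) if j != i) for i in range(n)]


def cluster(docs, sim_threshold=0.5, min_size=2):
    """Center-based clustering where the number of clusters is not fixed in advance.

    sim_threshold and min_size are hypothetical parameters, not values from the paper.
    """
    force = force_scores(docs)
    unassigned = set(range(len(docs)))
    clusters = []
    # Visit documents from strongest to weakest; each still-unassigned one becomes a center
    # and absorbs every unassigned document whose cosine similarity reaches the threshold.
    for center in sorted(range(len(docs)), key=lambda i: -force[i]):
        if center not in unassigned:
            continue
        members = [center] + [i for i in sorted(unassigned)
                              if i != center and cosine(docs[center], docs[i]) >= sim_threshold]
        unassigned -= set(members)
        clusters.append({"center": center, "members": members})
    # Keep clusters with at least min_size members ("potential" clusters) and merge each
    # smaller one into the kept cluster whose center is most similar to its own center.
    large = [c for c in clusters if len(c["members"]) >= min_size]
    small = [c for c in clusters if len(c["members"]) < min_size]
    if not large:
        return sorted(sorted(c["members"]) for c in clusters)
    for c in small:
        target = max(large, key=lambda big: cosine(docs[c["center"]], docs[big["center"]]))
        target["members"].extend(c["members"])
    return sorted(sorted(c["members"]) for c in large)


def purity(clusters, labels):
    """Standard purity: share of documents that fall in their cluster's majority class."""
    total = sum(len(c) for c in clusters)
    hits = sum(Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters)
    return hits / total


if __name__ == "__main__":
    # Toy term-weight vectors and gold labels, invented for illustration only.
    docs = [[1, 0, 0], [1, 0.1, 0], [0, 1, 0], [0, 1, 0.1], [0, 0, 1]]
    labels = ["a", "a", "b", "b", "c"]
    groups = cluster(docs)
    print(groups)                  # [[0, 1], [2, 3, 4]]
    print(purity(groups, labels))  # 0.8
```

On the toy vectors, documents 1 and 3 become centers, and the singleton cluster formed by document 4 is merged into the cluster whose center is closest to it, so two clusters remain without the number of clusters being specified beforehand.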
format Article
author Mohammed, Athraa Jasim
Yusof, Yuhanis
Husni, Husniza
publisher Universiti Utara Malaysia
publishDate 2016