Adaptive firefly algorithm for hierarchical text clustering

Text clustering is used by search engines to increase recall and precision in information retrieval. Because search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that offers automatic grouping of items without prior knowle...

Bibliographic Details
Main Author: Mohammed, Athraa Jasim
Format: Thesis
Language: en
Published: 2016
Subjects:
Online Access: https://etd.uum.edu.my/5801/1/s94734_02.pdf
https://etd.uum.edu.my/5801/2/s94734_01.pdf
https://etd.uum.edu.my/5801/
_version_ 1833436576317702144
author Mohammed, Athraa Jasim
building UUM Library
collection Institutional Repository
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
continent Asia
country Malaysia
description Text clustering is used by search engines to increase recall and precision in information retrieval. Because search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that groups items automatically without prior knowledge of the collection. Existing clustering methods have problems in determining the optimal number of clusters and in producing compact clusters. In this research, an adaptive hierarchical text clustering algorithm based on the Firefly Algorithm is proposed. The proposed Adaptive Firefly Algorithm (AFA) consists of three components: document clustering, cluster refining, and cluster merging. The first component introduces the Weight-based Firefly Algorithm (WFA), which automatically identifies initial centers and their clusters for any given text collection. To refine the obtained clusters, a second algorithm, termed Weight-based Firefly Algorithm with Relocate (WFAR), is proposed; it allows a pre-assigned document to be relocated into a newly created cluster. The third component, Weight-based Firefly Algorithm with Relocate and Merging (WFARM), aims to reduce the number of produced clusters by merging non-pure clusters into the pure ones. Experiments were conducted to compare the proposed algorithms against seven existing methods. The percentage of success in obtaining the optimal number of clusters by AFA is 100%, with purity and F-measure of 83% higher than the benchmarked methods. As for the entropy measure, AFA produced the lowest value (0.78) when compared to existing methods. The results indicate that the Adaptive Firefly Algorithm can produce compact clusters. This research contributes to the text mining domain, as hierarchical text clustering facilitates the indexing of documents and information retrieval processes.
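The description reports purity and entropy scores but this record does not give the formulas the thesis used. As a rough illustration only, the sketch below computes both measures using common textbook definitions: purity as the fraction of documents that belong to the majority class of their cluster, and entropy as the size-weighted average of per-cluster class entropy (base 2, not normalized). The function names and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import log2


def _class_counts(clusters, labels):
    """Group the true class labels by cluster id (helper, hypothetical name)."""
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of documents that carry the majority class of their cluster."""
    by_cluster = _class_counts(clusters, labels)
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster class entropy; lower means purer clusters."""
    by_cluster = _class_counts(clusters, labels)
    total = len(labels)
    weighted = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        weighted += (size / total) * h
    return weighted


# Toy example (assumed data): six documents, two clusters, two true classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "b"]
toy_purity = purity(toy_clusters, toy_labels)
toy_entropy = entropy(toy_clusters, toy_labels)
```

Under these definitions a perfect clustering has purity 1 and entropy 0, which is why the description treats high purity and low entropy as evidence of compact clusters. Some papers normalize entropy by the logarithm of the number of classes, so values computed this way are not directly comparable to the 0.78 reported above.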
format Thesis
id my.uum.etd-5801
institution Universiti Utara Malaysia
language en
publishDate 2016
record_format eprints
spelling my.uum.etd-5801 2021-04-06T06:33:01Z https://etd.uum.edu.my/5801/ Adaptive firefly algorithm for hierarchical text clustering Mohammed, Athraa Jasim QA75 Electronic computers. Computer science 2016 Thesis NonPeerReviewed text en https://etd.uum.edu.my/5801/1/s94734_02.pdf text en https://etd.uum.edu.my/5801/2/s94734_01.pdf Mohammed, Athraa Jasim (2016) Adaptive firefly algorithm for hierarchical text clustering. PhD. thesis, Universiti Utara Malaysia.
title Adaptive firefly algorithm for hierarchical text clustering
title_sort adaptive firefly algorithm for hierarchical text clustering
topic QA75 Electronic computers. Computer science
url https://etd.uum.edu.my/5801/1/s94734_02.pdf
https://etd.uum.edu.my/5801/2/s94734_01.pdf
https://etd.uum.edu.my/5801/
url_provider http://etd.uum.edu.my/