Enhancing knowledge management by developing an automated document Labelling based on concepts aggregation using Hac technique

Bibliographic Details
Main Authors: Rayner Alfred; Leau, Yu Beng; Tan, Soo Fun
Format: Research Report (unpublished, not peer reviewed)
Language: English
Published: Universiti Malaysia Sabah 2011
Online Access: https://eprints.ums.edu.my/id/eprint/22889/1/Enhancing%20knowledge%20management%20by%20developing%20an%20automated%20document%20Labelling%20based%20on%20concepts%20aggregation%20using%20Hac%20technique.pdf
UMS Institutional Repository record: https://eprints.ums.edu.my/id/eprint/22889/
Description: Vast numbers of text documents are available in many fields, and their accumulation has raised new challenges for information retrieval (IR) technology. To facilitate the knowledge management process, various approaches and techniques for text classification (categorization) and text clustering are being compared and studied. The most common way to organize and label documents is to group similar documents into clusters and then extract concepts that characterize each cluster. Partitioning methods, however, rely on an assumed number of clusters, and that assumption may be unreliable: the grouping structure of the data is unknown before processing, so these methods may not predict it well. Hierarchical clustering was chosen to address this problem because it provides views of the data at different levels of abstraction, which makes it well suited to visualizing the generated concepts and interactively exploring large document collections. A second problem is choosing an appropriate method for combining two clusters into a single cluster. To address it, various distance methods will be studied for clustering documents with hierarchical agglomerative clustering (HAC). Clusters often contain sub-clusters, and the hierarchical structure is a natural constraint of the underlying application domain. To manage and organize documents effectively, similar documents will be merged to form clusters, and each document is represented by one or more concepts.

The goal of this project is to generate concepts that characterize English documents using HAC. One advantage of hierarchical clustering is that overlapping clusters can be formed and concepts can be generated from the contents of each cluster.
In addition, different distance measures will be used to investigate the quality of the clusters produced.
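The clustering and labelling steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the report does not specify its document representation, distance measures or labelling rule, so the sketch assumes raw term-frequency vectors, cosine distance between documents, three common linkage criteria (single, complete, average) as the "methods of combining two clusters", and labels each cluster with its most frequent terms as a stand-in for concept extraction. All function names (`tf_vector`, `cosine_distance`, `linkage_distance`, `hac`, `cluster_concepts`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Hypothetical representation: lowercase, whitespace-split term frequencies."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (na * nb)


def linkage_distance(c1, c2, dist, method):
    """Distance between two clusters (lists of document indices) under a linkage criterion."""
    pair = [dist[i][j] for i in c1 for j in c2]
    if method == "single":
        return min(pair)  # closest pair of members
    if method == "complete":
        return max(pair)  # farthest pair of members
    if method == "average":
        return sum(pair) / len(pair)  # mean over all member pairs
    raise ValueError(f"unknown linkage: {method}")


def hac(vectors, n_clusters, method="average"):
    """Naive agglomerative clustering: repeatedly merge the closest pair of clusters."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one singleton cluster per document
    merges = []  # (left, right, distance) for each merge, i.e. the dendrogram
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], dist, method)
                if best is None or d < best[0]:  # ties keep the first pair found
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges


def cluster_concepts(cluster, vectors, top_k=2):
    """Hypothetical labelling rule: the top_k most frequent terms in the cluster."""
    total = Counter()
    for i in cluster:
        total.update(vectors[i])
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical toy corpus: two sports snippets and two finance snippets.
    docs = [
        "football match goal team",
        "team wins football league",
        "stock market shares fall",
        "market investors sell shares",
    ]
    vectors = [tf_vector(d) for d in docs]
    for method in ("single", "complete", "average"):
        clusters, _ = hac(vectors, n_clusters=2, method=method)
        labels = [cluster_concepts(c, vectors) for c in clusters]
        # each line prints [[0, 1], [2, 3]] [['football', 'team'], ['market', 'shares']]
        print(method, clusters, labels)
```

Running the same data under each linkage criterion is one simple way to see how the choice of distance method changes the resulting clusters. The exhaustive pair search recomputes every cluster-to-cluster linkage at each step (roughly cubic time), which is acceptable for illustration but not for large document collections.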