Enhancing knowledge management by developing an automated document Labelling based on concepts aggregation using Hac technique
Vast amounts of text documents are available in various fields. The accumulations of available text documents have raised new challenges for information retrieval (IR) technology. Therefore, in order to facilitate the knowledge management process, various approaches and techniques applied on text c...
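The approach this record describes (grouping similar documents with hierarchical agglomerative clustering, trying different ways of measuring the distance between clusters, and then extracting concepts that characterize each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The bag-of-words representation, cosine distance, stop-word list, three linkage rules, and the use of the most frequent terms as "concepts" are all assumptions made for illustration, as are the function names `term_counts`, `hac`, and `label_cluster`.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list (an assumption, not taken from the report).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def term_counts(text):
    """Bag-of-words term frequencies for one document."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


# Linkage rules: how pairwise document distances become one cluster distance.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def hac(vectors, n_clusters, linkage="average"):
    """Naive agglomerative clustering; returns lists of document indices.

    Assumes 1 <= n_clusters <= len(vectors).
    """
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            dists = [
                cosine_distance(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            ]
            d = combine(dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a (b > a, so index a stays valid).
        clusters[a] = clusters[a] + clusters.pop(b)
    return clusters


def label_cluster(vectors, members, top_k=3):
    """Label a cluster with its most frequent terms (a stand-in for concepts)."""
    total = Counter()
    for i in members:
        total.update(vectors[i])
    return [term for term, _ in total.most_common(top_k)]


docs = [
    "clustering groups similar documents",
    "hierarchical clustering of documents",
    "football match results",
    "football league match",
]
vectors = [term_counts(d) for d in docs]
clusters = hac(vectors, n_clusters=2, linkage="average")
labels = [label_cluster(vectors, c) for c in clusters]
```

Changing the `linkage` argument is one simple way to compare how different cluster-distance rules affect the resulting groups. The naive pairwise search is only suitable for small collections; a library implementation would be the practical choice for larger ones.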
Main Authors: | Rayner Alfred; Leau, Yu Beng; Tan, Soo Fun |
---|---|
Format: | Research Report |
Language: | English |
Published: | Universiti Malaysia Sabah 2011 |
Online Access: | https://eprints.ums.edu.my/id/eprint/22889/1/Enhancing%20knowledge%20management%20by%20developing%20an%20automated%20document%20Labelling%20based%20on%20concepts%20aggregation%20using%20Hac%20technique.pdf https://eprints.ums.edu.my/id/eprint/22889/ |
id |
my.ums.eprints.22889 |
---|---|
record_format |
eprints |
spelling |
my.ums.eprints.22889 2019-07-22T04:27:50Z https://eprints.ums.edu.my/id/eprint/22889/ Enhancing knowledge management by developing an automated document Labelling based on concepts aggregation using Hac technique Rayner Alfred; Leau, Yu Beng; Tan, Soo Fun Universiti Malaysia Sabah 2011 Research Report NonPeerReviewed text en https://eprints.ums.edu.my/id/eprint/22889/1/Enhancing%20knowledge%20management%20by%20developing%20an%20automated%20document%20Labelling%20based%20on%20concepts%20aggregation%20using%20Hac%20technique.pdf Rayner Alfred and Leau, Yu Beng and Tan, Soo Fun (2011) Enhancing knowledge management by developing an automated document Labelling based on concepts aggregation using Hac technique. (Unpublished) |
institution |
Universiti Malaysia Sabah |
building |
UMS Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sabah |
content_source |
UMS Institutional Repository |
url_provider |
http://eprints.ums.edu.my/ |
language |
English |
description |
Vast amounts of text documents are available in various fields. The accumulation of these documents has raised new challenges for information retrieval (IR) technology. To facilitate the knowledge management process, various approaches and techniques for text classification (categorization) and text clustering are being compared and studied. The most common way to organize and label documents is to group similar documents into clusters and then extract concepts that characterize each cluster. An assumed number of clusters may be unreliable, since the nature of the grouping structure in the data is unknown before processing, and partitioning methods may therefore not predict the structure of the data well. Hierarchical clustering has been chosen to address this problem because it provides views of the data at different levels of abstraction, making it well suited for visualizing the generated concepts and interactively exploring large document collections. Another problem to consider is the appropriate method for combining two clusters into a single cluster. To this end, various distance methods will be studied for clustering documents with hierarchical agglomerative clustering (HAC). Clusters very often include sub-clusters, and the hierarchical structure is indeed a natural constraint on the underlying application domain. To manage and organize documents effectively, similar documents will be merged to form clusters. Each document is represented by one or more concepts. The goal of this project is to generate concepts that characterize English documents by using hierarchical agglomerative clustering. One advantage of hierarchical clustering is that overlapping clusters can be formed and concepts can be generated based on the contents of each cluster. In addition, different distance measures will be used to investigate the quality of the clusters produced. |
format |
Research Report |
author |
Rayner Alfred; Leau, Yu Beng; Tan, Soo Fun |
title |
Enhancing knowledge management by developing an automated document Labelling based on concepts aggregation using Hac technique |
publisher |
Universiti Malaysia Sabah |
publishDate |
2011 |
url |
https://eprints.ums.edu.my/id/eprint/22889/1/Enhancing%20knowledge%20management%20by%20developing%20an%20automated%20document%20Labelling%20based%20on%20concepts%20aggregation%20using%20Hac%20technique.pdf https://eprints.ums.edu.my/id/eprint/22889/ |