Development of a text analyzer for automatic categorization of texts documents based on interactive visualization approach
The wide availability of huge collections of text documents (news corpora, e-mails, web pages, scientific articles, etc.) has fostered the need for efficient text mining tools. Information retrieval, text filtering and classification, and information extraction technologies are rapidly becoming ke...
Main Authors: | Mohd Norhisham Razali, Rayner Alfred, Suraya Alias, Asni Tahir |
---|---|
Format: | Research Report |
Language: | English |
Published: | Universiti Malaysia Sabah, 2011 |
Subjects: | QA Mathematics |
Online Access: | https://eprints.ums.edu.my/id/eprint/24836/1/Development%20of%20a%20text%20analyzer%20for%20automatic%20categorization%20of%20texts%20documents%20based%20on%20interactive%20visualization%20approach.pdf https://eprints.ums.edu.my/id/eprint/24836/ |
id |
my.ums.eprints.24836 |
---|---|
record_format |
eprints |
spelling |
my.ums.eprints.24836 2020-02-03T02:45:16Z https://eprints.ums.edu.my/id/eprint/24836/ Development of a text analyzer for automatic categorization of texts documents based on interactive visualization approach. Mohd Norhisham Razali; Rayner Alfred; Suraya Alias; Asni Tahir. QA Mathematics. Universiti Malaysia Sabah, 2011. Research Report. NonPeerReviewed. text. en. https://eprints.ums.edu.my/id/eprint/24836/1/Development%20of%20a%20text%20analyzer%20for%20automatic%20categorization%20of%20texts%20documents%20based%20on%20interactive%20visualization%20approach.pdf Mohd Norhisham Razali and Rayner Alfred and Suraya Alias and Asni Tahir (2011) Development of a text analyzer for automatic categorization of texts documents based on interactive visualization approach. |
institution |
Universiti Malaysia Sabah |
building |
UMS Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sabah |
content_source |
UMS Institutional Repository |
url_provider |
http://eprints.ums.edu.my/ |
language |
English |
topic |
QA Mathematics |
description |
The wide availability of huge collections of text documents (news corpora, e-mails, web pages, scientific articles, etc.) has fostered the need for efficient text mining tools. Information retrieval, text filtering and classification, and information extraction technologies are rapidly becoming key components of modern information processing systems, helping end users to select, visualize and shape their informational environment. The ability to visualize documents as clusters is essential: even the best data summarization technique will mislead if its results are poorly represented or visualized. As proposed in many studies, clustering techniques are applied and the results are presented as documents grouped into clusters. However, in some cases, users may want to know the relationships that exist between clusters. To illustrate these relationships, a hierarchical agglomerative clustering technique can be applied to build a dendrogram, which displays the relationship between a cluster and its sub-clusters. In addition, the terms or features that characterize each cluster can be displayed to help users understand the contents of the text documents stored in the database. This research proposes a Text Analyzer (Visual Text) that automates the categorization of text documents based on a visualization approach using the Hierarchical Agglomerative Clustering technique. With Visual Text, users are able to analyze and categorize text documents automatically, visualize the overall structure of their informational environment by visualizing each cluster and its sub-clusters, identify the words or terms used to categorize each cluster and its sub-clusters, and finally evaluate the quality of the text categorization based on the distance method. The proposed tool is potentially very useful for analyzing text documents automatically for summarization purposes, and thus facilitates the decision-making process. |
format |
Research Report |
author |
Mohd Norhisham Razali Rayner Alfred Suraya Alias Asni Tahir |
title |
Development of a text analyzer for automatic categorization of texts documents based on interactive visualization approach |
publisher |
Universiti Malaysia Sabah |
publishDate |
2011 |
url |
https://eprints.ums.edu.my/id/eprint/24836/1/Development%20of%20a%20text%20analyzer%20for%20automatic%20categorization%20of%20texts%20documents%20based%20on%20interactive%20visualization%20approach.pdf https://eprints.ums.edu.my/id/eprint/24836/ |
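
The description in this record outlines a pipeline: cluster documents with hierarchical agglomerative clustering, record the merges that a dendrogram would draw, and list the terms that characterize each cluster. The following is a minimal sketch of that idea in plain Python, not the Visual Text implementation. The record does not specify the document representation, distance measure, or linkage rule, so the whitespace tokenizer, raw term counts, cosine distance, and average linkage used here are all assumptions, as are the function names and the four sample documents.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term counts for one document (assumed: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(docs):
    """Average-linkage agglomerative clustering (linkage choice is an assumption).

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram visualizes.
    """
    vectors = [tf_vector(d) for d in docs]
    clusters = [(i,) for i in range(len(docs))]  # start with singletons
    merges = []

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters; ties go to the lowest indices.
        dist, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], dist))
        merged = tuple(sorted(clusters[x] + clusters[y]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


def top_terms(docs, members, n=3):
    """Most frequent terms across the documents of one cluster."""
    counts = Counter()
    for i in members:
        counts.update(tf_vector(docs[i]))
    return [term for term, _ in counts.most_common(n)]


# Hypothetical sample documents, for illustration only.
docs = [
    "stock market shares fall",
    "stock market shares rise",
    "football match goal score",
    "football match final score",
]

merges = agglomerate(docs)
for left, right, dist in merges:
    members = left + right
    print(f"merge {left} + {right} at distance {dist:.2f}; terms: {top_terms(docs, members)}")
```

The height at which two clusters merge gives a simple distance-based indication of how well separated they are, which is one possible reading of the "distance method" mentioned in the description. A production tool would more likely rely on TF-IDF weighting and a library implementation of the clustering step.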