Development of a text analyzer for automatic categorization of texts documents based on interactive visualization approach

The wide availability of huge collections of text documents (news corpora, e-mails, web pages, scientific articles, etc.) has fostered the need for efficient text mining tools. Information retrieval, text filtering and classification, and information extraction technologies are rapidly becoming ke...

Bibliographic Details
Main Authors: Mohd Norhisham Razali, Rayner Alfred, Suraya Alias, Asni Tahir
Format: Research Report
Language: English
Published: Universiti Malaysia Sabah 2011
Subjects: QA Mathematics
Online Access: https://eprints.ums.edu.my/id/eprint/24836/1/Development%20of%20a%20text%20analyzer%20for%20automatic%20categorization%20of%20texts%20documents%20based%20on%20interactive%20visualization%20approach.pdf
https://eprints.ums.edu.my/id/eprint/24836/
institution Universiti Malaysia Sabah
building UMS Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sabah
content_source UMS Institutional Repository
url_provider http://eprints.ums.edu.my/
language English
topic QA Mathematics
description The wide availability of huge collections of text documents (news corpora, e-mails, web pages, scientific articles, etc.) has fostered the need for efficient text mining tools. Information retrieval, text filtering and classification, and information extraction technologies are rapidly becoming key components of modern information processing systems, helping end-users to select, visualize and shape their informational environment. The ability to visualize documents as clusters is essential: even the best data summarization technique will mislead if its results are poorly represented or visualized. In many studies, clustering techniques are applied and the results are presented simply as documents grouped into clusters. In some cases, however, users may want to know the relationships that exist between clusters. To illustrate these relationships, a hierarchical agglomerative clustering technique can be applied to build a dendrogram, which displays the relationship between a cluster and its sub-clusters. The terms or features that characterize each cluster can also be displayed to help users understand the contents of the whole set of text documents stored in the database. This research proposes a Text Analyzer (Visual Text) that automates the categorization of text documents based on a visualization approach using the Hierarchical Agglomerative Clustering technique. With Visual Text, users are able to analyze and categorize text documents automatically, visualize the overall structure of their informational environment by visualizing each cluster and its sub-clusters, identify the words or terms used to categorize each cluster and its sub-clusters, and finally evaluate the quality of the text categorization based on the distance method. The proposed tool is potentially very useful for analyzing text documents automatically for summarization purposes and thus facilitates the decision-making process.
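
The clustering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Visual Text implementation: the description does not specify the term weighting, the distance measure, or the linkage criterion, so the TF-IDF weighting, cosine distance, and average linkage used here are assumptions, as are all function names and the sample documents. The list of merges it returns is the information a dendrogram would draw, and `top_terms` shows one possible way to pick the terms that characterize a cluster.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency (an assumed weighting scheme).
        vectors.append({t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
                        for t, c in counts.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(vectors):
    """Agglomerative clustering; returns merges as (id_a, id_b, distance, members)."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = {i: [i] for i in range(n)}  # cluster id -> document indices
    merges = []
    next_id = n
    while len(clusters) > 1:
        def gap(pair):
            # Average pairwise distance between the members of two clusters.
            a, b = pair
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))

        a, b = min(combinations(sorted(clusters), 2), key=gap)
        d = gap((a, b))
        members = sorted(clusters.pop(a) + clusters.pop(b))
        clusters[next_id] = members
        merges.append((a, b, d, members))
        next_id += 1
    return merges


def top_terms(vectors, members, k=3):
    """Return the k terms with the highest summed weight in a cluster."""
    totals = Counter()
    for i in members:
        for term, weight in vectors[i].items():
            totals[term] += weight
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical sample documents, for illustration only.
docs = [
    "stock market shares fall",
    "market shares and stock prices rise",
    "football team wins match",
    "team loses football match",
]
vectors = tfidf_vectors(docs)
for a, b, d, members in average_linkage(vectors):
    print(f"merge {a} + {b} at distance {d:.3f}: {top_terms(vectors, members)}")
```

Each merge records the distance at which two clusters were joined, so small distances indicate tightly related sub-clusters and large ones indicate loosely related groups. A distance-based quality check like the one the description mentions could be built on these values, although the report's actual measure is not given here.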