Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents

Text classification has been a popular research field in computer science. It deals with assigning labels to groups of similar textual documents. However, very few approaches have focused on the unique characteristics of news corpora, and even fewer for I...

Bibliographic Details
Main Author: KUSUMAAGAMA FUDDOLY, AINI RACHMANIA
Format: Thesis
Language: English
Published: 2014
Subjects: T Technology (General)
Online Access: http://utpedia.utp.edu.my/15129/1/Thesis%20Final%20-%20AINI%20RACHMANIA.pdf
http://utpedia.utp.edu.my/15129/
id my-utp-utpedia.15129
record_format eprints
spelling my-utp-utpedia.15129 2019-06-10T13:34:33Z http://utpedia.utp.edu.my/15129/ Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents KUSUMAAGAMA FUDDOLY, AINI RACHMANIA T Technology (General) 2014-08 Thesis NonPeerReviewed application/pdf en http://utpedia.utp.edu.my/15129/1/Thesis%20Final%20-%20AINI%20RACHMANIA.pdf KUSUMAAGAMA FUDDOLY, AINI RACHMANIA (2014) Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents. Masters thesis, Universiti Teknologi Petronas.
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Electronic and Digitized Intellectual Asset
url_provider http://utpedia.utp.edu.my/
language English
topic T Technology (General)
spellingShingle T Technology (General)
KUSUMAAGAMA FUDDOLY, AINI RACHMANIA
Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents
description Text classification has been a popular research field in computer science. It deals with assigning labels to groups of similar textual documents. However, very few approaches have focused on the unique characteristics of news corpora, and even fewer on Indonesian news documents. Apart from that, only a few were aimed at both categorizing documents and identifying topics. The aim of this study is to solve the problems in text classification for online news: the large volume of data, sparsely distributed articles, classification of unseen data, and the limited availability of text classification approaches for Indonesian news documents. Category classification is done using a likelihood calculation, whereas topic identification uses a cosine similarity calculation. Two sets of data were used in the experiments: a training corpus and a testing corpus. The training corpus consists of 900 documents and serves as the learning material for the classifier. The testing corpus consists of 455 documents and is used to measure the accuracy of the classifier. Classification was conducted offline and online using an Indonesian online news dataset from 2011 to 2012. The enhanced method achieves an accuracy of up to 93.84% for category classification and 95.64% for topic identification. In terms of computational time, the results show that the proposed classifier performs best at n = 20, with an average computation time of 2.81 seconds. Compared against human evaluation, the integrated method outperformed it by 13%. An in-depth study was also conducted to investigate the human annotators' responses to the experimental process. These results indicate that the enhanced method has an advantage over manual classification and is suitable for Indonesian news classification.
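The abstract states that topic identification relies on a cosine similarity calculation. The following is a minimal sketch of that general technique only, assuming plain whitespace tokenisation and raw term counts. It is not the thesis's implementation: the function names `term_vector`, `cosine_similarity`, and `closest_topic`, and the sample texts, are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words term-count vector (assumed tokenisation: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either vector is empty."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_topic(document, topics):
    """Return the name of the topic whose reference text is most similar to the document.

    `topics` maps a hypothetical topic name to a reference text for that topic.
    """
    doc_vec = term_vector(document)
    return max(topics, key=lambda name: cosine_similarity(doc_vec, term_vector(topics[name])))


# Illustrative (made-up) topic texts and a short document.
topics = {
    "ekonomi": "harga bensin pasar naik",
    "olahraga": "tim bola menang pertandingan",
}
best = closest_topic("harga bensin naik lagi", topics)
```

Here `best` is the topic sharing the most term weight with the document. A real system would likely add stemming, stop-word removal, and term weighting suited to Indonesian, none of which is specified in this record.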
format Thesis
author KUSUMAAGAMA FUDDOLY, AINI RACHMANIA
author_facet KUSUMAAGAMA FUDDOLY, AINI RACHMANIA
author_sort KUSUMAAGAMA FUDDOLY, AINI RACHMANIA
title Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents
title_short Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents
title_full Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents
title_fullStr Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents
title_full_unstemmed Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents
title_sort indonesian enhanced bracewell's text classification method for indonesian news documents
publishDate 2014
url http://utpedia.utp.edu.my/15129/1/Thesis%20Final%20-%20AINI%20RACHMANIA.pdf
http://utpedia.utp.edu.my/15129/