Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents
Text classification has been a popular research field in computer science. It deals with assigning labels to groups of similar textual documents. However, very few approaches focus on the unique characteristics of news corpora, even less for I...
Main Author: | KUSUMAAGAMA FUDDOLY, AINI RACHMANIA |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2014 |
Subjects: | T Technology (General) |
Online Access: | http://utpedia.utp.edu.my/15129/1/Thesis%20Final%20-%20AINI%20RACHMANIA.pdf http://utpedia.utp.edu.my/15129/ |
id |
my-utp-utpedia.15129 |
---|---|
record_format |
eprints |
spelling |
my-utp-utpedia.15129 2019-06-10T13:34:33Z http://utpedia.utp.edu.my/15129/ Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents KUSUMAAGAMA FUDDOLY, AINI RACHMANIA T Technology (General) 2014-08 Thesis NonPeerReviewed application/pdf en http://utpedia.utp.edu.my/15129/1/Thesis%20Final%20-%20AINI%20RACHMANIA.pdf KUSUMAAGAMA FUDDOLY, AINI RACHMANIA (2014) Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents. Masters thesis, Universiti Teknologi Petronas. |
institution |
Universiti Teknologi Petronas |
building |
UTP Resource Centre |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Petronas |
content_source |
UTP Electronic and Digitized Intellectual Asset |
url_provider |
http://utpedia.utp.edu.my/ |
language |
English |
topic |
T Technology (General) |
description |
Text classification has been a popular research field in computer science. It deals with assigning labels to groups of similar textual documents. However, very few approaches focus on the unique characteristics of news corpora, and even fewer on Indonesian news documents. Moreover, only a few aim at both categorizing documents and identifying topics. The aim of this study is to address the problems in text classification for online news: the large volume of data, sparsely distributed articles, classification of unseen data, and the limitations of existing text classification approaches for Indonesian news documents. Category classification is done using a likelihood calculation, whereas topic identification employs a cosine similarity calculation. Two sets of data were used in the experiments: a training corpus and a testing corpus. The training corpus consists of 900 documents and serves as the learning material for the classifier. The testing set covers 455 documents and is used to measure the accuracy of the classifier. Classification was conducted offline and online using an Indonesian online news dataset from 2011 to 2012. The enhanced method is shown to produce good results, with accuracy of up to 93.84% for category classification and 95.64% for topic identification. In terms of computational time, the results show that the proposed classifier works optimally at n = 20, with an average computational time of 2.81 seconds. Compared against human evaluation, the integrated method outperforms by 13%. An in-depth study was also conducted to investigate the human annotators' responses to the experimental process. This highlights that the enhanced method has an advantage over manual classification and is suitable for Indonesian news classification. |
format |
Thesis |
author |
KUSUMAAGAMA FUDDOLY, AINI RACHMANIA |
title |
Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents |
publishDate |
2014 |
url |
http://utpedia.utp.edu.my/15129/1/Thesis%20Final%20-%20AINI%20RACHMANIA.pdf http://utpedia.utp.edu.my/15129/ |
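The abstract describes category classification by a likelihood calculation and topic identification by cosine similarity, without giving formulas. The following is only a minimal sketch of those two general techniques, not the thesis's actual method: the whitespace tokenizer, the add-one smoothing, the function names, and the toy word counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def log_likelihood(tokens, category_counts, vocab_size):
    """Log-likelihood of tokens under one category's word counts.

    Add-one smoothing is an assumption here, used so that unseen
    words do not produce a zero probability.
    """
    total = sum(category_counts.values())
    return sum(
        math.log((category_counts.get(t, 0) + 1) / (total + vocab_size))
        for t in tokens
    )


def classify(tokens, counts_by_category):
    """Return the category whose word counts give the highest likelihood."""
    vocab = set()
    for counts in counts_by_category.values():
        vocab.update(counts)
    return max(
        counts_by_category,
        key=lambda cat: log_likelihood(tokens, counts_by_category[cat], len(vocab)),
    )


# Toy training data (invented for illustration only).
training = {
    "sports": Counter(tokenize("goal match team goal")),
    "economy": Counter(tokenize("market bank rupiah market")),
}

category = classify(tokenize("team goal"), training)
similarity = cosine_similarity(
    Counter(tokenize("bank market market")),
    Counter(tokenize("market bank")),
)
print(category, round(similarity, 4))
```

In this sketch a document is first assigned the most likely category, and cosine similarity can then compare it with other documents (or topic representatives) to find the closest topic. How the thesis actually combines the two steps is not stated in this record.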