Topic identification using filtering and rule generation algorithm for textual document

Information stored digitally in text documents is seldom arranged according to specific topics. Having to read whole documents is time-consuming and reduces interest in searching for information. Most existing topic identification methods depend on the occurrence of terms in the text. Howev...

Bibliographic Details
Main Author:	Nurul Syafidah, Jamil
Format:	Thesis
Language:	en en
Published:	2015
Subjects:	QA75 Electronic computers. Computer science
Online Access:	https://etd.uum.edu.my/5379/1/s812431.pdf https://etd.uum.edu.my/5379/2/s812431_abstract.pdf https://etd.uum.edu.my/5379/

_version_	1833436509465739264
author	Nurul Syafidah, Jamil
author_facet	Nurul Syafidah, Jamil
author_sort	Nurul Syafidah, Jamil
building	UUM Library
collection	Institutional Repository
content_provider	Universiti Utara Malaysia
content_source	UUM Electronic Theses
continent	Asia
country	Malaysia
description	Information stored digitally in text documents is seldom arranged according to specific topics. Having to read whole documents is time-consuming and reduces interest in searching for information. Most existing topic identification methods depend on the occurrence of terms in the text. However, not all frequently occurring terms are relevant. The term extraction phase in topic identification methods can yield extracted terms with similar meanings, which is known as the synonymy problem. Filtering and rule generation algorithms are introduced in this study to identify topics in textual documents. The proposed filtering algorithm (PFA) extracts the most relevant terms from the text and resolves the synonymy problem among the extracted terms. The rule generation algorithm (TopId) is proposed to identify the topic of each verse based on the extracted terms. The PFA processes and filters each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using a rule-based classifier. An experiment was performed on 224 English-translated Quran verses related to female issues. Topics identified by both TopId and the Rough Set technique were compared and later verified by experts. PFA extracted more relevant terms than other filtering techniques. TopId identified topics that are closer to the experts' topics, with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and to identify the topic of each verse.
format	Thesis
id	my.uum.etd-5379
institution	Universiti Utara Malaysia
language	en en
publishDate	2015
record_format	eprints
spelling	my.uum.etd-5379 2021-04-04T08:54:11Z https://etd.uum.edu.my/5379/ Topic identification using filtering and rule generation algorithm for textual document Nurul Syafidah, Jamil QA75 Electronic computers. Computer science Information stored digitally in text documents is seldom arranged according to specific topics. Having to read whole documents is time-consuming and reduces interest in searching for information. Most existing topic identification methods depend on the occurrence of terms in the text. However, not all frequently occurring terms are relevant. The term extraction phase in topic identification methods can yield extracted terms with similar meanings, which is known as the synonymy problem. Filtering and rule generation algorithms are introduced in this study to identify topics in textual documents. The proposed filtering algorithm (PFA) extracts the most relevant terms from the text and resolves the synonymy problem among the extracted terms. The rule generation algorithm (TopId) is proposed to identify the topic of each verse based on the extracted terms. The PFA processes and filters each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using a rule-based classifier. An experiment was performed on 224 English-translated Quran verses related to female issues. Topics identified by both TopId and the Rough Set technique were compared and later verified by experts. PFA extracted more relevant terms than other filtering techniques. TopId identified topics that are closer to the experts' topics, with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and to identify the topic of each verse.
2015 Thesis NonPeerReviewed text en https://etd.uum.edu.my/5379/1/s812431.pdf text en https://etd.uum.edu.my/5379/2/s812431_abstract.pdf Nurul Syafidah, Jamil (2015) Topic identification using filtering and rule generation algorithm for textual document. Masters thesis, Universiti Utara Malaysia.
spellingShingle	QA75 Electronic computers. Computer science Nurul Syafidah, Jamil Topic identification using filtering and rule generation algorithm for textual document
title	Topic identification using filtering and rule generation algorithm for textual document
title_full	Topic identification using filtering and rule generation algorithm for textual document
title_fullStr	Topic identification using filtering and rule generation algorithm for textual document
title_full_unstemmed	Topic identification using filtering and rule generation algorithm for textual document
title_short	Topic identification using filtering and rule generation algorithm for textual document
title_sort	topic identification using filtering and rule generation algorithm for textual document
topic	QA75 Electronic computers. Computer science
url	https://etd.uum.edu.my/5379/1/s812431.pdf https://etd.uum.edu.my/5379/2/s812431_abstract.pdf https://etd.uum.edu.my/5379/
url_provider	http://etd.uum.edu.my/
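
Illustrative note: the description field outlines a two-stage approach, in which terms are first filtered against predefined keywords with synonyms merged, and rules then map the remaining terms to a topic. The Python sketch below shows that general idea only. It is not the thesis's PFA or TopId: the synonym map, keyword set, rule table, and function names are all hypothetical, and noun detection is replaced here by a plain keyword lookup.

```python
import re

# Hypothetical synonym map: each variant is folded into one canonical term.
SYNONYMS = {"wives": "wife", "spouse": "wife", "mothers": "mother", "divorced": "divorce"}

# Hypothetical predefined keywords (canonical forms) that the filter keeps.
KEYWORDS = {"wife", "mother", "divorce", "inheritance"}

# Hypothetical rules, checked in order: a rule fires when all its terms are present.
RULES = [
    ({"wife", "divorce"}, "divorce"),
    ({"mother", "inheritance"}, "inheritance"),
    ({"wife"}, "marriage"),
]


def filter_terms(sentence):
    """Tokenize, fold synonyms into canonical terms, keep only predefined keywords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    canonical = [SYNONYMS.get(token, token) for token in tokens]
    return [term for term in canonical if term in KEYWORDS]


def identify_topic(terms):
    """Return the topic of the first rule whose required terms are all present."""
    present = set(terms)
    for required, topic in RULES:
        if required <= present:
            return topic
    return None  # no rule matched


if __name__ == "__main__":
    # Invented example sentence, not taken from the thesis data set.
    terms = filter_terms("The spouse asked about divorce.")
    print(terms, identify_topic(terms))
```

Ordering the rules from most to least specific is one simple way to resolve overlaps; the thesis may handle rule conflicts differently.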

Similar Items