Retrieving Malay hansard documents using topic discovery / Nurul Ain Mohd Fadzil Thani

Bibliographic Details
Main Author: Mohd Fadzil Thani, Nurul Ain
Format: Thesis
Language: English
Published: 2012
Subjects:
Online Access: https://ir.uitm.edu.my/id/eprint/98294/1/98294.PDF
https://ir.uitm.edu.my/id/eprint/98294/
Description
Summary: The overload of digitized collections has made tasks such as discovering relevant documents a growing concern in Information Retrieval. This thesis addresses two problems: the limitations of keyword-based search in document indexing and querying, and the shortage of sources on topic discovery for Malay-language research. It uses a topic discovery algorithm, Latent Dirichlet Allocation, at the indexing stage to construct a conceptual-based search, and selects Malay Hansard documents as a data-set representing Malay-language documents. The objectives are to identify the highest-frequency words in the Malay Hansard documents using the Word Frequency method, to index the data-set based on the words suggested by the Latent Dirichlet Allocation method, and to develop a retrieval prototype for these documents using conceptual-based search. The highest-frequency words from the Word Frequency method are indexed as keywords and act as a baseline representing keyword-based search, while the words suggested by Latent Dirichlet Allocation are indexed as groups of related keywords and represent conceptual-based search. As a result, with the conceptual-based index, the retrieval prototype is able to identify keywords that are related to the query word.
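The contrast between the keyword-based baseline and the conceptual-based index can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy documents, the topic word groups (which stand in for the output of a Latent Dirichlet Allocation run), and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Toy stand-ins for Hansard documents (hypothetical, not the thesis data-set).
DOCS = {
    "d1": "bajet bajet cukai kerajaan",
    "d2": "cukai cukai pendapatan rakyat",
    "d3": "sekolah sekolah guru pelajar",
    "d4": "pelajar pelajar universiti biasiswa",
}

# Hypothetical groups of related words, standing in for topics that an
# LDA model might suggest; no LDA is actually run in this sketch.
TOPICS = [
    {"bajet", "cukai", "pendapatan"},
    {"sekolah", "guru", "pelajar", "universiti", "biasiswa"},
]


def top_words(text, k=1):
    """Return the k most frequent words in a document (Word Frequency baseline)."""
    counts = Counter(text.split())
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def build_keyword_index(docs, k=1):
    """Map each document's highest-frequency word(s) to the document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in top_words(text, k):
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Keyword-based search: exact match on the indexed keyword only."""
    return set(index.get(query, set()))


def concept_search(index, topics, query):
    """Conceptual-based search: expand the query with words from its topic group(s)."""
    related = {query}
    for topic in topics:
        if query in topic:
            related |= topic
    hits = set()
    for word in related:
        hits |= index.get(word, set())
    return related, hits


index = build_keyword_index(DOCS)
print(keyword_search(index, "cukai"))
print(concept_search(index, TOPICS, "cukai"))
```

In this sketch, a keyword-based search for "cukai" matches only the document indexed under that exact word, whereas the conceptual-based search also reaches the document indexed under "bajet", because both words belong to the same assumed topic group.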