System for retrieving similar sentences from Malay translation of the Al-Quran / Nur Farhana Rasip

Bibliographic Details
Main Author: Rasip, Nur Farhana
Format: Thesis
Language: English
Published: 2019
Online Access: https://ir.uitm.edu.my/id/eprint/109972/1/109972.pdf
https://ir.uitm.edu.my/id/eprint/109972/
Description
Summary: Similar sentences can be computed using a bigram language model. Other researchers have computed similar sentences, but not on Malay documents, so there is a lack of studies on computing similar sentences for Malay documents. Using frequent words and topic words also produces a huge number of relevant documents. In addition, similar sentences are scattered throughout the Al-Quran, so users have a hard time searching manually for what they want. Two processes are needed to retrieve similar sentences: first the data is pre-processed, and then a bigram language model is used to retrieve the similar sentences. The bigram language model counts every 2-word-long subsequence (bigram) that appears in the data and builds a probability distribution over the bigrams. As a result, the prototype displays to the user the similar sentences that match the user’s query. Precision and recall were high for every query. The benefit of this system is that, instead of searching manually by flipping through the translated documents, a user can quickly enter a query word and retrieve all the matching documents related to what they are looking for.
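The bigram counting and retrieval steps described in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`tokenize`, `bigram_distribution`, `retrieve`), the simple lowercase tokenizer standing in for pre-processing, the shared-bigram ranking rule, and the three placeholder sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase and split into word tokens (assumed stand-in for pre-processing)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def bigrams(tokens):
    """Return every 2-word-long subsequence of a token list."""
    return list(zip(tokens, tokens[1:]))


def bigram_distribution(sentences):
    """Estimate P(w2 | w1) = count(w1, w2) / count(bigrams starting with w1)."""
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        for w1, w2 in bigrams(tokenize(sentence)):
            pair_counts[(w1, w2)] += 1
            first_counts[w1] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def retrieve(query, sentences):
    """Rank sentences by the number of distinct bigrams shared with the query.

    This scoring rule is hypothetical; the summary does not state how
    matches are scored.
    """
    query_bigrams = set(bigrams(tokenize(query)))
    scored = []
    for sentence in sentences:
        shared = query_bigrams & set(bigrams(tokenize(sentence)))
        if shared:
            scored.append((len(shared), sentence))
    # Stable sort: sentences with equal scores keep their corpus order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sentence for _, sentence in scored]


# Illustrative placeholder sentences, not taken from the thesis data.
corpus = [
    "orang yang beriman dan beramal soleh",
    "mereka yang beriman kepada Allah",
    "langit dan bumi",
]
model = bigram_distribution(corpus)
results = retrieve("yang beriman", corpus)
```

Here `model` maps each bigram to its estimated conditional probability, and `results` holds the placeholder sentences that share at least one bigram with the query. A real system would likely add smoothing and a more careful Malay tokenizer, neither of which is described in the summary.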