Text-based tagging of Malay Hansard document / Mohd Razif Abd Jalil

Bibliographic Details
Main Author: Abd Jalil, Mohd Razif
Format: Thesis
Language: English
Published: 2012
Online Access: https://ir.uitm.edu.my/id/eprint/98198/1/98198.pdf
https://ir.uitm.edu.my/id/eprint/98198/
Description
Summary: In natural language processing, part-of-speech tagging plays a vital role. It is an important prerequisite for processing a human language computationally. Before a part-of-speech tagger can be developed, a tag set is required for that language. This project presents a rule-based part-of-speech tagging system for the Malay language, applied to Malay Hansard documents, together with a tag set that supports the development of a parser for the language. The tagged output is compared against a text in which each word has been tagged manually. A context-free grammar is applied to words that have more than one possible word class in order to improve the tagging result. A very simple architecture is used, and it gives reasonably good accuracy. The results show that the 1.37 percent of Hansard dictionary entries with the highest frequency are enough to tag more than 55 percent of the words in the Hansard document.
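The pipeline described in the summary (dictionary lookup, a grammar step for ambiguous words, and comparison with a manually tagged text) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the tag names, and the function names `tag`, `accuracy`, and `top_k_coverage` are hypothetical and are not taken from the thesis. In particular, the `ALLOWED` set of adjacent tag pairs is a simplified stand-in for the context-free grammar, not the grammar the author used.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> possible tags (not the thesis's tag set).
LEXICON = {
    "saya": {"PRON"},
    "nasi": {"NOUN"},
    "buku": {"NOUN"},
    "itu": {"DET"},
    "masak": {"VERB", "ADJ"},  # ambiguous: more than one word class
}

# Hypothetical (previous tag, current tag) pairs that are permitted.
# A simplified stand-in for grammar rules, used only to resolve ambiguity.
ALLOWED = {("PRON", "VERB"), ("NOUN", "ADJ"), ("VERB", "NOUN"), ("NOUN", "DET")}


def tag(tokens):
    """Tag tokens by lexicon lookup; use ALLOWED to pick among several tags."""
    tagged = []
    prev = None
    for tok in tokens:
        candidates = LEXICON.get(tok.lower(), set())
        if not candidates:
            choice = "UNK"  # word not in the dictionary
        elif len(candidates) == 1:
            choice = next(iter(candidates))
        else:
            # Keep only candidates that may follow the previous tag.
            permitted = sorted(c for c in candidates if (prev, c) in ALLOWED)
            # Fall back to a deterministic choice if no rule applies.
            choice = permitted[0] if permitted else sorted(candidates)[0]
        tagged.append((tok, choice))
        prev = choice
    return tagged


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the manual tag."""
    if not gold:
        return 0.0
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)


def top_k_coverage(tokens, k):
    """Share of all tokens accounted for by the k most frequent word types."""
    counts = Counter(t.lower() for t in tokens)
    if not counts:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / sum(counts.values())
```

Calling `tag(["saya", "masak", "nasi"])` and passing the result to `accuracy` along with a manually tagged version of the same sentence gives the kind of comparison the summary describes. `top_k_coverage` shows how one could measure what share of a document a small set of high-frequency dictionary entries accounts for.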