Design consideration of Malay text stemmer using structured approach

Bibliographic Details
Main Authors: Kassim, Mohamad Nizam; Mat Jali, Shaiful Hisham; Maarof, Mohd Aizaini; Zainal, Anazida; Abdul Wahab, Amirudin
Format: Conference or Workshop Item
Published: 2020
Online Access: http://eprints.utm.my/id/eprint/92523/
http://dx.doi.org/10.1007/978-981-15-0077-0_43
Description
Summary: A word stemmer (or text stemmer) removes bound morphemes from derived words so that morphological variants are mapped to a common base form. It is commonly used as a preprocessing tool in text classification, text mining, and information retrieval tasks. The design of an effective text stemmer is therefore crucial to ensuring that the stemming process maps morphological variants to their correct base forms. This paper investigates the design considerations for an effective text stemmer from the perspective of the Malay language. These considerations are based on the challenges that previous researchers have faced in stemming Malay texts. By adopting them, a text stemmer is expected to address common stemming errors and to achieve promising stemming accuracy.