Design consideration of Malay text stemmer using structured approach
Main Authors: | , , , , |
---|---|
Format: | Conference or Workshop Item |
Published: | 2020 |
Subjects: | |
Online Access: | http://eprints.utm.my/id/eprint/92523/ http://dx.doi.org/10.1007/978-981-15-0077-0_43 |
Summary: | A word stemmer (or text stemmer) removes bound morphemes from derived words so that morphological variants are mapped to a common base form. It is usually used as a preprocessing tool in text classification, text mining, and information retrieval tasks. The design of an effective text stemmer is therefore crucial for ensuring that the stemming process maps morphological variants to the correct base forms. This paper investigates the design considerations for an effective text stemmer from the perspective of the Malay language. These design considerations are based on the challenges faced by previous researchers in stemming Malay texts. By adopting these considerations, a text stemmer is expected to address common stemming errors and to achieve promising stemming accuracy. |
---|
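As a rough illustration of what the summary means by mapping morphological variants to a common base form, the following minimal Python sketch strips affixes and checks the result against a root-word list. It is not the method proposed in the paper: the function name `stem`, the tiny `ROOT_WORDS` set, and the short `PREFIXES` and `SUFFIXES` lists are all hypothetical and chosen only for this example.

```python
# Hypothetical root-word list; a real stemmer would load a full Malay dictionary.
ROOT_WORDS = {"makan", "main", "baca", "ajar"}

# Small illustrative subsets of Malay affixes (assumed, not exhaustive).
PREFIXES = ["meng", "mem", "men", "ber", "ter", "me", "di", "pe"]
SUFFIXES = ["kan", "an", "i"]


def stem(word: str) -> str:
    """Return the root of `word` if stripping affixes yields a known root."""
    word = word.lower()
    if word in ROOT_WORDS:
        # Already a root word: do not strip anything (avoids over-stemming).
        return word

    # The empty string stands for "strip nothing" on that side.
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffix_options = [""] + [s for s in SUFFIXES if word.endswith(s)]

    for prefix in prefix_options:
        for suffix in suffix_options:
            candidate = word[len(prefix): len(word) - len(suffix)]
            # Accept a candidate only if the dictionary confirms it is a root.
            if candidate in ROOT_WORDS:
                return candidate

    # No known root found: leave the word unchanged rather than guess.
    return word


if __name__ == "__main__":
    for w in ["makanan", "bermain", "membaca", "dimakan", "komputer"]:
        print(w, "->", stem(w))
```

The dictionary check is the design choice worth noting here: accepting a stripped form only when it is a known root is one simple guard against over-stemming, at the cost of leaving words with unknown roots unstemmed.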