Stemming Hausa text: using affix-stripping rules and reference look-up

Stemming is the process of reducing a derived or inflected word to its root or stem by stripping all its affixes. It has been used as a pre-processing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. Currently, there ar...

Bibliographic Details
Main Authors: Bimba, A.T., Idris, N., Khamis, N., Mohd Noor, N.F.
Format: Article
Published: Springer Verlag (Germany) 2016
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.um.edu.my/18607/
https://doi.org/10.1007/s10579-015-9311-x
author Bimba, A.T.
Idris, N.
Khamis, N.
Mohd Noor, N.F.
building UM Library
collection Institutional Repository
content_provider Universiti Malaya
content_source UM Research Repository
continent Asia
country Malaysia
description Stemming is the process of reducing a derived or inflected word to its root or stem by stripping all its affixes. It has been used as a pre-processing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. Stemming algorithms have been developed for languages such as English, Arabic, Turkish, Malay and Amharic. However, no algorithm has been used to stem text in Hausa, a Chadic language spoken in West Africa. To address this need, we propose stemming Hausa text using affix-stripping rules and reference look-up. We stemmed Hausa text using 78 affix-stripping rules applied in 4 steps and a reference look-up consisting of 1500 Hausa root words. The over-stemming index, under-stemming index, stemmer weight, word stemmed factor, correctly stemmed words factor and average words conflation factor were calculated to determine the effect of reference look-up on the strength and accuracy of the stemmer. Reference look-up was observed to reduce both over-stemming and under-stemming errors and to increase accuracy, and it tended to reduce the strength of an affix-stripping stemmer. The rationale behind the approach is discussed and directions for future research are identified.
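The approach summarized in the description (affix-stripping rules applied in steps, combined with a look-up in a list of known roots) can be sketched as below. This is a minimal illustration only: the record does not list the 78 rules or the 1500 root words, so `ROOTS`, `STEPS`, `MIN_STEM_LEN`, and the choice to consult the root list before each step are all hypothetical assumptions, and the sample strings are toy inputs rather than validated Hausa morphology.

```python
# Toy data (assumption): NOT the paper's 78 rules or its 1500-word root list.
ROOTS = {"malam", "gida", "noma"}

# Each step is an ordered list of (affix, kind) rules; kind is "prefix" or "suffix".
# Four steps are used only to mirror the "4 steps" mentioned in the description.
STEPS = [
    [("n", "suffix"), ("r", "suffix")],
    [("ai", "suffix"), ("oci", "suffix")],
    [("ma", "prefix")],
    [("i", "suffix"), ("a", "suffix")],
]

MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def strip_affix(word, affix, kind):
    """Return word with the affix removed, or the word unchanged if it does not match."""
    if kind == "suffix" and word.endswith(affix):
        return word[: -len(affix)]
    if kind == "prefix" and word.startswith(affix):
        return word[len(affix):]
    return word


def stem(word, roots=ROOTS, steps=STEPS):
    """Apply at most one rule per step, stopping as soon as the word is a known root."""
    word = word.lower()
    for rules in steps:
        if word in roots:  # reference look-up: a known root is never stripped further
            return word
        for affix, kind in rules:
            candidate = strip_affix(word, affix, kind)
            if candidate != word and len(candidate) >= MIN_STEM_LEN:
                word = candidate
                break  # first matching rule in this step wins
    return word


if __name__ == "__main__":
    for w in ["malamai", "gida", "manoma"]:
        print(w, "->", stem(w), "| without look-up:", stem(w, roots=set()))
```

With these toy rules, passing an empty root set lets the final step strip the last vowel of `gida`, whereas the look-up returns the word untouched. That is the kind of over-stemming error the description reports the reference look-up reducing.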
format Article
id my.um.eprints-18607
institution Universiti Malaya
publishDate 2016
publisher Springer Verlag (Germany)
record_format eprints
spelling my.um.eprints-18607 2018-04-25T07:05:50Z Springer Verlag (Germany) 2016-09 Article PeerReviewed Bimba, A.T. and Idris, N. and Khamis, N. and Mohd Noor, N.F. (2016) Stemming Hausa text: using affix-stripping rules and reference look-up. Language Resources and Evaluation, 50 (3). pp. 687-703. ISSN 1574-020X, DOI https://doi.org/10.1007/s10579-015-9311-x
title Stemming Hausa text: using affix-stripping rules and reference look-up
topic QA75 Electronic computers. Computer science
url_provider http://eprints.um.edu.my/