Stemming and lemmatization: A comparison of retrieval performances

The current study proposes to compare document retrieval precision performances based on language modeling techniques, particularly stemming and lemmatization. Stemming is a procedure to reduce all words with the same stem to a common form whereas lemmatization removes inflectional endings and retu...

Bibliographic Details
Main Authors: Balakrishnan, Vimala; Lloyd-Yemoh, Ethel
Format: Conference or Workshop Item
Language: English
Published: 2014
Subjects: T Technology (General)
Online Access: http://eprints.um.edu.my/13423/1/rp030_I3007.pdf
http://eprints.um.edu.my/13423/
id my.um.eprints.13423
record_format eprints
spelling my.um.eprints.13423 2020-01-08T07:55:13Z http://eprints.um.edu.my/13423/ 2014-04 Conference or Workshop Item PeerReviewed application/pdf en http://eprints.um.edu.my/13423/1/rp030_I3007.pdf Balakrishnan, Vimala and Lloyd-Yemoh, Ethel (2014) Stemming and lemmatization: A comparison of retrieval performances. In: Proceedings of SCEI Seoul Conferences, 10-11 Apr 2014, Seoul, Korea.
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
language English
topic T Technology (General)
description The current study compares document retrieval precision under two language modeling techniques, stemming and lemmatization. Stemming reduces all words with the same stem to a common form, whereas lemmatization removes inflectional endings and returns the base or dictionary form of a word. Both techniques were also compared with a baseline ranking algorithm (i.e. one with no language processing). A search engine was developed and the algorithms were tested on a test collection. Both mean average precision and histograms indicate that stemming and lemmatization outperform the baseline algorithm. Between the two techniques, lemmatization produced better precision than stemming; however, the differences are insignificant. Overall, the findings suggest that language modeling techniques improve document retrieval, with lemmatization producing the best result.
format Conference or Workshop Item
author Balakrishnan, Vimala
Lloyd-Yemoh, Ethel
title Stemming and lemmatization: A comparison of retrieval performances
publishDate 2014
url http://eprints.um.edu.my/13423/1/rp030_I3007.pdf
http://eprints.um.edu.my/13423/
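
The description above contrasts stemming with lemmatization and reports results as mean average precision. The following Python sketch is only an illustration of those three ideas under stated assumptions: the suffix rules in `toy_stem`, the `TOY_LEMMAS` dictionary, and the document IDs are all hypothetical, and none of it reproduces the authors' search engine, test collection, or results.

```python
# Illustrative sketch only; not the authors' system, data, or results.

def toy_stem(word):
    """Crude suffix stripping (hypothetical rules, far simpler than a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Strip the first matching suffix, keeping at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical hand-written dictionary standing in for a real lemmatizer.
TOY_LEMMAS = {"studies": "study", "studying": "study", "ran": "run"}


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k over the ranks k where a relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    for w in ("studies", "studying"):
        print(w, toy_stem(w), toy_lemmatize(w))
    runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # made-up query 1
        (["d2", "d1"], {"d1"}),                    # made-up query 2
    ]
    print(mean_average_precision(runs))
```

With these toy rules, the stemmer maps "studies" and "studying" to two different strings, while the dictionary lookup maps both to "study", which is the kind of difference the comparison in the description is about.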