Stemming and lemmatization: A comparison of retrieval performances
The current study proposes to compare document retrieval precision performances based on language modeling techniques, particularly stemming and lemmatization. Stemming is a procedure to reduce all words with the same stem to a common form whereas lemmatization removes inflectional endings and retu...
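The distinction the abstract draws between stemming and lemmatization can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the stemmer or lemmatizer evaluated in the paper: the suffix list `SUFFIXES`, the lookup table `LEMMA_TABLE`, and the functions `toy_stem` and `toy_lemmatize` are all hypothetical names introduced here for illustration.

```python
# Toy illustration only: NOT the algorithms used in the paper.
# SUFFIXES and LEMMA_TABLE are hypothetical, hand-picked examples.

SUFFIXES = ("ing", "ies", "ed", "es", "s")

LEMMA_TABLE = {
    "studies": "study",
    "studying": "study",
    "running": "run",
    "ran": "run",
    "better": "good",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters.

    The result is a common form, but not necessarily a dictionary word.
    """
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Return the dictionary form from the lookup table, if one is listed."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


if __name__ == "__main__":
    for w in ("studies", "studying", "running", "ran"):
        print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

In this sketch the stemmer maps "studies" and "studying" to different strings ("stud" and "study"), while the table-based lemmatizer maps both to "study"; irregular forms such as "ran" are only resolved by the dictionary lookup.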
Main Authors: | Balakrishnan, Vimala, Lloyd-Yemoh, Ethel |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2014 |
Subjects: | T Technology (General) |
Online Access: | http://eprints.um.edu.my/13423/1/rp030_I3007.pdf http://eprints.um.edu.my/13423/ |
id |
my.um.eprints.13423 |
---|---|
record_format |
eprints |
spelling |
my.um.eprints.134232020-01-08T07:55:13Z http://eprints.um.edu.my/13423/ Stemming and lemmatization: A comparison of retrieval performances Balakrishnan, Vimala Lloyd-Yemoh, Ethel T Technology (General) The current study proposes to compare document retrieval precision performances based on language modeling techniques, particularly stemming and lemmatization. Stemming is a procedure to reduce all words with the same stem to a common form whereas lemmatization removes inflectional endings and returns the base or dictionary form of a word. Comparisons were also made between these two techniques with a baseline ranking algorithm (i.e. with no language processing). A search engine was developed and the algorithms were tested based on a test collection. Both mean average precisions and histograms indicate stemming and lemmatization to outperform the baseline algorithm. As for the language modeling techniques, lemmatization produced better precision compared to stemming, however the differences are insignificant. Overall the findings suggest that language modeling techniques improves document retrieval, with lemmatization technique producing the best result. 2014-04 Conference or Workshop Item PeerReviewed application/pdf en http://eprints.um.edu.my/13423/1/rp030_I3007.pdf Balakrishnan, Vimala and Lloyd-Yemoh, Ethel (2014) Stemming and lemmatization: A comparison of retrieval performances. In: Proceedings of SCEI Seoul Conferences, 10-11 Apr 2014, Seoul, Korea. |
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Research Repository |
url_provider |
http://eprints.um.edu.my/ |
language |
English |
topic |
T Technology (General) |
spellingShingle |
T Technology (General) Balakrishnan, Vimala Lloyd-Yemoh, Ethel Stemming and lemmatization: A comparison of retrieval performances |
description |
The current study compares document retrieval precision under two language modeling techniques, stemming and lemmatization. Stemming is a procedure that reduces all words with the same stem to a common form, whereas lemmatization removes inflectional endings and returns the base or dictionary form of a word. Both techniques were also compared with a baseline ranking algorithm (i.e. with no language processing). A search engine was developed and the algorithms were tested on a test collection. Both mean average precision and histograms indicate that stemming and lemmatization outperform the baseline algorithm. Between the two techniques, lemmatization produced better precision than stemming; however, the differences are insignificant. Overall, the findings suggest that language modeling techniques improve document retrieval, with lemmatization producing the best result. |
format |
Conference or Workshop Item |
author |
Balakrishnan, Vimala Lloyd-Yemoh, Ethel |
author_facet |
Balakrishnan, Vimala Lloyd-Yemoh, Ethel |
author_sort |
Balakrishnan, Vimala |
title |
Stemming and lemmatization: A comparison of retrieval performances |
title_short |
Stemming and lemmatization: A comparison of retrieval performances |
title_full |
Stemming and lemmatization: A comparison of retrieval performances |
title_fullStr |
Stemming and lemmatization: A comparison of retrieval performances |
title_full_unstemmed |
Stemming and lemmatization: A comparison of retrieval performances |
title_sort |
stemming and lemmatization: a comparison of retrieval performances |
publishDate |
2014 |
url |
http://eprints.um.edu.my/13423/1/rp030_I3007.pdf http://eprints.um.edu.my/13423/ |
_version_ |
1657488177720459264 |
score |
13.211869 |