An improved Levenshtein algorithm for spelling correction word candidate list generation

Candidate list generation in spelling correction is the process of finding words in a lexicon that are close to an incorrect word. The most widely used algorithm for generating the candidate list for an incorrect word is based on Levenshtein distance. However, this algorithm takes too much time...

Bibliographic Details
Main Author:	Abdulkhudhur, Hanan Najm
Format:	Thesis
Language:	en
Published:	2016
Subjects:	QA273-280 Probabilities. Mathematical statistics
Online Access:	https://etd.uum.edu.my/6564/1/s814922_01.pdf https://etd.uum.edu.my/6564/2/s814922_02.pdf https://etd.uum.edu.my/6564/

author	Abdulkhudhur, Hanan Najm
building	UUM Library
collection	Institutional Repository
content_provider	Universiti Utara Malaysia
content_source	UUM Electronic Theses
continent	Asia
country	Malaysia
description	Candidate list generation in spelling correction is the process of finding words in a lexicon that are close to an incorrect word. The most widely used algorithm for generating the candidate list for an incorrect word is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that computing the Levenshtein distance involves creating an array and filling its cells by comparing the characters of the incorrect word with the characters of a word from the lexicon. Since most lexicons contain millions of words, these operations are repeated millions of times for each incorrect word to generate its candidate list. This dissertation improves the Levenshtein algorithm by designing an operational technique and incorporating it into the algorithm. The proposed technique reduces the processing time of the Levenshtein algorithm without affecting its accuracy. It reduces the operations required to compute the cell values in the first, second, and third rows and the first, second, and third columns of the Levenshtein array. The improved Levenshtein algorithm was evaluated against the original algorithm. Experimental results show that the proposed algorithm outperforms the original Levenshtein algorithm in processing time by 36.45%, while the accuracy of both algorithms remains the same.
format	Thesis
id	my.uum.etd-6564
institution	Universiti Utara Malaysia
language	en
publishDate	2016
record_format	eprints
spelling	my.uum.etd-6564 2021-04-05T01:34:49Z https://etd.uum.edu.my/6564/ 2016 Thesis NonPeerReviewed text en https://etd.uum.edu.my/6564/1/s814922_01.pdf text en https://etd.uum.edu.my/6564/2/s814922_02.pdf Abdulkhudhur, Hanan Najm (2016) An improved Levenshtein algorithm for spelling correction word candidate list generation. Masters thesis, Universiti Utara Malaysia.
title	An improved Levenshtein algorithm for spelling correction word candidate list generation
title_sort	improved levenshtein algorithm for spelling correction word candidate list generation
topic	QA273-280 Probabilities. Mathematical statistics
url	https://etd.uum.edu.my/6564/1/s814922_01.pdf https://etd.uum.edu.my/6564/2/s814922_02.pdf https://etd.uum.edu.my/6564/
url_provider	http://etd.uum.edu.my/
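
For orientation, the baseline that the description refers to (building an array and filling its cells by comparing the characters of the incorrect word with those of each lexicon word) can be sketched roughly as follows. This is the textbook Levenshtein distance only, not the thesis's improved technique, whose details are not given in this record. The function names `levenshtein` and `candidates`, the `max_distance` threshold, and the toy lexicon are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Textbook full-array Levenshtein distance between a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    # First column and first row: distance from or to the empty prefix.
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    # Fill the remaining cells by comparing characters pairwise.
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[-1][-1]


def candidates(
    word: str, lexicon: Iterable[str], max_distance: int = 2
) -> List[Tuple[int, str]]:
    """Return (distance, entry) pairs within max_distance, closest first.

    max_distance is a hypothetical cutoff chosen for this sketch.
    """
    scored = ((levenshtein(word, entry), entry) for entry in lexicon)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


if __name__ == "__main__":
    toy_lexicon = ["spelling", "spieling", "apple"]  # illustrative only
    print(candidates("speling", toy_lexicon))
```

Because `candidates` runs the full array computation once per lexicon entry, its cost grows with the size of the lexicon, which is the overhead the description says the thesis aims to reduce.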


Similar Items