An improved Levenshtein algorithm for spelling correction word candidate list generation

Candidates’ list generation in spelling correction is a process of finding words from a lexicon that should be close to the incorrect word. The most widely used algorithm for generating candidates’ list for incorrect words is based on Levenshtein distance. However, this algorithm takes too much time...

Full description

Saved in:
Bibliographic Details
Main Author: Abdulkhudhur, Hanan Najm
Format: Thesis
Language:en
en
Published: 2016
Subjects:
Online Access:https://etd.uum.edu.my/6564/1/s814922_01.pdf
https://etd.uum.edu.my/6564/2/s814922_02.pdf
https://etd.uum.edu.my/6564/
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1833436689537695744
author Abdulkhudhur, Hanan Najm
author_facet Abdulkhudhur, Hanan Najm
author_sort Abdulkhudhur, Hanan Najm
building UUM Library
collection Institutional Repository
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
continent Asia
country Malaysia
description Candidates’ list generation in spelling correction is a process of finding words from a lexicon that should be close to the incorrect word. The most widely used algorithm for generating candidates’ list for incorrect words is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that calculating Levenshtein algorithm includes operations that create an array and fill the cells of this array by comparing the characters of an incorrect word with the characters of a word from a lexicon. Since most lexicons contain millions of words, then these operations will be repeated millions of times for each incorrect word to generate its candidates list. This dissertation improved Levenshtein algorithm by designing an operational technique that has been included in this algorithm. The proposed operational technique enhances Levenshtein algorithm in terms of the processing time of its executing without affecting its accuracy. It reduces the operations required to measure cells’ values in the first row, first column, second row, second column, third row, and third column in Levenshtein array. The improved Levenshtein algorithm was evaluated against the original algorithm. Experimental results show that the proposed algorithm outperforms Levenshtein algorithm in terms of the processing time by 36.45% while the accuracy of both algorithms is still the same.
format Thesis
id my.uum.etd-6564
institution Universiti Utara Malaysia
language en
en
publishDate 2016
record_format eprints
spelling my.uum.etd-65642021-04-05T01:34:49Z https://etd.uum.edu.my/6564/ An improved Levenshtein algorithm for spelling correction word candidate list generation Abdulkhudhur, Hanan Najm QA273-280 Probabilities. Mathematical statistics Candidates’ list generation in spelling correction is a process of finding words from a lexicon that should be close to the incorrect word. The most widely used algorithm for generating candidates’ list for incorrect words is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that calculating Levenshtein algorithm includes operations that create an array and fill the cells of this array by comparing the characters of an incorrect word with the characters of a word from a lexicon. Since most lexicons contain millions of words, then these operations will be repeated millions of times for each incorrect word to generate its candidates list. This dissertation improved Levenshtein algorithm by designing an operational technique that has been included in this algorithm. The proposed operational technique enhances Levenshtein algorithm in terms of the processing time of its executing without affecting its accuracy. It reduces the operations required to measure cells’ values in the first row, first column, second row, second column, third row, and third column in Levenshtein array. The improved Levenshtein algorithm was evaluated against the original algorithm. Experimental results show that the proposed algorithm outperforms Levenshtein algorithm in terms of the processing time by 36.45% while the accuracy of both algorithms is still the same. 2016 Thesis NonPeerReviewed text en https://etd.uum.edu.my/6564/1/s814922_01.pdf text en https://etd.uum.edu.my/6564/2/s814922_02.pdf Abdulkhudhur, Hanan Najm (2016) An improved Levenshtein algorithm for spelling correction word candidate list generation. Masters thesis, Universiti Utara Malaysia.
spellingShingle QA273-280 Probabilities. Mathematical statistics
Abdulkhudhur, Hanan Najm
An improved Levenshtein algorithm for spelling correction word candidate list generation
title An improved Levenshtein algorithm for spelling correction word candidate list generation
title_full An improved Levenshtein algorithm for spelling correction word candidate list generation
title_fullStr An improved Levenshtein algorithm for spelling correction word candidate list generation
title_full_unstemmed An improved Levenshtein algorithm for spelling correction word candidate list generation
title_short An improved Levenshtein algorithm for spelling correction word candidate list generation
title_sort improved levenshtein algorithm for spelling correction word candidate list generation
topic QA273-280 Probabilities. Mathematical statistics
url https://etd.uum.edu.my/6564/1/s814922_01.pdf
https://etd.uum.edu.my/6564/2/s814922_02.pdf
https://etd.uum.edu.my/6564/
url_provider http://etd.uum.edu.my/