An improved Levenshtein algorithm for spelling correction word candidate list generation
Candidate list generation in spelling correction is the process of finding words in a lexicon that are close to a misspelled word. The most widely used algorithm for generating the candidate list of a misspelled word is based on Levenshtein distance. However, this algorithm takes too much time...
| Main Author: | Abdulkhudhur, Hanan Najm |
|---|---|
| Format: | Thesis |
| Language: | en |
| Published: | 2016 |
| Subjects: | QA273-280 Probabilities. Mathematical statistics |
| Online Access: | https://etd.uum.edu.my/6564/1/s814922_01.pdf https://etd.uum.edu.my/6564/2/s814922_02.pdf https://etd.uum.edu.my/6564/ |
| _version_ | 1833436689537695744 |
|---|---|
| author | Abdulkhudhur, Hanan Najm |
| building | UUM Library |
| collection | Institutional Repository |
| content_provider | Universiti Utara Malaysia |
| content_source | UUM Electronic Theses |
| continent | Asia |
| country | Malaysia |
| description | Candidate list generation in spelling correction is the process of finding words in a lexicon that are close to a misspelled word. The most widely used algorithm for generating the candidate list of a misspelled word is based on Levenshtein distance. However, this algorithm takes too much time when there are many spelling errors. The reason is that computing the Levenshtein distance involves creating an array and filling its cells by comparing the characters of the misspelled word with the characters of a word from the lexicon. Since most lexicons contain millions of words, these operations are repeated millions of times for each misspelled word to generate its candidate list. This dissertation improves the Levenshtein algorithm by incorporating an operational technique that reduces its processing time without affecting its accuracy. The technique reduces the operations required to compute the cell values in the first, second, and third rows and the first, second, and third columns of the Levenshtein array. The improved algorithm was evaluated against the original algorithm. Experimental results show that the proposed algorithm outperforms the original Levenshtein algorithm in processing time by 36.45%, while the accuracy of both algorithms remains the same. |
| format | Thesis |
| id | my.uum.etd-6564 |
| institution | Universiti Utara Malaysia |
| language | en |
| publishDate | 2016 |
| record_format | eprints |
| spelling | my.uum.etd-6564 2021-04-05T01:34:49Z https://etd.uum.edu.my/6564/ An improved Levenshtein algorithm for spelling correction word candidate list generation Abdulkhudhur, Hanan Najm QA273-280 Probabilities. Mathematical statistics 2016 Thesis NonPeerReviewed text en https://etd.uum.edu.my/6564/1/s814922_01.pdf text en https://etd.uum.edu.my/6564/2/s814922_02.pdf Abdulkhudhur, Hanan Najm (2016) An improved Levenshtein algorithm for spelling correction word candidate list generation. Masters thesis, Universiti Utara Malaysia. |
| title | An improved Levenshtein algorithm for spelling correction word candidate list generation |
| topic | QA273-280 Probabilities. Mathematical statistics |
| url | https://etd.uum.edu.my/6564/1/s814922_01.pdf https://etd.uum.edu.my/6564/2/s814922_02.pdf https://etd.uum.edu.my/6564/ |
| url_provider | http://etd.uum.edu.my/ |
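
The description in this record says that the baseline approach creates an array, fills its cells by comparing characters, and repeats this for every word in the lexicon. The following is a minimal Python sketch of that baseline only, not of the thesis's improved technique, whose details are not given in this record. The function names `levenshtein` and `candidates`, the `max_distance` threshold, and the sample lexicon are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def levenshtein(source: str, target: str) -> int:
    """Classic dynamic-programming Levenshtein distance (full array)."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]

    # First column and first row: distance to or from the empty string.
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j

    # Fill the remaining cells by comparing characters pairwise.
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1]


def candidates(
    misspelled: str, lexicon: Iterable[str], max_distance: int = 2
) -> List[Tuple[int, str]]:
    """Return (distance, word) pairs within max_distance, closest first.

    max_distance is a hypothetical cutoff chosen for this sketch.
    """
    scored = ((levenshtein(misspelled, word), word) for word in lexicon)
    return sorted((d, w) for d, w in scored if d <= max_distance)


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real lexicon would be far larger.
    print(candidates("speling", ["spelling", "spewing", "apple"]))
```

Each call to `levenshtein` allocates and fills a complete array, so the cost grows with both the lexicon size and the word lengths, which is the overhead the description attributes to the original algorithm.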
