Hybrid model of post-processing techniques for Arabic optical character recognition

Optical character recognition (OCR) extracts the text contained in an image. One of the stages in OCR is post-processing, which corrects errors in the OCR output text. The OCR multiple outputs approach consists of three processes: differentiation, alignment, and voting. Existing differenti...

Bibliographic Details
Main Author:	Habeeb, Imad Qasim
Format:	Thesis
Language:	en
Published:	2016
Subjects:	T58.5-58.64 Information technology; QA75 Electronic computers. Computer science
Online Access:	https://etd.uum.edu.my/6030/1/s94758_01.pdf https://etd.uum.edu.my/6030/2/s94758_02.pdf https://etd.uum.edu.my/6030/

_version_	1833436604189900800
author	Habeeb, Imad Qasim
building	UUM Library
collection	Institutional Repository
content_provider	Universiti Utara Malaysia
content_source	UUM Electronic Theses
continent	Asia
country	Malaysia
description	Optical character recognition (OCR) extracts the text contained in an image. One of the stages in OCR is post-processing, which corrects errors in the OCR output text. The OCR multiple outputs approach consists of three processes: differentiation, alignment, and voting. Existing differentiation techniques lose important features because they use N versions of input images. In addition, the alignment techniques in the literature are based on approximation, and the voting process is not context-aware. These drawbacks lead to a high error rate in OCR. This research proposed three improved techniques for differentiation, alignment, and voting to overcome the identified drawbacks. These techniques were then combined into a hybrid model that can recognize optical characters in the Arabic language. Each of the proposed techniques was evaluated separately against three relevant existing techniques. The performance measures used in this study were Word Error Rate (WER), Character Error Rate (CER), and Non-word Error Rate (NWER). Experimental results showed a relative decrease in error rate on all measures for the evaluated techniques. Similarly, the hybrid model reduced WER, CER, and NWER by 30.35%, 52.42%, and 47.86%, respectively, compared with the three relevant existing models. This study contributes to the OCR domain because the proposed hybrid model of post-processing techniques could facilitate the automatic recognition of Arabic text and hence lead to better information retrieval.
format	Thesis
id	my.uum.etd-6030
institution	Universiti Utara Malaysia
language	en
publishDate	2016
record_format	eprints
spelling	my.uum.etd-6030 2021-04-05T02:28:59Z https://etd.uum.edu.my/6030/ 2016 Thesis NonPeerReviewed text en https://etd.uum.edu.my/6030/1/s94758_01.pdf text en https://etd.uum.edu.my/6030/2/s94758_02.pdf Habeeb, Imad Qasim (2016) Hybrid model of post-processing techniques for Arabic optical character recognition. PhD. thesis, Universiti Utara Malaysia.
title	Hybrid model of post-processing techniques for Arabic optical character recognition
topic	T58.5-58.64 Information technology QA75 Electronic computers. Computer science
url	https://etd.uum.edu.my/6030/1/s94758_01.pdf https://etd.uum.edu.my/6030/2/s94758_02.pdf https://etd.uum.edu.my/6030/
url_provider	http://etd.uum.edu.my/
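
The description field names Word Error Rate (WER) and Character Error Rate (CER) as evaluation measures, but this record gives no formulas. As a minimal sketch, assuming the common definitions (edit distance between the reference text and the OCR output, divided by the reference length, counted in words for WER and in characters for CER), the two measures could be computed as shown here. The function names and sample strings are illustrative and are not taken from the thesis. NWER is left out because the record does not define it.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings (characters) and on lists (e.g. words).
    """
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length in characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / reference length in words (assumed definition)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    reference = "the cat sat"   # hypothetical ground-truth text
    ocr_output = "the cot sat"  # hypothetical OCR output with one wrong character
    print("WER:", wer(reference, ocr_output))
    print("CER:", cer(reference, ocr_output))
```

With these sample strings, one of three words and one of eleven characters differ, so WER is 1/3 and CER is 1/11. A "lower WER by 30.35%" in the sense of a relative decrease would then correspond to (baseline - proposed) / baseline computed on such rates.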

Hybrid model of post-processing techniques for Arabic optical character recognition

Similar Items