Quranic diacritic and character segmentation and recognition using flood fill and k-nearest neighbors algorithm
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | http://psasir.upm.edu.my/id/eprint/90723/1/FSKTM%202019%2059%20IR.pdf http://psasir.upm.edu.my/id/eprint/90723/ |
Summary: | Optical character recognition (OCR) is the detection, recognition and
conversion of the characters in an image into text. Arabic Optical Character
Recognition (AOCR) is a distinct type of OCR used to process Arabic characters.
OCR is increasingly used in applications where a task should be performed
automatically, without human intervention.
Quranic handwritten text contains two elements, namely diacritics and
characters. However, current Arabic handwritten OCR systems produce low levels
of accuracy, and no research has focused on Quran image recognition.
Current AOCR recognizes diacritics and characters inaccurately, and research
efforts in the area of AOCR are insufficient. Many studies have been carried out
so far, but Quranic handwriting has not been researched as thoroughly as Arabic,
Latin or Chinese handwriting systems. This research addresses these problems by
improving the recognition accuracy of AOCR through new segmentation and feature
extraction methods and a suitable classifier.
In this thesis, new techniques, methods and algorithms are proposed to check
the similarity and originality of Quranic handwritten content. Diacritic
detection using a region-based algorithm achieved 89% accuracy, which improved
to 95% with the flood fill segmentation method. 2DMED feature extraction
achieved 90% accuracy for diacritics, which improved to 96% by applying a CNN.
Character recognition based on the projection method achieved 86% accuracy,
which improved to 92% with flood fill. 2DMED accuracy for characters was 88%,
which improved to 91% by applying a CNN. For classification, KNN was applied to
our dataset before and after an enhancement technique based on essential
vectors. Diacritic accuracy was 96.4286% after enhancement, compared with
87.5020% before. Character accuracy was 92.3077% after enhancement, compared
with 86.1429% for the standard KNN algorithm. |
---|