A connected component-based deep learning model for multi-type struck-out component classification

Struck-out handwritten words in document images degrade the performance of methods for several important applications, such as handwriting recognition, writer identification, gender identification, fraudulent document identification, document age estimation, writer age estimation, normal/abnorm...

Bibliographic Details
Main Authors: Shivakumara, Palaiahnakote; Jain, Tanmay; Surana, Nitish; Pal, Umapada; Lu, Tong; Blumenstein, Michael; Chanda, Sukalpa
Format: Conference or Workshop Item
Published: 2021
Subjects: QA75 Electronic computers. Computer science; T Technology (General)
Online Access: http://eprints.um.edu.my/35417/
id my.um.eprints.35417
record_format eprints
spelling my.um.eprints.35417 2023-10-17T07:20:18Z http://eprints.um.edu.my/35417/ Conference or Workshop Item PeerReviewed Shivakumara, Palaiahnakote and Jain, Tanmay and Surana, Nitish and Pal, Umapada and Lu, Tong and Blumenstein, Michael and Chanda, Sukalpa (2021) A connected component-based deep learning model for multi-type struck-out component classification. In: International Workshops co-located with the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, 5 - 10 September 2021, Lausanne.
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic QA75 Electronic computers. Computer science
T Technology (General)
description Struck-out handwritten words in document images degrade the performance of methods for several important applications, such as handwriting recognition, writer identification, gender identification, fraudulent document identification, document age estimation, writer age estimation, analysis of normal/abnormal behavior of a person, and descriptive answer evaluation. This work proposes a new method that combines connected component analysis for text component detection with deep learning for classifying struck-out and non-struck-out words. For text component detection, the proposed method uses stroke width to detect the edges of text in images and then performs smoothing operations to remove noise. Morphological operations are then performed on the smoothed images, and connected components are labeled as text by fixing bounding boxes. Inspired by the success of deep learning models, we explore DenseNet for classifying struck-out and non-struck-out handwritten components, taking the text components as input. Experimental results on our dataset demonstrate that the proposed method outperforms existing methods in terms of classification rate.
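
The abstract describes labeling connected components as text by fixing bounding boxes. As a rough illustration of that one step only, the sketch below labels 8-connected ink regions in a small binary grid and reports a bounding box for each. It is not the authors' implementation: the stroke-width edge detection, smoothing, morphological operations, and DenseNet classifier are not reproduced, and the function name `label_components`, the list-of-lists input, and the choice of 8-connectivity are all assumptions made for this example.

```python
from collections import deque


def label_components(binary):
    """Return bounding boxes of 8-connected foreground regions.

    `binary` is a list of rows holding 0/1 values (1 = ink pixel).
    Each box is (top, left, bottom, right), inclusive, listed in the
    scan order of each region's first pixel.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all eight neighbours (assumed connectivity).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy "page": two separate ink blobs.
page = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
boxes = label_components(page)
print(boxes)
```

In a pipeline like the one the abstract outlines, each box would be used to crop a component image that is then passed to the classifier; that cropping and classification stage is left out here.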