Unsupervised classification of multi-class chart images: A comparison of customized CNNs and transfer learning techniques

Bibliographic Details
Main Authors: Hassan Zaidi, Syed Muhammad; Jamil Alsayaydeh, Jamil Abedalrahim; Khan, Abdul Hafeez; Khan, Abdullah Ayub; AlZubi, Ahmad Ali; Ogunshola, Benny; Herawan, Safarudin Gazali
Format: Article
Language: English
Published: PeerJ Inc. 2025
Online Access: http://eprints.utem.edu.my/id/eprint/29573/2/0248713102025144826.pdf
http://eprints.utem.edu.my/id/eprint/29573/
https://peerj.com/articles/cs-3148/
Description
Summary: Visualization through charts has become a crucial method for presenting complex data in a clear and structured format, surpassing traditional representation techniques in both clarity and interpretability. However, the automatic classification of chart images remains a significant challenge, particularly in the absence of labeled data. This study investigates the unsupervised classification of chart images using a combination of deep learning and clustering techniques. The primary objective is to develop a method capable of automatically categorizing four common chart types (histogram, bar, line, and pie charts) without relying on annotated datasets. These chart types are frequently used in critical domains such as financial, socio-economic, and political analysis. To achieve this, a pre-trained Visual Geometry Group 16 (VGG16) model is employed for feature extraction, followed by principal component analysis (PCA) for dimensionality reduction. The k-means clustering algorithm is then applied to group visually similar chart images. For classification, three pre-trained models, Residual Network 50 (ResNet50), Residual Network 50 Version 2 (ResNet50V2), and Densely Connected Convolutional Network (DenseNet), are evaluated alongside a customized convolutional neural network (CNN) on the ChartVQA dataset. ResNet50, ResNet50V2, DenseNet, and the customized CNN achieve classification accuracies of 85.40%, 80.76%, 82.46%, and 90.71%, respectively. By integrating transfer learning with clustering and CNN-based architectures, this study presents a robust and scalable framework for unsupervised chart image classification.
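
The clustering step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the VGG16 feature extraction and PCA reduction have already been run, and it substitutes small synthetic 2-D vectors for the reduced features. The function name `kmeans`, the farthest-point initialisation, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    # Farthest-point initialisation (an illustrative choice): start from one
    # random point, then keep adding the point farthest from chosen centroids.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
    return labels, centroids


# Synthetic stand-in for PCA-reduced image features: four well-separated
# groups of five points each, one group per assumed chart type.
rng = random.Random(42)
group_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
features = [
    [cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)]
    for cx, cy in group_centers
    for _ in range(5)
]

labels, centroids = kmeans(features, k=4)
```

In a real pipeline the `features` list would hold one reduced feature vector per chart image, and the resulting cluster indices would serve as pseudo-labels; which cluster corresponds to which chart type still has to be determined separately.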