Enhancing data integrity in internet of things-based healthcare applications: a visualization approach for duplicate detection

Bibliographic Details
Main Authors: Md Isa, Siti Noor Basirah, Emran, Nurul Akmar, Harum, Norharyati, Logenthiran, Machap, Nordin, Azlin
Format: Article
Language: en
Published: Institute of Advanced Engineering and Science 2025
Subjects: QA75 Electronic computers. Computer science
Online Access: http://irep.iium.edu.my/123798/7/123798_Enhancing%20data%20integrity%20in%20internet%20of%20things.pdf
http://irep.iium.edu.my/123798/8/123798_Enhancing%20data%20integrity%20in%20internet%20of%20things_Scopus.pdf
http://irep.iium.edu.my/123798/
https://beei.org/index.php/EEI/article/view/10063
author Md Isa, Siti Noor Basirah
Emran, Nurul Akmar
Harum, Norharyati
Logenthiran, Machap
Nordin, Azlin
building IIUM Library
collection Institutional Repository
content_provider International Islamic University Malaysia
content_source IIUM Repository (IREP)
continent Asia
country Malaysia
description This study addresses data duplication in healthcare-related internet of things (IoT) datasets, which can compromise the reliability of analyses and patient outcomes. A Python-based visualization framework using Pandas and Matplotlib was developed to detect and represent duplicate records. The methodology was applied to six cancer-related datasets sourced from Kaggle, ranging from 300 to 55,000 records and encompassing numerical, textual, and categorical data types. The visualization technique gave clear insight into duplication patterns and reported specific counts, such as 7 duplicates in the wearable device dataset, 19 in the thyroid recurrence dataset, and 534 in the synthetic healthcare electronic health record (EHR) dataset. Compared to traditional detection methods, the visualization tool supported faster and more intuitive initial data assessment, demonstrating its effectiveness for rapid quality checks in healthcare datasets. However, scalability limitations were observed in larger datasets, where visual clarity declined. These findings highlight the value of visualization as a preliminary data quality assessment tool and suggest future integration with advanced detection algorithms to enhance robustness and scalability.
format Article
id my.iium.irep-123798
institution Universiti Islam Antarabangsa Malaysia
language en
publishDate 2025
publisher Institute of Advanced Engineering and Science
record_format dspace
citation Md Isa, Siti Noor Basirah and Emran, Nurul Akmar and Harum, Norharyati and Logenthiran, Machap and Nordin, Azlin (2025) Enhancing data integrity in internet of things-based healthcare applications: a visualization approach for duplicate detection. Bulletin of Electrical Engineering and Informatics, 14 (5). pp. 3704-3715. ISSN 2089-3191 E-ISSN 2302-9285
doi 10.11591/eei.v14i5.10063
issued 2025-10
status PeerReviewed
last_modified 2025-10-17T08:34:54Z
title Enhancing data integrity in internet of things-based healthcare applications: a visualization approach for duplicate detection
topic QA75 Electronic computers. Computer science
url_provider http://irep.iium.edu.my/
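
The description above mentions detecting duplicate records with Pandas and displaying them with Matplotlib. The record does not include the authors' code, so the following is only a minimal sketch of that general idea, assuming that a "duplicate" means a row identical to an earlier row in every column. The function names, the toy `records` table, and the chart style are hypothetical and are not taken from the paper or its datasets.

```python
import pandas as pd


def duplicate_summary(df: pd.DataFrame) -> pd.Series:
    """Count exact duplicate rows versus unique rows.

    A row is flagged as a duplicate when an identical row appeared
    earlier in the frame (pandas' default keep="first" behaviour).
    """
    is_dup = df.duplicated()
    return pd.Series(
        {"unique": int((~is_dup).sum()), "duplicate": int(is_dup.sum())}
    )


def plot_duplicate_summary(summary: pd.Series, title: str, path: str) -> None:
    """Save a bar chart of unique versus duplicate row counts to `path`."""
    # Imported lazily so the counting step works without a plotting backend.
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(list(summary.index), list(summary.values),
           color=["tab:blue", "tab:red"])
    ax.set_ylabel("rows")
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Hypothetical toy records for illustration only.
records = pd.DataFrame(
    {
        "patient_id": [1, 2, 2, 3, 3, 3],
        "heart_rate": [72, 80, 80, 65, 65, 66],
        "device": ["A", "B", "B", "A", "A", "A"],
    }
)
summary = duplicate_summary(records)
```

Calling `plot_duplicate_summary(summary, "toy records", "duplicates.png")` would write the chart to a file. A single bar pair like this stays readable at any dataset size, whereas per-row views are the kind of display where, as the description notes, visual clarity can decline on larger datasets.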