Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification
Recent research in digital forensics attempts to classify image clusters as JPEG or non-JPEG before recovering JPEG image files. This classification may improve JPEG recovery accuracy and reduce processing time. In this work, three content-based feature extraction methods are used...
Main Authors: | Ali R.R., Al-Dayyeni W.S., Gunasekaran S.S., Mostafa S.A., Abdulkader A.H., Rachmawanto E.H. |
---|---|
Other Authors: | |
Format: | Conference Paper |
Published: | Springer Science and Business Media Deutschland GmbH, 2023 |
id |
my.uniten.dspace-27232 |
---|---|
record_format |
dspace |
spelling |
my.uniten.dspace-272322023-05-29T17:41:19Z Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification Ali R.R. Al-Dayyeni W.S. Gunasekaran S.S. Mostafa S.A. Abdulkader A.H. Rachmawanto E.H. 57200536163 57225961808 55652730500 37036085800 57545111700 57193850466 Recent research in digital forensic attempts to classify image clusters into JPEG or non-JPEG clusters before recovering JPEG image files. This issue might improve the recovering JPEG image accuracy and reduce the processing time. In this work, three content-based feature extraction methods are used. The Rate of Change (RoC) is used for tracking relevant bytes in the appropriate groups of their orders. Entropy and Byte Frequency Distribution (BFD) are used to produce an image cluster histogram based on the size of the byte value. Subsequently, we deploy the Extreme Learning Machine (ELM) classifier to evaluate these three features. The ELM identifies the type based on the generated feature vector, whether a JPEG file or a non-JPEG file type. The proposed method is implemented in MATLAB 2017a software and tested and evaluated by using the DFRWS dataset. The test results show that the ELM produces high classification accuracy in identifying the file type. The difference in accuracy between the combinations of the tested features is relatively small. The worst accuracy is generated when the entropy method is used, which is 72.62%, and the best accuracy of 93.46% is generated when using a combination of the three features. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG. 
Final 2023-05-29T09:41:19Z 2023-05-29T09:41:19Z 2022 Conference Paper 10.1007/978-3-030-98015-3_21 2-s2.0-85126979417 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85126979417&doi=10.1007%2f978-3-030-98015-3_21&partnerID=40&md5=c1f85c0a4dcac107ced9c9d91984ebb4 https://irepository.uniten.edu.my/handle/123456789/27232 439 LNNS 314 325 Springer Science and Business Media Deutschland GmbH Scopus |
institution |
Universiti Tenaga Nasional |
building |
UNITEN Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Tenaga Nasional |
content_source |
UNITEN Institutional Repository |
url_provider |
http://dspace.uniten.edu.my/ |
description |
Recent research in digital forensics attempts to classify image clusters as JPEG or non-JPEG before recovering JPEG image files. This classification may improve JPEG recovery accuracy and reduce processing time. In this work, three content-based feature extraction methods are used. The Rate of Change (RoC) is used for tracking relevant bytes in the appropriate groups of their orders. Entropy and Byte Frequency Distribution (BFD) are used to produce an image cluster histogram based on the size of the byte value. Subsequently, we deploy the Extreme Learning Machine (ELM) classifier to evaluate these three features. The ELM determines from the generated feature vector whether a cluster belongs to a JPEG or a non-JPEG file type. The proposed method is implemented in MATLAB 2017a and evaluated on the DFRWS dataset. The test results show that the ELM achieves high classification accuracy in identifying the file type. The difference in accuracy between the combinations of the tested features is relatively small. The worst accuracy, 72.62%, is obtained when the entropy method is used, and the best accuracy, 93.46%, is obtained with a combination of the three features. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG. |
author2 |
57200536163 |
author_facet |
57200536163 Ali R.R. Al-Dayyeni W.S. Gunasekaran S.S. Mostafa S.A. Abdulkader A.H. Rachmawanto E.H. |
format |
Conference Paper |
author |
Ali R.R. Al-Dayyeni W.S. Gunasekaran S.S. Mostafa S.A. Abdulkader A.H. Rachmawanto E.H. |
spellingShingle |
Ali R.R. Al-Dayyeni W.S. Gunasekaran S.S. Mostafa S.A. Abdulkader A.H. Rachmawanto E.H. Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification |
author_sort |
Ali R.R. |
title |
Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification |
title_short |
Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification |
title_full |
Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification |
title_fullStr |
Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification |
title_full_unstemmed |
Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification |
title_sort |
content-based feature extraction and extreme learning machine for optimizing file cluster types identification |
publisher |
Springer Science and Business Media Deutschland GmbH |
publishDate |
2023 |
_version_ |
1806424276544258048 |
score |
13.211869 |
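
The three content-based features named in the description (BFD, entropy, and RoC) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation, and the record does not give their exact definitions. The sketch assumes BFD is a normalised 256-bin histogram of byte values, entropy is Shannon entropy over that histogram, and RoC is the mean absolute difference between consecutive bytes. All function names are hypothetical, and the ELM classifier stage is not shown.

```python
import math
from collections import Counter
from typing import List


def byte_frequency_distribution(cluster: bytes) -> List[float]:
    """Normalised 256-bin histogram of byte values (assumed BFD definition)."""
    if not cluster:
        return [0.0] * 256
    counts = Counter(cluster)  # iterating bytes yields ints in 0..255
    total = len(cluster)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(cluster: bytes) -> float:
    """Shannon entropy in bits per byte, ranging from 0.0 to 8.0."""
    bfd = byte_frequency_distribution(cluster)
    return -sum(p * math.log2(p) for p in bfd if p > 0.0)


def mean_rate_of_change(cluster: bytes) -> float:
    """Mean absolute difference between consecutive bytes (assumed RoC definition)."""
    if len(cluster) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(cluster, cluster[1:])]
    return sum(diffs) / len(diffs)


def feature_vector(cluster: bytes) -> List[float]:
    """Concatenate BFD (256 values), entropy, and mean RoC into 258 features."""
    return byte_frequency_distribution(cluster) + [
        shannon_entropy(cluster),
        mean_rate_of_change(cluster),
    ]
```

A vector such as `feature_vector(cluster_bytes)` could then be passed to any binary classifier. The description reports that an ELM was used for that stage.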