Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

Bibliographic Details
Main Authors: Mujtaba, G., Shuib, L., Raj, R.G., Rajandram, R., Shaikh, K., Al-Garadi, M.A.
Format: Article
Published: Public Library of Science 2017
Online Access: http://eprints.um.edu.my/19085/
http://dx.doi.org/10.1371/journal.pone.0170242
_version_ 1831506469748473856
author Mujtaba, G.
Shuib, L.
Raj, R.G.
Rajandram, R.
Shaikh, K.
Al-Garadi, M.A.
building UM Library
collection Institutional Repository
content_provider Universiti Malaya
content_source UM Research Repository
continent Asia
country Malaysia
description Objectives: Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods: Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases, 10th revision (ICD-10)] through five automated feature selection schemes, the proposed expert-driven approach, five feature subset sizes, and five machine learning classifiers. Model performance was evaluated using macro-averaged precision, recall, and F-measure (precisionM, recallM, and F-measureM), accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Results: Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, with a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines. Conclusion: The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists in accurately and rapidly determining the underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
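The M suffix on the metrics in the description denotes macro-averaging: each score is computed per class and then averaged with equal weight across classes. The following is a minimal sketch of that calculation, not the authors' implementation. The function name `macro_scores` and the class labels are hypothetical, and the F-measure is derived here from the macro-averaged precision and recall, which is one common convention; the record does not state which variant the study used.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F-measure).

    Hypothetical helper for illustration only; not taken from the study.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    # Per-class precision and recall; a class with no predictions
    # (or no true instances) contributes 0.0.
    precisions = [
        tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels
    ]
    recalls = [
        tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in labels
    ]

    # Macro-averaging: every class has equal weight regardless of its size.
    precision_m = sum(precisions) / len(labels)
    recall_m = sum(recalls) / len(labels)

    # Assumption: F-measure as the harmonic mean of the macro averages.
    denom = precision_m + recall_m
    f_measure_m = 2 * precision_m * recall_m / denom if denom else 0.0
    return precision_m, recall_m, f_measure_m


if __name__ == "__main__":
    # Hypothetical class labels; the study itself uses nine ICD-10 categories.
    truth = ["A", "A", "B", "B", "C", "C"]
    predicted = ["A", "B", "B", "B", "C", "A"]
    print(macro_scores(truth, predicted))
```

Because every class counts equally, a rare cause of death that is classified poorly lowers the macro-averaged scores as much as a common one would, which is why these measures are often reported alongside overall accuracy for multi-class problems.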
format Article
id my.um.eprints-19085
institution Universiti Malaya
publishDate 2017
publisher Public Library of Science
record_format eprints
spelling my.um.eprints-19085 2018-09-04T04:33:27Z http://eprints.um.edu.my/19085/ PeerReviewed Mujtaba, G. and Shuib, L. and Raj, R.G. and Rajandram, R. and Shaikh, K. and Al-Garadi, M.A. (2017) Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection. PLoS ONE, 12 (2). e0170242. ISSN 1932-6203, DOI https://doi.org/10.1371/journal.pone.0170242
title Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection
topic QA75 Electronic computers. Computer science
R Medicine
url http://eprints.um.edu.my/19085/
http://dx.doi.org/10.1371/journal.pone.0170242
url_provider http://eprints.um.edu.my/