Translating medical image to radiological report: Adaptive multilevel multi-attention approach
Main Authors: Gajbhiye, G.O.; Nandedkar, A.V.; Faye, I.
Format: Article
Published: Elsevier Ireland Ltd, 2022
Online Access:
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85129770805&doi=10.1016%2fj.cmpb.2022.106853&partnerID=40&md5=9da9b11e6d480c3e0979bc75b4590804
http://eprints.utp.edu.my/33049/
id
my.utp.eprints.33049
record_format
eprints
spelling
Gajbhiye, G.O. and Nandedkar, A.V. and Faye, I. (2022) Translating medical image to radiological report: Adaptive multilevel multi-attention approach. Computer Methods and Programs in Biomedicine, 221. Article (NonPeerReviewed). http://eprints.utp.edu.my/33049/
institution
Universiti Teknologi Petronas
building
UTP Resource Centre
collection
Institutional Repository
continent
Asia
country
Malaysia
content_provider
Universiti Teknologi Petronas
content_source
UTP Institutional Repository
url_provider
http://eprints.utp.edu.my/
description
Background and Objective: Medical imaging techniques are widely employed in disease diagnosis and treatment. A readily available medical report can be a useful tool in assisting an expert investigating a patient's health, and a radiologist can benefit from an automatic medical-image-to-radiological-report translation system while preparing the final report. Previous attempts at automatic medical report generation use image-captioning algorithms that do not take domain-specific visual and textual content into account, which raises questions about the credibility of the generated reports.

Methods: In this work, a novel Adaptive Multilevel Multi-Attention (AMLMA) approach is proposed that incorporates domain-specific visual-textual knowledge to generate a thorough and plausible radiological report for any view of a human chest X-ray image. The approach leverages an encoder-decoder framework equipped with multiple adaptive attention mechanisms. A convolutional neural network (CNN) with a residual attention module (RAM) is shown to be a strong visual encoder for multi-label abnormality detection. Multilevel visual features (local and global) are extracted from the proposed visual encoder to retrieve regional-level and abstract-level radiological semantic information. Word2Vec and FastText word embeddings are trained on medical reports to acquire radiological knowledge and are then used as textual encoders, feeding a Bi-directional Long Short-Term Memory (Bi-LSTM) network that learns the correlations between medical terms in radiological reports. AMLMA employs a weighted multilevel association of adaptive visual-semantic attention and visual-based linguistic attention mechanisms; this association of adaptive attentions serves as the decoder and produces significant improvements in the report generation task.

Results: The proposed approach is evaluated on the publicly available Indiana University chest X-ray (IU-CXR) dataset. The CNN with RAM shows significant improvement in recall (0.4423), precision (0.1803) and F1-score (0.2551) for the prediction of multiple abnormalities in an X-ray image. Language-generation metrics for the proposed variants were computed using the COCO-caption evaluation Application Program Interface (API). The AMLMA model with the trained embeddings generates convincing radiology reports and outperforms state-of-the-art (SOTA) approaches, with high scores for Bleu-4 (0.172), Meteor (0.247), Rouge-L (0.376) and CIDEr (0.381). In addition, a new "Unique Index" (UI) statistic is introduced to highlight the model's ability to generate unique reports.

Conclusion: The overall architecture aids in understanding the various X-ray image views and in generating the relevant normal and abnormal radiography statements. The proposed model emphasizes multilevel visual-textual knowledge with an adaptive attention mechanism that balances visual and linguistic information for the generation of admissible radiology reports. © 2022 Elsevier B.V.
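The abstract does not reproduce the paper's equations, so the following is only a minimal PyTorch sketch of two ingredients it names: a Bi-LSTM textual encoder over pretrained Word2Vec/FastText embeddings, and a sentinel-gated adaptive attention that weighs regional visual features against linguistic context. All class names, dimensions and the gating formulation are illustrative assumptions, not the AMLMA implementation.

```python
# Hedged sketch only: generic adaptive (sentinel-style) attention and a Bi-LSTM
# textual encoder, not the paper's AMLMA code. Assumes region features and the
# decoder state share one dimension so the sentinel can be mixed directly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextualEncoder(nn.Module):
    """Bi-LSTM over a word-embedding table that may be initialised with
    pretrained Word2Vec/FastText vectors, as the abstract describes."""
    def __init__(self, vocab_size, emb_dim, hid_dim, pretrained=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        if pretrained is not None:                    # (vocab_size, emb_dim)
            self.embed.weight.data.copy_(pretrained)
        self.bilstm = nn.LSTM(emb_dim, hid_dim // 2,
                              batch_first=True, bidirectional=True)

    def forward(self, tokens):                        # tokens: (B, T) int64
        return self.bilstm(self.embed(tokens))[0]     # (B, T, hid_dim)

class AdaptiveAttention(nn.Module):
    """Attend over R region features plus a 'sentinel' vector, letting the
    decoder fall back on linguistic context instead of the image."""
    def __init__(self, dim, att_dim):
        super().__init__()
        self.v_proj = nn.Linear(dim, att_dim)         # region features
        self.s_proj = nn.Linear(dim, att_dim)         # sentinel
        self.h_proj = nn.Linear(dim, att_dim)         # decoder hidden state
        self.score = nn.Linear(att_dim, 1)

    def forward(self, regions, h, sentinel):
        # regions: (B, R, dim), h: (B, dim), sentinel: (B, dim)
        cand = torch.cat([self.v_proj(regions),
                          self.s_proj(sentinel).unsqueeze(1)], dim=1)  # (B, R+1, A)
        e = self.score(torch.tanh(cand + self.h_proj(h).unsqueeze(1)))
        alpha = F.softmax(e.squeeze(-1), dim=1)                        # (B, R+1)
        mixed = torch.cat([regions, sentinel.unsqueeze(1)], dim=1)     # (B, R+1, dim)
        context = (alpha.unsqueeze(-1) * mixed).sum(dim=1)             # (B, dim)
        return context, alpha

# Smoke test with dummy tensors.
B, R, D = 2, 49, 512
att = AdaptiveAttention(dim=D, att_dim=256)
ctx, alpha = att(torch.randn(B, R, D), torch.randn(B, D), torch.randn(B, D))
print(ctx.shape, alpha.shape)    # torch.Size([2, 512]) torch.Size([2, 50])
```

The last attention weight, alpha[:, -1], acts as the gate that shifts probability mass from the image toward the language context, which is the visual-linguistic balancing the Conclusion refers to.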
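The recall, precision and F1 figures quoted for the multi-label abnormality detector are standard multi-label classification metrics over indicator matrices; a scikit-learn sketch of the computation follows. The averaging mode ("macro" here) and the toy arrays are assumptions, since the record does not state how the paper averages over labels.

```python
# Multi-label precision/recall/F1 over abnormality indicator matrices.
# 'macro' averaging and the toy labels are assumptions for illustration.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([[1, 0, 1, 0],   # each row: ground-truth abnormality flags
                   [0, 1, 0, 0],   # for one chest X-ray, one column per finding
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 1, 0],
                   [1, 0, 0, 1]])

p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```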
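The "COCO-caption evaluation API" named in the Results is commonly packaged as pycocoevalcap; a usage sketch under assumptions follows: one generated report and one reference per image id, already tokenized, and the toy sentences are invented. Meteor additionally requires a Java runtime.

```python
# Sketch of scoring generated reports with pycocoevalcap; the dict layout
# (image id -> list of strings) and the toy sentences are assumptions.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor     # requires Java
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

gts = {"cxr1": ["the heart size is normal . the lungs are clear ."]}  # reference
res = {"cxr1": ["heart size is normal . no focal consolidation ."]}   # generated

for name, scorer in [("Bleu", Bleu(4)), ("Meteor", Meteor()),
                     ("Rouge-L", Rouge()), ("CIDEr", Cider())]:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)   # Bleu yields a list [Bleu-1, ..., Bleu-4]
```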
format
Article
author
Gajbhiye, G.O.; Nandedkar, A.V.; Faye, I.
title
Translating medical image to radiological report: Adaptive multilevel multi-attention approach
publisher
Elsevier Ireland Ltd
publishDate
2022
url
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85129770805&doi=10.1016%2fj.cmpb.2022.106853&partnerID=40&md5=9da9b11e6d480c3e0979bc75b4590804
http://eprints.utp.edu.my/33049/