Knee Osteoarthritis Diagnosis With Unimodal and Multi-Modal Neural Networks: Data From the Osteoarthritis Initiative

Bibliographic Details
Main Authors: Teh, Xin Yu, Yeoh, Pauline Shan Qing, Wang, Tao, Wu, Xiang, Hasikin, Khairunnisa, Lai, Khin Wee
Format: Article
Published: Institute of Electrical and Electronics Engineers 2024
Online Access: http://eprints.um.edu.my/47130/
https://doi.org/10.1109/ACCESS.2024.3472654
Description
Summary: Knee osteoarthritis (OA) is a prevalent musculoskeletal condition affecting millions worldwide, posing significant health and economic burdens. Knee OA is characterized by the degeneration of joint cartilage, and its progression varies considerably among individuals, which makes it difficult to predict. Previous studies on automated knee OA diagnosis have primarily relied on unimodal data, often overlooking the valuable information present in multi-modal data. Multi-modal learning, which integrates information from various modalities, is increasingly recognized for its potential to enhance diagnostic performance in medical applications. However, such models incur a higher computational load because they must process additional data. This research investigates the feasibility of multi-modal neural networks for knee OA diagnosis by integrating structured demographic data with unstructured imaging data. Three unimodal deep learning models (InceptionV3, DIKO, and EfficientNetv2) were transformed into multi-modal architectures (MF_InceptionNet, MF_DIKO, and MF_Eff) to compare their diagnostic capabilities. The proposed multi-modal models share a common architecture: the unimodal models act as image feature extraction backbones, and separate embedding layers encode the demographic data. The image features and demographic embeddings are combined into a unified vector before classification. Extensive experiments were conducted to evaluate the performance of these models across different class categories and dataset sizes. MF_DIKO and InceptionV3 emerged as the best multi-modal and unimodal neural networks, respectively, with overall accuracies of 0.67 and 0.75 for 3-class severity classification. Contrary to existing literature, our findings reveal that unimodal neural networks using only imaging features outperform multi-modal networks, suggesting that unimodal models might suffice in certain applications.
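The summary describes the shared fusion design only at a high level. The following is a minimal sketch of that general late-fusion pattern in plain Python. It is not the authors' implementation, and it shows no training. In the sketch, a fixed-length image feature vector stands in for the pooled output of a backbone such as InceptionV3. Two hypothetical categorical demographic fields (`sex` and `age_band`) are looked up in per-field embedding tables. The pieces are concatenated into one vector, which a linear layer with softmax maps to three severity classes. The class name `LateFusionSketch`, all dimensions, the field names, and the random initialization are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical sizes for illustration only; the abstract does not give them.
IMAGE_FEATURE_DIM = 8
NUM_CLASSES = 3  # 3-class severity classification, as in the summary
# Hypothetical demographic fields: name -> (vocabulary size, embedding dim)
EMBED_SPECS = {"sex": (2, 2), "age_band": (5, 3)}


class LateFusionSketch:
    """Concatenate image features with demographic embeddings, then classify."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One embedding table (vocab x dim) per categorical demographic field.
        self.tables: Dict[str, List[List[float]]] = {
            name: [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab)]
            for name, (vocab, dim) in EMBED_SPECS.items()
        }
        self.fused_dim = IMAGE_FEATURE_DIM + sum(d for _, d in EMBED_SPECS.values())
        # Linear classifier: NUM_CLASSES rows of fused_dim weights, plus biases.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(self.fused_dim)]
            for _ in range(NUM_CLASSES)
        ]
        self.bias = [0.0] * NUM_CLASSES

    def fuse(self, image_features: Sequence[float], demographics: Dict[str, int]) -> List[float]:
        """Build the unified vector: image features, then each field's embedding."""
        if len(image_features) != IMAGE_FEATURE_DIM:
            raise ValueError("unexpected image feature length")
        fused = list(image_features)
        for name in EMBED_SPECS:  # fixed field order keeps the vector layout stable
            fused.extend(self.tables[name][demographics[name]])
        return fused

    def predict_proba(self, image_features: Sequence[float], demographics: Dict[str, int]) -> List[float]:
        """Return class probabilities from the fused vector."""
        fused = self.fuse(image_features, demographics)
        logits = [
            sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        # Numerically stable softmax over the class logits.
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    model = LateFusionSketch(seed=42)
    features = [0.1] * IMAGE_FEATURE_DIM  # stand-in for a backbone's pooled output
    probs = model.predict_proba(features, {"sex": 1, "age_band": 3})
    print(len(probs), round(sum(probs), 6))  # prints: 3 1.0
```

In this sketch, a unimodal counterpart would skip the embedding lookups and classify `image_features` directly. That is the comparison the summary reports, in which the image-only models performed better.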