TB-FusionNet: a multi-scale feature fusion algorithm with spatial and channel cross-attention for tuberculosis detection

Bibliographic Details
Main Authors: Ding, Zeyu; Yaakob, Razali; Tieng Wei, Koh; Azman, Azreen; Mohd Rum, Siti Nurulain; Zakaria, Nor Fadhlina; Ahmad Nazri, Azree Shahril
Format: Article
Language: en
Published: Institute of Electrical and Electronics Engineers 2026
Online Access: http://psasir.upm.edu.my/id/eprint/123444/1/123444.pdf
http://psasir.upm.edu.my/id/eprint/123444/
https://ieeexplore.ieee.org/document/11328080/
Description
Summary: Tuberculosis (TB) lesions in X-ray images are highly complex, varying widely in size, shape, and structure. Single-scale features cannot fully represent this diversity and complexity, which limits the effectiveness of TB detection. As a result, multi-scale feature fusion has become a widely explored approach in TB detection. However, current multi-scale feature fusion methods still have several limitations. First, existing methods typically allocate weights only at the feature level, without extending to local features and channels. The model therefore cannot precisely control the significance of individual local features and channels, which results in coarse feature representations. Second, current methods neglect the contextual information between features at different levels, which undermines the consistency of the fused features and leads to a lack of semantic coherence. To address these issues, this study proposes TB-FusionNet, a multi-scale feature fusion algorithm based on channel and spatial cross-attention mechanisms, for tuberculosis classification. The algorithm first calculates the similarity between local features at different levels to select the low-level detail features that are closely related to high-level semantic features, thereby generating a more hierarchical feature representation. Next, the algorithm computes the dependencies between feature channels at different levels, so that low-level channel features are enhanced or suppressed under the guidance of high-level channels, which improves the semantic consistency between cross-level features. Through these operations, the model can dynamically adjust the weight distribution of local features and channels according to task requirements, and so adapt more flexibly to complex tasks.
Experiments were conducted on the Shenzhen, Montgomery, and HSAAS datasets. The results demonstrate that the proposed method outperforms current state-of-the-art approaches, validating its effectiveness and robustness.
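
The two cross-attention steps described in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary gives no formulas, so everything below is an assumption made for illustration, including the function names, the scaled dot-product similarity, the softmax normalization, and the 2 x sigmoid gate. It also assumes that both feature levels have already been projected to a common channel width (for the spatial step) or resampled to a common grid of locations (for the channel step).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def spatial_cross_attention(high, low):
    """Select low-level local features by similarity to high-level ones.

    high: (n_high, c) high-level local features, used as queries.
    low:  (n_low, c)  low-level local features, used as keys and values.
    Returns the attended details (n_high, c) and the weights (n_high, n_low).
    """
    # Assumed similarity: scaled dot product between local features.
    scores = high @ low.T / np.sqrt(high.shape[1])
    weights = softmax(scores, axis=1)  # each row sums to 1
    return weights @ low, weights


def channel_cross_attention(high, low):
    """Rescale low-level channels under the guidance of high-level channels.

    high: (n, c_high) and low: (n, c_low), on the same grid of n locations.
    Returns the rescaled low-level features (n, c_low) and the gate (n, c_low).
    """
    # Dependency of every low-level channel on every high-level channel.
    affinity = low.T @ high / np.sqrt(low.shape[0])  # (c_low, c_high)
    weights = softmax(affinity, axis=1)
    # Mix high-level channels into one guidance signal per low-level channel.
    guidance = high @ weights.T                      # (n, c_low)
    # Assumed gate in (0, 2): above 1 enhances, below 1 suppresses.
    gate = 2.0 / (1.0 + np.exp(-guidance))
    return low * gate, gate


# Hypothetical toy inputs: random values stand in for real feature maps.
rng = np.random.default_rng(0)
high_s = rng.normal(size=(16, 8))   # 16 high-level locations, 8 channels
low_s = rng.normal(size=(64, 8))    # 64 low-level locations, 8 channels
details, spatial_weights = spatial_cross_attention(high_s, low_s)

high_c = rng.normal(size=(64, 8))   # 64 locations, 8 high-level channels
low_c = rng.normal(size=(64, 4))    # 64 locations, 4 low-level channels
rescaled, gate = channel_cross_attention(high_c, low_c)
```

In this sketch the spatial step returns, for each high-level location, a similarity-weighted mixture of low-level details, while the channel step leaves the low-level feature shape unchanged and only rescales it. How the two outputs are fused, and whether the paper uses learned projections, is not stated in the summary and is left out here.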