Multimodal Data Fusion for Whole-Slide Histopathology Image Classification

Abstract

Whole slide images (WSIs) are critical for cancer diagnosis but pose computational challenges due to their gigapixel resolution. While automated AI tools can accelerate diagnostic workflows, they often rely on precise annotations and require substantial training data. Integrating multimodal data—such as WSIs and corresponding pathology reports—offers a promising solution to improve classification accuracy and reduce diagnostic variability. In this study, we introduce MPath-Net, an end-to-end multimodal framework that combines WSIs and pathology reports for enhanced cancer subtype classification. Using the TCGA dataset (1684 cases: 916 kidney, 768 lung), we applied multiple-instance learning (MIL) for WSI feature extraction and Sentence-BERT for report encoding, followed by joint fine-tuning for tumor classification. MPath-Net achieved 94.65% accuracy, 0.9553 precision, 0.9472 recall, and 0.9473 F1-score, significantly outperforming baseline models (P < 0.05). In addition, attention heatmaps provided interpretable tumor tissue localization, demonstrating the clinical utility of our approach. These findings suggest that MPath-Net can support pathologists by improving diagnostic accuracy, reducing inter-reader variability, and advancing precision medicine through multimodal AI integration.
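The abstract does not say which MIL variant or fusion operator MPath-Net uses, so the following is only a minimal sketch of the general idea, assuming attention-based MIL pooling over patch features and simple concatenation with a report embedding. Random vectors stand in for real patch features and Sentence-BERT embeddings, and every name here (`attention_mil_pool`, `fuse_and_classify`, the dimensions) is hypothetical rather than taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(patches, w):
    """Pool patch feature vectors into one slide-level vector.

    Each patch gets a scalar score tanh(w . h); softmax turns the scores
    into attention weights, and the slide vector is the weighted sum.
    """
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h))) for h in patches]
    weights = softmax(scores)
    dim = len(patches[0])
    slide = [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]
    return slide, weights


def fuse_and_classify(slide_vec, report_vec, class_weights, class_bias):
    """Concatenate both modalities and apply a linear softmax classifier."""
    fused = slide_vec + report_vec  # list concatenation (assumed fusion operator)
    logits = [
        sum(wi * xi for wi, xi in zip(row, fused)) + b
        for row, b in zip(class_weights, class_bias)
    ]
    return softmax(logits)


# Toy inputs: random stand-ins, not real WSI features or report embeddings.
random.seed(0)
PATCH_DIM, TEXT_DIM, N_CLASSES, N_PATCHES = 4, 3, 2, 6
patches = [[random.gauss(0, 1) for _ in range(PATCH_DIM)] for _ in range(N_PATCHES)]
report_vec = [random.gauss(0, 1) for _ in range(TEXT_DIM)]
attn_w = [random.gauss(0, 1) for _ in range(PATCH_DIM)]
class_weights = [
    [random.gauss(0, 1) for _ in range(PATCH_DIM + TEXT_DIM)]
    for _ in range(N_CLASSES)
]
class_bias = [0.0] * N_CLASSES

slide_vec, attn = attention_mil_pool(patches, attn_w)
probs = fuse_and_classify(slide_vec, report_vec, class_weights, class_bias)
```

In a sketch like this, the per-patch weights in `attn` are the kind of quantity that could be mapped back onto slide coordinates to draw a heatmap. In an end-to-end setting, `attn_w` and the classifier parameters would be trained jointly instead of sampled at random.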

Cite

Song, Y., Roy, M., Zhong, M., Chen, L., Lin, M., & Zhang, R. (2025). Multimodal Data Fusion for Whole-Slide Histopathology Image Classification. Journal of Healthcare Informatics Research, 9(4), 513–532. https://doi.org/10.1007/s41666-025-00212-w
