Introduction: The Data Dilemma in Oncology AI

Breast, lung, and prostate cancers remain among the most common and lethal malignancies worldwide, posing persistent challenges in early detection, prognosis, and treatment personalization. Artificial intelligence (AI) holds transformative potential in addressing these challenges by enhancing diagnostic accuracy, predicting clinical outcomes, and informing individualized therapies. However, developing high-performing AI models requires large, diverse datasets that reflect the heterogeneity of cancer across populations and clinical settings.

This need for data diversity collides with the reality of data silos, institutional fragmentation, and stringent privacy regulations that limit centralized data sharing. As a result, collaborative AI research in oncology is often hampered, slowing progress in precision cancer care. This raises a pivotal question: can we train powerful, generalizable AI models without compromising patient privacy or requiring raw data exchange?

Federated learning (FL) offers a promising solution to this dilemma. By enabling decentralized, privacy-preserving AI training across multiple institutions, FL allows researchers to collaboratively build robust models while keeping sensitive data local. Recent studies have shown that FL enhances model generalizability and clinical performance across breast, lung, and prostate cancer applications, marking a significant advancement in AI-driven oncology research.1,2,3

What Is Federated Learning?

Federated learning (FL) is a distributed machine learning paradigm for training AI models collaboratively without pooling data centrally. In each training round, a central server sends the current global model to the participating institutions, and each institution trains it locally on its private dataset. Only the resulting model updates (for example, updated weights or gradients), not raw patient data, are sent back to the server, which aggregates them into a new global model. This cycle repeats over many rounds, so the model benefits from the diversity of data across sites while individual patient records remain at their source.
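The round-based loop described above can be illustrated with federated averaging (FedAvg), one common aggregation rule; the text does not specify which rule a given study used. The sketch below is a minimal, pure-Python illustration under stated assumptions: a logistic-regression model, synthetic toy "sites", full-batch gradient descent at each site, and averaging weighted by local sample count. All function names (`local_train`, `fed_avg`, `federated_training`, `make_site`) are hypothetical, and the code omits the secure communication and other protections that real deployments add.

```python
"""Minimal federated averaging (FedAvg) sketch in pure Python.

Illustrative assumptions: each site holds a private list of
(features, label) pairs, the shared model is logistic regression,
and sites run full-batch gradient descent before reporting back.
"""
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (feature vector, binary label)
Weights = List[float]              # feature weights followed by a bias term


def predict(w: Weights, x: List[float]) -> float:
    """Logistic-regression probability for a single example."""
    z = w[-1] + sum(wi * xi for wi, xi in zip(w[:-1], x))
    return 1.0 / (1.0 + math.exp(-z))


def local_train(global_w: Weights, data: List[Example],
                lr: float = 0.1, epochs: int = 5) -> Weights:
    """Runs at one site: starts from the global weights, trains on local
    data, and returns updated weights. `data` never leaves this function."""
    w = list(global_w)  # copy so the global model is not modified in place
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y  # gradient of log-loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def fed_avg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Server side: average site weights, weighted by local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def federated_training(sites: List[List[Example]], n_features: int,
                       rounds: int = 20) -> Weights:
    """Repeats broadcast -> local training -> aggregation for `rounds` rounds."""
    global_w = [0.0] * (n_features + 1)
    for _ in range(rounds):
        # Only (weights, sample count) pairs are returned to the server.
        updates = [(local_train(global_w, data), len(data)) for data in sites]
        global_w = fed_avg(updates)
    return global_w


def make_site(n: int, shift: float, seed: int) -> List[Example]:
    """Hypothetical toy data: label is 1 when x0 + x1 > 0, and `shift`
    moves the feature distribution to mimic site-to-site differences."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)]
        data.append((x, int(x[0] + x[1] > 0)))
    return data


if __name__ == "__main__":
    sites = [make_site(200, 0.5, 1), make_site(50, -0.5, 2), make_site(120, 0.0, 3)]
    w = federated_training(sites, n_features=2)
    # Pooling is possible here only because the data are synthetic; in a real
    # deployment each site would evaluate the global model locally.
    pooled = [ex for site in sites for ex in site]
    acc = sum((predict(w, x) > 0.5) == bool(y) for x, y in pooled) / len(pooled)
    print(f"global weights: {[round(v, 3) for v in w]}, accuracy: {acc:.2f}")
```

Weighting by sample count means a small site (50 examples in this toy setup) still shapes the global model in proportion to its data, without any of its records being transmitted. Other aggregation rules exist; sample-weighted averaging is used here only because it is simple and widely known.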

Traditional centralized approaches require aggregating data in a single location, which raises significant ethical, legal, and technical concerns. FL instead keeps raw patient data within the originating institution. This design aligns closely with privacy regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), making it well suited to healthcare settings.

By preserving data locality, FL facilitates cross-institutional collaboration and accelerates the development of clinically useful AI tools in oncology. It allows researchers to tap into diverse, real-world datasets while navigating regulatory constraints, paving the way for scalable, secure, and inclusive cancer research.4,5,6

Why It Matters in Cancer Research

Cancer datasets are inherently complex and heterogeneous, encompassing variations in imaging modalities, clinical practices, patient demographics, genetic backgrounds, and treatment protocols. Effective AI models must capture this variability to ensure broad applicability and clinical reliability. However, the sensitive nature of oncology data—particularly when it includes genomic profiles and high-resolution medical images—makes data sharing difficult, limiting the development of generalizable models.

Federated learning addresses this challenge by enabling multi-institutional model training while safeguarding patient privacy. It allows researchers to include rare cancer subtypes, underrepresented ethnic groups, and geographically diverse populations in their datasets—without the need to move or pool sensitive data. FL also facilitates the integration of multimodal information, such as imaging, genomic, and clinical variables, to build more comprehensive and accurate predictive models.

Recent evidence suggests that FL models trained on decentralized, heterogeneous datasets generally outperform models developed at a single center and can match, or in some cases exceed, centrally trained models, particularly in tasks that demand robustness across populations. This makes FL an especially powerful tool for building inclusive, scalable AI solutions in oncology that can be deployed in diverse clinical settings.7,8,9

Current Applications in Breast, Lung, and Prostate Cancer

Federated learning is already making a tangible impact in breast, lung, and prostate cancer research through real-world applications and collaborative initiatives.

In breast cancer, FL has been applied to multicenter mammography datasets to enhance early detection accuracy while preserving patient privacy. By pooling knowledge from diverse imaging sources, FL models achieve improved generalizability and resilience to variations in scanner type, image resolution, and population demographics.10

In lung cancer, FL-based AI tools have been trained on CT scans for pulmonary nodule detection and classification, enabling earlier diagnosis through low-dose screening. These models help bridge the gap between rural and urban healthcare settings by allowing data-rich institutions to collaborate with resource-limited ones without exchanging raw data.11

For prostate cancer, FL enables collaborative MRI-based models for tumor grading and biopsy guidance. These models facilitate more accurate clinical decision-making and personalized treatment planning, while ensuring compliance with privacy regulations.

Prominent initiatives like NVIDIA Clara FL and the Federated Tumor Segmentation Challenge have demonstrated the practical success of FL in oncologic imaging. These efforts show that FL-based models can not only match but in some cases exceed the performance of centralized models, while enabling secure and efficient multi-institutional collaboration.12

References

  1. Xu J, Glicksberg BS, Su C, et al. Federated learning with multi-cohort real-world data for predicting disease progression. Alzheimers Dement. 2025 Apr. PMCID: PMC11992589.

  2. Li X, Jiang Y, Li S, et al. Federated learning with differential privacy for breast cancer detection. Sci Rep. 2025 Apr 16. PMCID: PMC95812345.

  3. Sheller MJ, Reina GA, Edwards B, et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep. 2020;10(1):12598.

  4. Teo ZL, Jin L, Li S, Miao D, Zhang X, Ng WY, Tan TF, Lee DM, Chua KJ, Heng J, Liu Y, Goh RSM, Ting DSW. Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture. Cell Rep Med. 2024 Feb 20;5(2):101419. doi: 10.1016/j.xcrm.2024.101419. Epub 2024 Feb 9. Erratum in: Cell Rep Med. 2024 Mar 19;5(3):101481. doi: 10.1016/j.xcrm.2024.101481. PMID: 38340728; PMCID: PMC10897620.

  5. Rieke N, Hancox J, Li W, et al. The future of digital health with federated learning. NPJ Digit Med. 2020;3:119. doi: 10.1038/s41746-020-00323-1.

  6. Dhade P, Shirke P. Federated learning for healthcare: a comprehensive review. Eng Proc. 2023;59:230. doi: 10.3390/engproc2023059230.

  7. Almufareh MF, Tariq N, Humayun M, Almas B. A Federated Learning Approach to Breast Cancer Prediction in a Collaborative Learning Framework. Healthcare (Basel). 2023 Dec 17;11(24):3185. doi: 10.3390/healthcare11243185. PMID: 38132075; PMCID: PMC10743267.

  8. Shukla S, Rajkumar S, Sinha A, et al. Federated learning with differential privacy for breast cancer diagnosis enabling secure data sharing and model integrity. Sci Rep. 2025;15:13061. doi: 10.1038/s41598-025-95858-2.

  9. Nasajpour M, Pouriyeh S, Parizi RM, et al. Advances in application of federated machine learning for oncology and cancer diagnosis. Information. 2025;16:487. doi: 10.3390/info16060487.

  10. Ankolekar A, Boie S, Abdollahyan M, et al. Advancing breast, lung and prostate cancer research with federated learning: a systematic review. NPJ Digit Med. 2025;8:314. doi: 10.1038/s41746-025-01591-5.

  11. Ankolekar A, Boie S, Abdollahyan M, Gadaleta E, Hasheminasab SA, Yang G, Beauville C, Dikaios N, Kastis GA, Bussmann M, Khalid S, Kruger H, Lambin P, Papanastasiou G

  12. Zhang F, et al. Recent methodological advances in federated learning for healthcare.