Abstract
PURPOSE Building well-performing machine learning (ML) models in health care has long been challenging because of data-sharing concerns, yet ML approaches often require larger training samples than a single institution can provide. This paper explores several federated learning implementations by applying them both in a simulated environment and in an actual implementation using electronic health record data from two academic medical centers on a Microsoft Azure Cloud Databricks platform.
MATERIALS AND METHODS Using two separate cloud tenants, ML models were created, trained, and exchanged from one institution to the other via a GitHub repository. Federated learning processes were applied to both artificial neural networks (ANNs) and logistic regression (LR) models on horizontal data sets that vary in count and availability. Incremental and cyclic federated learning models were tested in simulated and real environments.
RESULTS The cyclically trained ANN showed a 3% increase in performance, a significant improvement across most attempts (P < .05). Single weight neural network models showed improvement in some cases. However, LR models did not show much improvement after federated learning processes. The specific process that improved performance differed based on the ML model and on how federated learning was implemented. Moreover, we confirmed that the order of the institutions during training influenced the overall performance increase.
CONCLUSION Unlike previous studies, our work shows the implementation and effectiveness of federated learning processes beyond simulation. Additionally, we have identified different federated learning models that achieved statistically significant performance. More work is needed to achieve effective federated learning processes in biomedicine while preserving the security and privacy of the data.
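The incremental and cyclic schemes named in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common usage in which "incremental" means a single pass of the model through the institutions and "cyclic" means repeated passes, and it uses a hand-written logistic regression on synthetic data. All function names, hyperparameters, and data below are hypothetical. Only the weight vector moves between sites; each site's records stay local.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(weights, X, y, lr=0.1, epochs=20):
    """Continue training on one site's data, starting from the received weights."""
    w = weights.copy()
    for _ in range(epochs):
        # Gradient of the mean logistic loss with respect to w
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def sequential_training(sites, n_features, cycles=1):
    """Pass one weight vector from site to site.

    cycles=1 is treated here as incremental learning (assumption);
    cycles>1 as cyclic learning (assumption).
    """
    w = np.zeros(n_features)
    for _ in range(cycles):
        for X, y in sites:  # the order of sites may affect the result
            w = local_update(w, X, y)
    return w

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) >= 0.5) == y))

def make_site(rng, n, true_w):
    """Synthetic stand-in for one institution's data set (hypothetical)."""
    X = rng.normal(size=(n, len(true_w)))
    y = (sigmoid(X @ true_w) > rng.uniform(size=n)).astype(float)
    return X, y

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
site_a = make_site(rng, 400, true_w)  # sites differ in sample count
site_b = make_site(rng, 150, true_w)
test_X, test_y = make_site(rng, 500, true_w)

w_incremental = sequential_training([site_a, site_b], 3, cycles=1)
w_cyclic = sequential_training([site_a, site_b], 3, cycles=5)

print("incremental accuracy:", accuracy(w_incremental, test_X, test_y))
print("cyclic accuracy:", accuracy(w_cyclic, test_X, test_y))
```

Swapping the order of `site_a` and `site_b` in the list is a quick way to probe the order dependence reported in the results, although this toy setting says nothing about the size of the effect on real data.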
Recent advancements in artificial intelligence (AI) have demonstrated the potential to transform medicine [1] and are promising for improving outcomes while reducing the cost of patient care because of AI's capability for earlier, more accurate diagnosis and personalized patient-centered care. Image classification, speech recognition, and natural language processing have seen some noteworthy achievements. [2] Moreover, thanks to machine learning (ML), hospitals can accomplish more efficient clinical workflows by reducing unnecessary procedures, which leads to further cost reductions. [1] The performance of an ML algorithm depends highly on the amount and quality of the data it is trained on, particularly for more complex models. [3] In the era of precision medicine, the availability of complex multidimensional patient data sets requires larger population samples for generalization. [4] Furthermore, the scarcity of data in underrepresented populations may lead to biases when training data do not sufficiently reflect the attributes of these populations. [5] Healthcare data quality and algorithmic challenges are also known barriers for ML. [6] Many approaches have been proposed to address the lack of data heterogeneity. [7,8] The most promising of these approaches requires multi-institutional collaborations that would increase not only the size of the training data but also its diversity. Ideally, study data from each institution would be shared via a central data store where a single model can be trained on the combined multi-institutional data. However, there are several obstacles to implementing such a solution. [7-9] First, central storage and the transfer of large amounts of data over the network carry an exorbitant cost. [10] The second major obstacle is the regulatory barrier surrounding patient data protection.
Rajendran, S., Obeid, J. S., Binol, H., D'Agostino, R., Foley, K., Zhang, W., … Topaloglu, U. (2021). Cloud-Based Federated Learning Implementation Across Medical Centers. JCO Clinical Cancer Informatics, (5), 1–11. https://doi.org/10.1200/cci.20.00060