Clinical Federated Learning for Private ICD-10 Classification of Electronic Health Records from Several Spanish Hospitals

Nuria Lebeña, Alberto Blanco, Arantza Casillas, Maite Oronoz, Alicia Pérez

Abstract


A bottleneck in classifying Electronic Health Records (EHRs) according to the International Classification of Diseases (ICD) is the difficulty of gathering large amounts of clinical Spanish documents to train efficient language models on private health data. The federated learning (FL) strategy enables the independent training of several models and the subsequent unification of the parameters of each resulting model into a single unified model, without the need to share sensitive data outside the clinical facilities. We analyse the feasibility of employing the federation strategy for Spanish in a realistic data-partition environment: data coming from two real hospitals of the Basque health system, generated in the same period. We also propose a method to further pre-train the language model (LM) in a federated manner, which we apply to the training of BETO and multilingual BERT. Our findings clearly show that it is feasible to carry out federated learning for Spanish EHR classification using data spread across different hospitals. Moreover, the proposed federated further pre-training of the LM consistently surpasses the results of the model without further pre-training.
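The unification step described above is commonly realised as a weighted average of the clients' model parameters (the FedAvg algorithm). The abstract does not specify the exact aggregation rule used, so the following is only an illustrative sketch, assuming two hypothetical hospital clients whose locally trained parameters are flat lists of floats; in practice these would be the weight tensors of a BERT-style model.

```python
# Minimal FedAvg-style sketch: each hospital trains locally on its
# private EHRs, then shares ONLY its model parameters (never the
# records), which the server merges into one global model.

def federated_average(client_params, client_sizes):
    """Weighted average of per-client parameters by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    merged = [0.0] * n_params
    for params, size in zip(client_params, client_sizes):
        weight = size / total
        for i, p in enumerate(params):
            merged[i] += weight * p
    return merged

# Toy parameter vectors after one round of local training (illustrative values).
hospital_a = [0.2, 0.8, -0.5]
hospital_b = [0.4, 0.6, -0.1]

# Hospital A holds more records, so its parameters weigh more in the merge.
global_model = federated_average([hospital_a, hospital_b],
                                 client_sizes=[3000, 1000])
print(global_model)
```

In an actual FL round this averaging would be repeated over many communication rounds, with the merged model sent back to each hospital as the starting point for the next round of local training.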

Full text:

PDF