Unifying Named Entity Recognition and Extreme Multi-Label Classification for Explainable Clinical Coding

Alicia Ramirez-Arrabe, Andres Duque, Juan Martinez-Romo

Abstract


Automatic clinical coding of medical reports sits at the intersection of healthcare and Natural Language Processing (NLP), facilitating the extraction of relevant information from unstructured clinical documents. This study introduces a three-stage explainable automatic coding system, developed within the experimental framework of the 2020 CodiEsp competition, a task devoted to automatic clinical coding in Spanish. The proposed system integrates two Named Entity Recognition (NER)-based models, a supervised text classification model, and an unsupervised similarity model enhanced with keyphrase extraction. This methodology enables the detection of overlapping and discontinuous evidence texts, as well as the inclusion of Out-Of-Distribution (OOD) codes. Our approach outperforms most state-of-the-art models, achieving F1-score improvements of 4.2%, 0.2%, and 4.1% in the CodiEsp-D, CodiEsp-P, and CodiEsp-X subtasks, respectively, and an increase of up to 2.4% in Mean Average Precision (MAP).
