Savana: A Global Information Extraction and Terminology Expansion Framework in the Medical Domain

Luis Espinosa, Jorge Tello, Alberto Pardo, Ignacio Medrano, Alberto Ureña, Ignacio Salcedo, Horacio Saggion

Abstract


Terminological databases are a fundamental source of information in the medical domain, used daily both by practitioners and in academia. Several such resources are available, e.g. CIE (the Spanish edition of the International Classification of Diseases, ICD), SNOMED CT or UMLS (Unified Medical Language System). Because they result from collaborative expert work, these terminologies are of high quality. However, they may fail to faithfully represent a medical domain that is constantly changing. Systems that capture novel terminological knowledge from heterogeneous text sources and integrate it into standard terminologies therefore have the potential to add great value to such repositories. This paper first presents Savana, a biomedical information extraction system which, combined with a validation phase carried out by medical practitioners, is used to populate the Spanish branch of SNOMED CT with novel knowledge. Second, we describe and evaluate a system which, given a novel medical term, finds its most likely hypernym, thus supporting the enrichment and expansion of terminological databases.
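To make the hypernym-finding task concrete, the sketch below shows one simple baseline; it is not necessarily the method evaluated in the paper. It assumes that Spanish medical noun phrases are head-initial, so dropping trailing modifiers from a multiword term tends to yield a more general concept, and it returns the longest such prefix already present in the terminology. The function name `find_hypernym` and the toy terminology are hypothetical illustrations, not taken from Savana or SNOMED CT.

```python
from typing import Optional, Set


def find_hypernym(term: str, terminology: Set[str]) -> Optional[str]:
    """Return the longest proper word-prefix of `term` found in `terminology`.

    Hypothetical head-matching baseline (an assumption, not the paper's method):
    Spanish noun phrases are mostly head-initial, so removing trailing modifiers
    often gives a broader concept, e.g. "leucemia mieloide aguda" -> "leucemia mieloide".
    """
    words = term.lower().split()
    # Try progressively shorter prefixes, longest first, excluding the full term.
    for end in range(len(words) - 1, 0, -1):
        candidate = " ".join(words[:end])
        if candidate in terminology:
            return candidate
    # No known ancestor: a learned model or a human validator would have to decide.
    return None


# Toy terminology; entries are illustrative only.
TERMINOLOGY = {"leucemia", "leucemia mieloide", "carcinoma", "carcinoma ductal"}

print(find_hypernym("leucemia mieloide aguda", TERMINOLOGY))       # -> leucemia mieloide
print(find_hypernym("carcinoma ductal infiltrante", TERMINOLOGY))  # -> carcinoma ductal
print(find_hypernym("neumonía bilateral", TERMINOLOGY))            # -> None
```

A heuristic like this only covers compositional multiword terms; single-word terms, synonyms and non-compositional names return `None`, which is where a learned hypernym model and the practitioners' validation phase would be needed.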

