Extraction and Structuring of Financial Terminology
Abstract
This study addresses automatic term extraction: detecting domain-specific terms in Spanish financial reports using monolingual and multilingual BERT and RoBERTa language models. We evaluate the models' performance, paying particular attention to their ability to identify terms that were not seen during training. We also conduct a detailed analysis of false positives, false negatives, and true positives. In addition, we apply social network analysis techniques to organize the extracted terms systematically into relevant clusters. Our findings indicate that transformer language models are a cost-effective choice for identifying such terms, and show how clustering organizes them into coherent and meaningful groups.