Named Entity Recognition: a Survey for the Portuguese Language

Hidelberg O. Albuquerque, Ellen Souza, Carlos Gomes, Matheus Henrique de C. Pinto, Ricardo P. S. Filho, Rosimeire Costa, Vinícius Teixeira de M. Lopes, Nádia F. F. da Silva, André C. P. L. F. de Carvalho, Adriano L. I. Oliveira


Named Entity Recognition (NER) is an important task in Natural Language Processing: it is a key sub-task of information extraction, with numerous applications such as information retrieval and machine learning. However, resources remain scarce for some languages, as is the case for Portuguese. The objective of this research is therefore to map NER techniques, methods, and resources for the Portuguese language. Manual and automated searches retrieved 447 primary studies, of which 45 were included in our review. The growing number of studies reveals increasing interest from researchers in the area. Twenty-one studies focused on comparative analyses of techniques and tools, and 24 new or updated NER corpora covering several domains were mapped. The most frequently used text pre-processing techniques were tokenization, embeddings, and part-of-speech (PoS) tagging, while the most frequently used methods and algorithms were based on BiLSTM, CRF, and BERT models. The most relevant researchers, institutions, and countries were also mapped, as was the evolution of publications over time.
