Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes

Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle


This study investigates the application of a state-of-the-art zero-shot and few-shot natural language processing (NLP) technique for text classification tasks in Catalan, a moderately under-resourced language. The approach involves reformulating the downstream task as textual entailment, which is then solved by an entailment model. However, unlike English, where entailment models can be trained on huge Natural Language Inference (NLI) datasets, the lack of such large resources in Catalan poses a challenge. In this context, we comparatively explore training on monolingual and (larger) multilingual resources, and identify the strengths and weaknesses of the monolingual and multilingual individual components of entailment models: the pre-trained language model and the NLI training dataset. Furthermore, we propose and implement a simple task transfer strategy using open Wikipedia resources that demonstrates significant performance improvements, providing a practical and effective alternative for languages with limited or no NLI datasets.
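The entailment reformulation described in the abstract can be sketched as follows. This is a minimal, hedged illustration, not the authors' implementation: each candidate label is verbalized into a hypothesis sentence, a scoring function (standing in for an NLI model, e.g. one trained on Catalan or multilingual NLI data) rates how strongly the input text entails each hypothesis, and the best-scoring label is returned. The template wording and the toy word-overlap scorer below are assumptions for illustration only.

```python
# Sketch of entailment-based zero-shot text classification.
# Assumption: `entail_score(premise, hypothesis)` returns a higher value
# when the premise entails the hypothesis; a real system would obtain
# this from an NLI model rather than the toy scorer used here.

def build_hypotheses(labels, template="This text is about {}."):
    """Verbalize each class label into a hypothesis sentence."""
    return {label: template.format(label) for label in labels}

def classify(premise, labels, entail_score, template="This text is about {}."):
    """Pick the label whose hypothesis is most entailed by the premise."""
    hypotheses = build_hypotheses(labels, template)
    return max(labels, key=lambda label: entail_score(premise, hypotheses[label]))

def toy_entail_score(premise, hypothesis):
    """Toy stand-in for an NLI model: fraction of hypothesis words
    that also appear in the premise (illustration only)."""
    p_words = set(premise.lower().split())
    h_words = set(hypothesis.lower().rstrip(".").split())
    return len(p_words & h_words) / max(len(h_words), 1)

# Toy usage: the premise shares more words with the "sports" hypothesis.
label = classify("this article is about sports and football",
                 ["sports", "politics"],
                 toy_entail_score)
print(label)  # → sports
```

In practice, zero-shot transfer comes from the NLI model having learned entailment on generic premise/hypothesis pairs, so no task-specific training data is needed; the quality of the hypothesis template and of the underlying NLI training set both matter, which is precisely what the paper compares across monolingual and multilingual components.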
