ATLAS Multilingual Language Processing Platform

Maciej Ogrodniczuk, Diman Karagiozov


This paper presents ATLAS, a multilingual language processing framework that integrates a common set of linguistic tools for a group of European languages: the less-resourced Bulgarian, Croatian, Greek, Polish and Romanian, together with English and German as reference languages. The state-of-the-art NLP functionality offered by the platform allows for multilingual annotation of texts on lower levels (segmentation, morphosyntax), which in turn supports higher-level processing such as automated categorization, information extraction, machine translation or summarization. More elaborate annotations are also made available, such as extracted named entities or lemmatized multiword expressions. Multilevel annotation of texts is governed by language processing chains constructed with the UIMA (Unstructured Information Management Architecture) industry standard.
To demonstrate the capabilities of the framework, three linguistically aware online services have been built on top of it: i-Publisher (a Web-based content management platform), and i-Librarian and EUDocLib (two sample online services built with and on top of i-Publisher to illustrate the benefits of applying language technology to content management).

