Open Data for Public Administration: Exploitation and semantic organization of institutional web content

Paula Peña, Rocío Aznar, Rosa Montañés, Rafael del Hoyo


The project presented here has been financed by the Government of Aragon and is part of the `Open Data' initiative promoted by that organization. A large amount of unstructured information related to the Government of Aragon is currently published on the Internet, decentralized and with little or no standardization. This creates the need to gather it systematically and offer it to all interested groups from a single access point, in a public and structured way. Within this context, the `Aragon Open Data' project aims to collect, organize, store and keep up to date the Administration's web information by means of human language and semantic technologies. First, websites are crawled to retrieve textual data, to which Natural Language Processing (NLP) and ontology-based techniques are applied. The results are then stored in NoSQL databases, allowing future open access and simple data exploitation. The NLP techniques used in the project include named-entity recognition and classification (NERC) and semantic classification and summarization of texts.
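
The stages described above (text retrieval from crawled pages, entity extraction, storage as documents) can be illustrated with a minimal sketch. It is not the project's implementation: the names TextExtractor, find_candidate_entities and process_page are hypothetical, the capitalized-word pattern is only a crude stand-in for a real NERC component, fetching the page is left out (the HTML is passed in as a string), and a plain dictionary keyed by URL stands in for a NoSQL document store.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script and style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Assumption: a naive placeholder for NERC. It only finds runs of two or
# more capitalized words (optionally joined by "of") and does not classify.
CANDIDATE_ENTITY = re.compile(r"\b[A-Z][a-z]+(?:\s+(?:of\s+)?[A-Z][a-z]+)+\b")


def find_candidate_entities(text):
    """Return the sorted, de-duplicated candidate entity strings in text."""
    return sorted(set(CANDIDATE_ENTITY.findall(text)))


def process_page(url, html, store):
    """Extract text and candidate entities, then save one record per URL.

    `store` is any dict-like object; here it stands in for a document
    database in which each crawled page becomes one document.
    """
    text = extract_text(html)
    record = {
        "url": url,
        "text": text,
        "entities": find_candidate_entities(text),
    }
    store[url] = record
    return record
```

Calling process_page once per crawled page fills the store with one self-describing record per URL, which is the shape a document-oriented database would typically hold; semantic classification and summarization would add further fields to the same record.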

