Parsing Schemata for Practical Text Analysis
Resumen
Resumen de Tesis doctoral en Informática realizada por Carlos Gómez-Rodríguez bajo la dirección de los doctores Miguel A. Alonso Pardo (Univ.
da Coruña) y Manuel Vilares Ferro (Univ. de Vigo). La defensa de la tesis tuvo lugar el 5 de junio de 2009 ante el tribunal formado por los doctores John A. Carroll
(Univ. of Sussex), Giorgio Satta (Univ. degli Studi di Padova), Víctor Jesús Díaz Madrigal (Univ. de Sevilla), Leo Wanner (Univ. Pompeu Fabra) y
Jesús Vilares Ferro (Univ. da Coruña). La calificación obtenida fue de \emph{Sobresaliente Cum Laude} por unanimidad, con mención de Doctor Europeo.
This dissertation provides several theoretical and practical tools that extend the applicability of Sikkel's theory of parsing schemata in several different directions.
First, a compilation technique is defined that can be used to obtain efficient implementations of parsers automatically from their corresponding schemata. This makes it possible to use parsing schemata to prototype and test parsing algorithms, without the need of manually converting the formal representation to an efficient implementation in a programming language.
Second, the range of parsing algorithms that can be defined by means of schemata is extended with the definition of new variants of the formalism that can deal with error-repair parsers and dependency-based parsers.
Apart from these tools themselves, the dissertation also introduces several research results that have been obtained by using them. The compilation technique is used to obtain implementations of different parsers for context-free grammars (CFG) and tree-adjoining grammars (TAG) and perform an empirical analysis of their behaviour with real-sized grammars. The extension of parsing schemata for error-repair parsing is used to define a transformation that can automatically add error-repair capabilities to parsers that do not have them. Finally, the extension of parsing schemata for dependency parsing is used to find formal relationships between several well-known dependency parsers, as well as to define novel algorithms for mildly non-projective dependency structures.
da Coruña) y Manuel Vilares Ferro (Univ. de Vigo). La defensa de la tesis tuvo lugar el 5 de junio de 2009 ante el tribunal formado por los doctores John A. Carroll
(Univ. of Sussex), Giorgio Satta (Univ. degli Studi di Padova), Víctor Jesús Díaz Madrigal (Univ. de Sevilla), Leo Wanner (Univ. Pompeu Fabra) y
Jesús Vilares Ferro (Univ. da Coruña). La calificación obtenida fue de \emph{Sobresaliente Cum Laude} por unanimidad, con mención de Doctor Europeo.
This dissertation provides several theoretical and practical tools that extend the applicability of Sikkel's theory of parsing schemata in several different directions.
First, a compilation technique is defined that can be used to obtain efficient implementations of parsers automatically from their corresponding schemata. This makes it possible to use parsing schemata to prototype and test parsing algorithms, without the need of manually converting the formal representation to an efficient implementation in a programming language.
Second, the range of parsing algorithms that can be defined by means of schemata is extended with the definition of new variants of the formalism that can deal with error-repair parsers and dependency-based parsers.
Apart from these tools themselves, the dissertation also introduces several research results that have been obtained by using them. The compilation technique is used to obtain implementations of different parsers for context-free grammars (CFG) and tree-adjoining grammars (TAG) and perform an empirical analysis of their behaviour with real-sized grammars. The extension of parsing schemata for error-repair parsing is used to define a transformation that can automatically add error-repair capabilities to parsers that do not have them. Finally, the extension of parsing schemata for dependency parsing is used to find formal relationships between several well-known dependency parsers, as well as to define novel algorithms for mildly non-projective dependency structures.