A Hybrid Approach to Treebank Construction
Abstract
This paper investigates the effects of PoS tagging as a preprocessing step for
HPSG-based deep parsing, in the context of developing an open-source Spanish treebank
within the DELPH-IN framework. The treebank is annotated by manually selecting
the correct decisions among the choices proposed by the system and ranked by a statistical
module. Our experiments show that using a tagger lowers the ambiguity of
the sentences, which both reduces the number of sentences that time out before the entire
parse forest is built and helps the ranker place the correct tree among the n-best trees.
On the one hand, our results confirm the benefits of such preprocessing for deep parsing
already reported in the literature with regard to speed, coverage, and accuracy. On the
other hand, we propose a strategy based on existing open-source tools and resources for
developing highly consistent, deeply annotated treebanks for languages with limited
linguistic resources.