Approximating the disambiguation of some German nominalizations by use of weak structural, lexical and corpus information

Kurt Eberle , Ulrich Heid , Gertrud Faaß


Between classical symbolic word sense disambiguation (wsd) using explicit deep semantic representations of sentences and texts and statistical wsd using word co-occurrence information, there is a recent tendency towards mediating methods. Similar to so-called light-weight semantics (Marek, 2009) we suggest to only make sparse use of semantic information. We describe an approximation model based upon flat underspecified discourse representation structures (FUDRSs, cf. Eberle, 2004) that weighs knowledge about context structure, lexical semantic restrictions and interpretation preferences. We give a catalogue of guidelines for human annotation of texts by corresponding indicators. Using this, the reliability of an analysis tool that implements the model can be tested with respect to annotation precision and disambiguation prediction and how both can be improved by bootstrapping the knowledge of the system using corpus information. For the balanced test corpus considered the recognition rate of the preferred reading is 80-90% (depending on the smoothing of parse errors).

Texto completo: