Semantic Relations Predict the Bracketing of Three-Component Multiword Terms

Juan Rojas-Garcia


For English multiword terms (MWTs) of three or more constituents (e.g., sea level rise), a semantic analysis based on linguistic and domain knowledge is necessary to resolve the dependencies between components. This structural disambiguation, often known as bracketing, involves grouping the dependent components so that the MWT is reduced to its basic modifier+head form, as in [sea level] [rise]. Knowledge of these dependencies facilitates both the comprehension of an MWT and its accurate translation into other languages. Moreover, resolving MWT bracketing improves the overall accuracy of machine translation systems and sentence parsers. This paper thus presents a pilot study that explored whether the bracketing of a ternary compound used as an argument in a sentence can be predicted from the semantic information encoded in that sentence. It is shown that, with a random forest model, the semantic relation of the MWT to another argument in the same sentence, the lexical domain of the predicate, and the semantic role of the MWT together predicted the bracketing of the 190 ternary compounds used as arguments in a sample of 188 semantically annotated sentences from a Coastal Engineering corpus (100% F1-score). Furthermore, with a binary decision-tree model, the semantic relation of an MWT to another argument in the same sentence proved highly predictive of ternary compound bracketing on its own (94.12% F1-score).
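The reduction of a ternary compound to its modifier+head form can be made concrete with a small data-structure sketch. This is a hypothetical illustration, not the author's pipeline: the function names `reduce_to_modifier_head` and `render` and the labels `"left"` and `"right"` are assumptions introduced here. Left bracketing groups the first two components as the modifier, as in [sea level] [rise]. Right bracketing groups the last two as the head. The right-bracketed output below only shows the alternative grouping and is not presented as a correct analysis of this term.

```python
from typing import Literal, Tuple

# Hypothetical labels for the two possible groupings of a ternary compound.
Bracketing = Literal["left", "right"]


def reduce_to_modifier_head(
    components: Tuple[str, str, str], bracketing: Bracketing
) -> Tuple[str, str]:
    """Group a three-component MWT into a (modifier, head) pair.

    left:  [w1 w2] [w3]  -> the first two components form the modifier
    right: [w1] [w2 w3]  -> the last two components form the head
    """
    if len(components) != 3:
        raise ValueError("expected exactly three components")
    w1, w2, w3 = components
    if bracketing == "left":
        return (f"{w1} {w2}", w3)
    if bracketing == "right":
        return (w1, f"{w2} {w3}")
    raise ValueError(f"unknown bracketing: {bracketing!r}")


def render(components: Tuple[str, str, str], bracketing: Bracketing) -> str:
    """Format the reduced compound in the [modifier] [head] notation."""
    modifier, head = reduce_to_modifier_head(components, bracketing)
    return f"[{modifier}] [{head}]"


print(render(("sea", "level", "rise"), "left"))   # [sea level] [rise]
print(render(("sea", "level", "rise"), "right"))  # [sea] [level rise]
```

In this framing, predicting bracketing amounts to choosing the `"left"` or `"right"` label for each compound from the semantic features of its sentence.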
