Effective communication, exchange of ideas, and commerce across borders and cultures require technology that can process textual content in multiple languages and map it into semantic representations usable in applications such as information extraction and question answering. This project innovates on broad-coverage, full-sentence semantic representations to ensure they are cross-linguistically applicable and reflect the predicate-argument relations that are central to meaning.
We will address the main obstacle to cross-linguistic application of existing representations: the annotation bottleneck. The bottleneck stems from the high cost of labeling data for training semantic parsers in new languages, and from the traditional reliance on language-specific lexicons for semantic roles.
To overcome this hurdle, we will develop methods that support (a) rapid annotation and (b) cross-linguistic applicability. These methods will integrate two recent projects by the PIs: the Universal Conceptual Cognitive Annotation (UCCA) scheme for cross-linguistic semantic annotation, and the Preposition Supersenses project, which defines an inventory of predicate-argument relations. Leveraging the proposed scheme’s cross-linguistic applicability, we will construct multilingual parsers that exploit semantic annotation in one language to help parse another, improving accuracy and reducing the cost of adapting the scheme to a new language.
Improvements in semantic parsing will contribute to the wide variety of applications that use semantic analysis as a first step, and will open the way for semantics-based methods in other central applications such as machine translation.