Summary of the paper

Title Extraction of German Multiword Expressions from Parsed Corpora Using Context Features
Authors Marion Weller and Ulrich Heid
Abstract We report about tools for the extraction of German multiword expressions (MWEs) from text corpora; we extract word pairs, but also longer MWEs of different patterns, e.g. verb-noun structures with an additional prepositional phrase or adjective. Next to standard association-based extraction, we focus on morpho-syntactic, syntactic and lexical-choice features of the MWE candidates. A broad range of such properties (e.g. number and definiteness of nouns, adjacency of the MWE’s components and their position in the sentence, preferred lexical modifiers, etc.) along with relevant example sentences, are extracted from dependency-parsed text and stored in a data base. A sample precision evaluation and an analysis of extraction errors are provided along with the discussion of our extraction architecture. We furthermore measure the contribution of the features to the precision of the extraction: by using both morpho-syntactic and syntactic features, we achieve a higher precision in the identification of idiomatic MWEs, than by using only properties of one type.
Language Parsing
Topics MultiWord Expressions & Collocations, Lexicon, lexical database, Parsing
Full paper Extraction of German Multiword Expressions from Parsed Corpora Using Context Features
Bibtex @InProceedings{WELLER10.428,
  author = {Marion Weller and Ulrich Heid},
  title = {Extraction of German Multiword Expressions from Parsed Corpora Using Context Features},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }