Summary of the paper

Title Idioms in Context: The IDIX Corpus
Authors Caroline Sporleder, Linlin Li, Philip Gorinski and Xaver Koch
Abstract Idioms and other figuratively used expressions pose considerable problems to natural language processing applications because they are very frequent and often behave idiosyncratically. Consequently, there has been much research on the automatic detection and extraction of idiomatic expressions. Most studies focus on type-based idiom detection, i.e., distinguishing whether a given expression can (potentially) be used idiomatically. However, many expressions such as "break the ice" can have both literal and non-literal readings and need to be disambiguated in a given context (token-based detection). So far relatively few approaches have attempted context-based idiom detection. One reason for this may be that few annotated resources are available that disambiguate expressions in context. With the IDIX corpus, we aim to address this. IDIX is available as an add-on to the BNC and disambiguates different usages of a subset of idioms. We believe that this resource will be useful both for linguistic and computational linguistic studies.
Language Word Sense Disambiguation
Topics Corpus (creation, annotation, etc.), MultiWord Expressions & Collocations, Word Sense Disambiguation
Full paper Idioms in Context: The IDIX Corpus
Bibtex @InProceedings{SPORLEDER10.618,
  author = {Caroline Sporleder and Linlin Li and Philip Gorinski and Xaver Koch},
  title = {Idioms in Context: The IDIX Corpus},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }