LREC 2010 Proceedings

Summary of the paper

Title	Idioms in Context: The IDIX Corpus
Authors	Caroline Sporleder, Linlin Li, Philip Gorinski and Xaver Koch
Abstract	Idioms and other figuratively used expressions pose considerable problems to natural language processing applications because they are very frequent and often behave idiosyncratically. Consequently, there has been much research on the automatic detection and extraction of idiomatic expressions. Most studies focus on type-based idiom detection, i.e., distinguishing whether a given expression can (potentially) be used idiomatically. However, many expressions such as "break the ice" can have both literal and non-literal readings and need to be disambiguated in a given context (token-based detection). So far relatively few approaches have attempted context-based idiom detection. One reason for this may be that few annotated resources are available that disambiguate expressions in context. With the IDIX corpus, we aim to address this. IDIX is available as an add-on to the BNC and disambiguates different usages of a subset of idioms. We believe that this resource will be useful both for linguistic and computational linguistic studies.
Language	Word Sense Disambiguation
Topics	Corpus (creation, annotation, etc.), MultiWord Expressions & Collocations, Word Sense Disambiguation
Full paper	Idioms in Context: The IDIX Corpus
Bibtex	@InProceedings{SPORLEDER10.618,
  author = {Caroline Sporleder and Linlin Li and Philip Gorinski and Xaver Koch},
  title = {Idioms in Context: The IDIX Corpus},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}