Summary of the paper

Title: mwetoolkit: a Framework for Multiword Expression Identification
Authors: Carlos Ramisch, Aline Villavicencio and Christian Boitet
Abstract: This paper presents the Multiword Expression Toolkit (mwetoolkit), an environment for type- and language-independent MWE identification from corpora. The mwetoolkit provides a targeted list of MWE candidates, extracted and filtered according to a number of user-defined criteria and a set of standard statistical association measures. For generating corpus counts, the toolkit provides both a corpus indexation facility and a tool for integration with web search engines, while for evaluation, it provides validation and annotation facilities. The mwetoolkit also allows easy integration with a machine learning tool for the creation and application of supervised MWE extraction models if annotated data is available. In our experiment, the mwetoolkit was tested and evaluated in the context of MWE extraction in the biomedical domain. Our preliminary results show that the toolkit performs better than other approaches, especially concerning recall. Moreover, this first version can also be extended in several ways in order to improve the quality of the results.
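The abstract describes a pipeline that filters candidates by user-defined criteria, ranks them with standard statistical association measures, and evaluates the result (notably for recall). The Python sketch below illustrates that general idea for two-word candidates only. It is not mwetoolkit's code or interface. The stopword set `STOP`, the function names, the frequency threshold, and the choice of PMI, Dice and t-score as "standard" measures are all assumptions made for illustration. The corpus size n is approximated by the token count.

```python
import math
from collections import Counter

# Hypothetical filter standing in for the "user-defined criteria":
# candidates containing any of these tokens are discarded.
STOP = {".", ",", "is", "the", "of", "a"}


def count_ngrams(tokens):
    """Return unigram and bigram frequency tables for a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def scores(bigram, unigrams, bigrams, n):
    """Compute common association measures for a two-word candidate.

    n is the corpus size in tokens; expected = c1 * c2 / n is the count
    the pair would have if its two words occurred independently.
    """
    w1, w2 = bigram
    c12, c1, c2 = bigrams[bigram], unigrams[w1], unigrams[w2]
    expected = c1 * c2 / n
    return {
        "pmi": math.log2(c12 / expected),            # pointwise mutual information
        "dice": 2 * c12 / (c1 + c2),                 # Dice coefficient
        "t": (c12 - expected) / math.sqrt(c12),      # t-score
    }


def extract_candidates(tokens, min_freq=2, measure="pmi"):
    """Keep bigrams seen at least min_freq times with no STOP token, ranked by measure."""
    unigrams, bigrams = count_ngrams(tokens)
    n = len(tokens)
    ranked = []
    for bg, freq in bigrams.items():
        if freq < min_freq or any(w in STOP for w in bg):
            continue
        ranked.append((bg, scores(bg, unigrams, bigrams, n)))
    # Highest score first; ties keep first-occurrence order (stable sort).
    ranked.sort(key=lambda item: item[1][measure], reverse=True)
    return ranked


def precision_recall(predicted, gold):
    """Precision and recall of a candidate list against a reference set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

On the toy token list `"gene expression is measured . gene expression levels vary . cell line is used . gene expression data".split()` with `min_freq=1`, ranking by `"pmi"` places the singleton pairs `cell line` and `levels vary` above `gene expression`, whereas ranking by `"t"` puts `gene expression` first. This reflects the well-known tendency of PMI to favour rare pairs, which is one reason toolkits typically offer several measures rather than one.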
Language: Statistical and machine learning methods
Topics: MultiWord Expressions & Collocations, Acquisition, Statistical and machine learning methods
Full paper: mwetoolkit: a Framework for Multiword Expression Identification
BibTeX:
@InProceedings{RAMISCH10.803,
  author = {Carlos Ramisch and Aline Villavicencio and Christian Boitet},
  title = {mwetoolkit: a Framework for Multiword Expression Identification},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}
Powered by ELDA © 2010 ELDA/ELRA