Title
mwetoolkit: a Framework for Multiword Expression Identification
Authors
Carlos Ramisch, Aline Villavicencio and Christian Boitet
Abstract
This paper presents the Multiword Expression Toolkit (mwetoolkit), an environment for type- and language-independent MWE identification from corpora. The mwetoolkit provides a targeted list of MWE candidates, extracted and filtered according to a number of user-defined criteria and a set of standard statistical association measures. For generating corpus counts, the toolkit provides both a corpus indexation facility and a tool for integration with web search engines, while for evaluation, it provides validation and annotation facilities. The mwetoolkit also allows easy integration with a machine learning tool for the creation and application of supervised MWE extraction models if annotated data is available. In our experiment, the mwetoolkit was tested and evaluated in the context of MWE extraction in the biomedical domain. Our preliminary results show that the toolkit performs better than other approaches, especially concerning recall. Moreover, this first version can also be extended in several ways in order to improve the quality of the results.
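The abstract mentions filtering candidates by user-defined criteria and ranking them with standard statistical association measures. As a hedged illustration only, the sketch below scores two-word candidates from corpus counts with three measures common in the collocation literature (pointwise mutual information, the Dice coefficient and Student's t-score) after a minimum-frequency filter. The names `BigramCounts`, `association_scores` and `rank_candidates`, and the `min_freq` threshold, are hypothetical; this is not the mwetoolkit's own code or interface, and the toolkit's exact measures and formulas may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class BigramCounts:
    """Corpus counts for one two-word candidate (hypothetical helper type)."""
    pair: int  # c(w1 w2): occurrences of the candidate itself; must be > 0
    w1: int    # c(w1): occurrences of the first word
    w2: int    # c(w2): occurrences of the second word
    n: int     # N: corpus size in tokens


def association_scores(c: BigramCounts) -> dict[str, float]:
    """Return PMI, Dice and t-score for a single candidate."""
    expected = c.w1 * c.w2 / c.n  # count expected if w1 and w2 were independent
    return {
        "pmi": math.log2(c.pair / expected),
        "dice": 2 * c.pair / (c.w1 + c.w2),
        "t": (c.pair - expected) / math.sqrt(c.pair),
    }


def rank_candidates(
    cands: dict[str, BigramCounts], measure: str, min_freq: int = 2
) -> list[tuple[str, float]]:
    """Drop candidates seen fewer than min_freq times, then sort by one measure."""
    scored = [
        (ngram, association_scores(counts)[measure])
        for ngram, counts in cands.items()
        if counts.pair >= min_freq  # hypothetical user-defined frequency filter
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, with invented counts `{"blood cell": BigramCounts(30, 100, 60, 100000), "of the": BigramCounts(50, 5000, 8000, 100000)}`, ranking by `"pmi"` puts "blood cell" first: "of the" is frequent but occurs far less often than its word frequencies predict, so its PMI is negative.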
Language
Statistical and machine learning methods
Topics
MultiWord Expressions & Collocations, Acquisition, Statistical and machine learning methods
Full paper
mwetoolkit: a Framework for Multiword Expression Identification
Bibtex
@InProceedings{RAMISCH10.803,
  author    = {Carlos Ramisch and Aline Villavicencio and Christian Boitet},
  title     = {mwetoolkit: a Framework for Multiword Expression Identification},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}