Title
Creation of Lexical Resources for a Characterisation of Multiword Expressions in Italian
Authors
Andrea Zaninello and Malvina Nissim
Abstract
The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We present three lexical resources for Italian MWEs, namely an electronic lexicon, a series of example corpora and a database of MWEs represented around morphosyntactic patterns. These resources are matched against, and created from, a very large web-derived corpus for Italian that spans registers and domains. We can thus test expressions coded by lexicographers in a dictionary, thereby discarding unattested expressions, revisiting lexicographers' choices on the basis of frequency information, and at the same time creating an example sub-corpus for each entry. We organise MWEs on the basis of the morphosyntactic information obtained from the data in an electronic, flexible knowledge base containing structured annotation exploitable for multiple purposes. We also suggest directions for further work towards characterising MWEs by analysing the data organised in our database through lexico-semantic information available in WordNet or MultiWordNet-like resources, also with a view to expanding their set through the extraction of other similar compact expressions.
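The validation step described in the abstract (testing dictionary entries against a corpus, discarding unattested ones, recording frequencies, collecting an example sub-corpus per entry, and grouping entries by morphosyntactic pattern) can be illustrated with a minimal Python sketch. It assumes a corpus that is already tokenised and POS-tagged as lists of (token, tag) pairs, matches entries as contiguous, case-insensitive word sequences, and uses placeholder tags. The functions `find_occurrences` and `validate_lexicon` and the sample sentences are hypothetical; this is not the authors' pipeline, which works on a very large web-derived corpus and may handle inflection and discontinuity differently.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A tagged sentence is a list of (token, POS tag) pairs. The tags used in the
# demo below are hypothetical placeholders, not the tagset used in the paper.
Sentence = List[Tuple[str, str]]


def find_occurrences(mwe: Tuple[str, ...], sentence: Sentence) -> List[int]:
    """Return the start offsets where the MWE occurs as a contiguous,
    case-insensitive token sequence (no inflection or gaps allowed)."""
    words = [tok.lower() for tok, _ in sentence]
    n = len(mwe)
    return [i for i in range(len(words) - n + 1) if tuple(words[i:i + n]) == mwe]


def validate_lexicon(lexicon: List[str], corpus: List[Sentence]):
    """Match dictionary entries against a tagged corpus.

    Returns (freq, examples, unattested, by_pattern):
    - freq: number of corpus occurrences per entry;
    - examples: the example sub-corpus, i.e. matching sentences per entry;
    - unattested: entries with no occurrence, candidates for discarding;
    - by_pattern: attested entries indexed by their most frequent POS pattern.
    """
    freq: Counter = Counter()
    examples: Dict[str, List[str]] = defaultdict(list)
    patterns: Dict[str, Counter] = defaultdict(Counter)

    for entry in lexicon:
        mwe = tuple(entry.lower().split())
        for sentence in corpus:
            starts = find_occurrences(mwe, sentence)
            if starts:
                # Store each matching sentence once, however often it matches.
                examples[entry].append(" ".join(tok for tok, _ in sentence))
            for start in starts:
                freq[entry] += 1
                tags = [pos for _, pos in sentence[start:start + len(mwe)]]
                patterns[entry][" ".join(tags)] += 1

    # Counter returns 0 for missing keys, so unmatched entries count as 0.
    unattested = [entry for entry in lexicon if freq[entry] == 0]
    by_pattern: Dict[str, List[str]] = defaultdict(list)
    for entry, counts in patterns.items():
        by_pattern[counts.most_common(1)[0][0]].append(entry)
    return freq, dict(examples), unattested, dict(by_pattern)


if __name__ == "__main__":
    corpus = [
        [("Ha", "V"), ("preso", "V"), ("in", "PREP"), ("giro", "N"), ("tutti", "PRON")],
        [("Siamo", "V"), ("andati", "V"), ("in", "PREP"), ("giro", "N")],
        [("Lavora", "V"), ("a", "PREP"), ("tempo", "N"), ("pieno", "ADJ")],
    ]
    lexicon = ["in giro", "a tempo pieno", "a gonfie vele"]
    freq, examples, unattested, by_pattern = validate_lexicon(lexicon, corpus)
    print(unattested)  # ['a gonfie vele']
    print(by_pattern)  # {'PREP N': ['in giro'], 'PREP N ADJ': ['a tempo pieno']}
```

Exact contiguous matching keeps the sketch short. Because Italian MWEs inflect and can be interrupted by other material, a realistic version would likely match on lemmas and tolerate gaps, at the cost of more false positives.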
Language
Italian
Topics
MultiWord Expressions & Collocations, Lexicon, lexical database, Validation of LRs
BibTeX
@InProceedings{ZANINELLO10.567,
  author    = {Andrea Zaninello and Malvina Nissim},
  title     = {Creation of Lexical Resources for a Characterisation of Multiword Expressions in Italian},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}