Summary of the paper

Title Identification of Rare & Novel Senses Using Translations in a Parallel Corpus
Authors Richard Schwarz, Hinrich Schütze, Fabienne Martin and Achim Stein
Abstract The identification of rare and novel senses is a challenge in lexicography. In this paper, we present a new method for finding such senses using a word-aligned multilingual parallel corpus. We use the Europarl corpus and therein concentrate on French verbs. We represent each occurrence of a French verb as a high-dimensional term vector. The dimensions of such a vector are the possible translations of the verb according to the underlying word alignment. The dimensions are weighted by a weighting scheme to adjust to the significance of any particular translation. After collecting these vectors we apply forms of the K-means algorithm on the resulting vector space to produce clusters of distinct senses, so that standard uses produce large homogeneous clusters while rare and novel uses appear in small or heterogeneous clusters. We show in a qualitative and quantitative evaluation that the method can successfully find rare and novel senses.
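The pipeline described in the abstract (one translation vector per verb occurrence, a weighting step, K-means, then inspection of small clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy occurrences of a French verb, the IDF-style weighting, the farthest-first initialisation, and the function names `build_vectors`, `kmeans`, and `small_clusters` are all assumptions made for illustration, since the abstract does not specify the weighting scheme or the K-means variant used.

```python
import math
from collections import Counter


def build_vectors(occurrences):
    """Map each occurrence (a list of aligned translations) to a weighted unit vector."""
    vocab = sorted({t for occ in occurrences for t in occ})
    n = len(occurrences)
    df = Counter(t for occ in occurrences for t in set(occ))
    # Assumed weighting (not from the paper): smoothed inverse document frequency.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for occ in occurrences:
        vec = [idf[t] if t in occ else 0.0 for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means with a deterministic farthest-first initialisation (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def small_clusters(labels, max_size):
    """Return the ids of clusters with at most max_size members (rare-sense candidates)."""
    sizes = Counter(labels)
    return sorted(j for j, size in sizes.items() if size <= max_size)


# Hypothetical toy data: English translations aligned to ten occurrences of one French verb.
occurrences = [["steal"]] * 5 + [["fly"]] * 4 + [["soar"]]
vocab, vectors = build_vectors(occurrences)
labels = kmeans(vectors, k=3)
rare = small_clusters(labels, max_size=1)
print(vocab, labels, rare)
```

In this toy run the two frequent translations form the large clusters and the single "soar" occurrence ends up alone, which is the kind of small cluster the abstract proposes to inspect for rare or novel senses. Heterogeneous clusters, which the abstract also mentions, are not modelled here.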
Language Statistical and machine learning methods
Topics Tools, systems, applications, Lexicon, lexical database, Statistical and machine learning methods
Full paper Identification of Rare & Novel Senses Using Translations in a Parallel Corpus
Bibtex @InProceedings{SCHWARZ10.411,
  author = {Richard Schwarz and Hinrich Schütze and Fabienne Martin and Achim Stein},
  title = {Identification of Rare \& Novel Senses Using Translations in a Parallel Corpus},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }