Summary of the paper

Title Bilingual Lexicon Induction: Effortless Evaluation of Word Alignment Tools and Production of Resources for Improbable Language Pairs
Authors Adrien Lardilleux, Julien Gosme and Yves Lepage
Abstract In this paper, we present a simple protocol to evaluate word aligners on bilingual lexicon induction tasks from parallel corpora. Rather than resorting to gold standards, it relies on a comparison of the outputs of word aligners against a reference bilingual lexicon. The quality of this reference bilingual lexicon does not need to be particularly high, because evaluation quality is ensured by systematically filtering this reference lexicon with the parallel corpus the word aligners are trained on. We perform a comparison of three freely available word aligners on numerous language pairs from the Bible parallel corpus (Resnik et al., 1999): MGIZA++ (Gao and Vogel, 2008), BerkeleyAligner (Liang et al., 2006), and Anymalign (Lardilleux and Lepage, 2009). We then select the most appropriate one to produce bilingual lexicons for all language pairs of this corpus. These involve Cebuano, Chinese, Danish, English, Finnish, French, Greek, Indonesian, Latin, Spanish, Swedish, and Vietnamese. The 66 resulting lexicons are made freely available.
Language Endangered languages
Topics Lexicon, lexical database, Evaluation methodologies, Endangered languages
Full paper Bilingual Lexicon Induction: Effortless Evaluation of Word Alignment Tools and Production of Resources for Improbable Language Pairs
Bibtex @InProceedings{LARDILLEUX10.293,
  author = {Adrien Lardilleux and Julien Gosme and Yves Lepage},
  title = {Bilingual Lexicon Induction: Effortless Evaluation of Word Alignment Tools and Production of Resources for Improbable Language Pairs},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
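Illustration The evaluation protocol described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names filter_reference and score are hypothetical, "filtering with the parallel corpus" is assumed to mean keeping only reference entries whose two words co-occur in at least one aligned sentence pair, and precision and recall are assumed as the comparison measures. The last lines check that the 12 languages listed in the abstract give 66 unordered language pairs.

```python
from itertools import combinations


def filter_reference(reference, corpus):
    """Keep reference entries that can be observed in the parallel corpus.

    Assumption: an entry (src, tgt) is kept if src and tgt occur together
    in at least one aligned sentence pair. Tokenization is plain
    whitespace splitting, which is a simplification.
    """
    kept = set()
    for src, tgt in reference:
        for src_sent, tgt_sent in corpus:
            if src in src_sent.split() and tgt in tgt_sent.split():
                kept.add((src, tgt))
                break
    return kept


def score(induced, filtered_reference):
    """Compare an induced lexicon against the filtered reference.

    Assumption: precision and recall over (src, tgt) entries; the paper
    may use a different measure.
    """
    correct = len(induced & filtered_reference)
    precision = correct / len(induced) if induced else 0.0
    recall = correct / len(filtered_reference) if filtered_reference else 0.0
    return precision, recall


# Toy data, invented for illustration only.
corpus = [("the house", "la maison"), ("a cat", "un chat")]
reference = {("house", "maison"), ("dog", "chien"), ("cat", "chat")}
induced = {("house", "maison"), ("the", "la"), ("cat", "un")}

filtered = filter_reference(reference, corpus)  # ("dog", "chien") is dropped
precision, recall = score(induced, filtered)

# The 12 languages named in the abstract yield 66 unordered pairs.
languages = [
    "Cebuano", "Chinese", "Danish", "English", "Finnish", "French",
    "Greek", "Indonesian", "Latin", "Spanish", "Swedish", "Vietnamese",
]
language_pairs = list(combinations(languages, 2))
```

In this sketch, a correct but unlisted induced entry such as ("the", "la") counts against precision. That is a consequence of the assumed scoring, not a claim about how the paper treats such entries.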