Summary of the paper

Title Active Learning and Crowd-Sourcing for Machine Translation
Authors Vamshi Ambati, Stephan Vogel and Jaime Carbonell
Abstract Large scale parallel data generation for new language pairs requires intensive human effort and availability of experts. It becomes immensely difficult and costly to provide Statistical Machine Translation (SMT) systems for most languages due to the paucity of expert translators to provide parallel data. Even if experts are present, it appears infeasible due to the impending costs. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims at reducing cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowds to make do for the lack of expensive language experts. We experiment and compare our active learning strategies with strong baselines and see significant improvements in translation quality. Similarly, our experiments with crowd-sourcing on Mechanical Turk have shown that it is possible to create parallel corpora using non-experts and with sufficient quality assurance, a translation system that is trained using this corpus approaches expert quality.
Language Corpus (creation, annotation, etc.)
Topics Machine Translation, SpeechToSpeech Translation, Statistical and machine learning methods, Corpus (creation, annotation, etc.)
Full paper Active Learning and Crowd-Sourcing for Machine Translation
Bibtex @InProceedings{AMBATI10.244,
  author = {Vamshi Ambati and Stephan Vogel and Jaime Carbonell},
  title = {Active Learning and Crowd-Sourcing for Machine Translation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }