Summary of the paper

Title MultiUN: A Multilingual Corpus from United Nation Documents
Authors Andreas Eisele and Yu Chen
Abstract This paper describes the acquisition, preparation and properties of a corpus extracted from the official documents of the United Nations (UN). This corpus is available in all 6 official languages of the UN, consisting of around 300 million words per language. We describe the methods we used for crawling, document formatting, and sentence alignment. This corpus also includes a common test set for machine translation. We present the results of a French-Chinese machine translation experiment performed on this corpus.
Language Multilinguality
Topics Machine Translation, SpeechToSpeech Translation, Corpus (creation, annotation, etc.), Multilinguality
Full paper MultiUN: A Multilingual Corpus from United Nation Documents
Bibtex @InProceedings{EISELE10.686,
  author = {Andreas Eisele and Yu Chen},
  title = {MultiUN: A Multilingual Corpus from United Nation Documents},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }