Summary of the paper

Title Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Authors Jesús González-Rubio, Jorge Civera, Alfons Juan and Francisco Casacuberta
Abstract Currently, a great effort is being carried out in the digitalisation of large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the access of the general public to their content due to the language barrier. Therefore, digital libraries aim not only at storing raw images of digitalised documents, but also to annotate them with their corresponding text transcriptions and translations into modern languages. Unfortunately, ancient languages have at their disposal scarce electronic resources to be exploited by natural language processing techniques. This paper describes the compilation process of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results are also reported using a state-of-the-art phrase-based SMT system. The results presented in this work reveal the complexity of the task and its challenging, but interesting nature for future development.
Language Statistical and machine learning methods
Topics Corpus (creation, annotation, etc.), Machine Translation, SpeechToSpeech Translation, Statistical and machine learning methods
Full paper Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Bibtex @InProceedings{GONZLEZRUBIO10.541,
  author = {Jesús González-Rubio and Jorge Civera and Alfons Juan and Francisco Casacuberta},
  title = {Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }