Summary of the paper

Title: Linguistically Motivated Unsupervised Segmentation for Machine Translation
Authors: Mark Fishel and Harri Kirik
Abstract: In this paper we combine statistical machine translation with morphological information from two different morphological analyzers to try to improve translation quality through linguistically motivated segmentation. The two analyzers are Morfessor, an unsupervised morpheme segmentation and analysis toolkit, and T3, a rule-based morphological analyzer. Our translations are produced with the Moses statistical machine translation toolkit, trained on the JRC-Acquis corpus, in both the Estonian-to-English and English-to-Estonian directions. We model linguistic phenomena such as word lemmas and endings, and we split compound words into simpler parts. Lemma information was also used to introduce new factors into the corpora, and this information was then used either for better word alignment or for alternative-path back-off translation. Our results show that although these methods have previously shown, and continue to show, promise for improving translation, their success still depends largely on the corpora and language pairs used.
Language: Knowledge Discovery/Representation
Topics: Machine Translation, Speech-to-Speech Translation, Morphology, Knowledge Discovery/Representation
Full paper: Linguistically Motivated Unsupervised Segmentation for Machine Translation
Bibtex @InProceedings{FISHEL10.604,
  author = {Mark Fishel and Harri Kirik},
  title = {Linguistically Motivated Unsupervised Segmentation for Machine Translation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA