Summary of the paper

Title Heuristic Word Alignment with Parallel Phrases
Authors Maria Holmqvist
Abstract We present a heuristic method for word alignment, which is the task of identifying corresponding words in parallel text. The heuristic method is based on parallel phrases extracted from manually word aligned sentence pairs. Word alignment is performed by matching parallel phrases to new sentence pairs, and adding word links from the parallel phrase to words in the matching sentence segment. Experiments on an English-Swedish parallel corpus showed that the heuristic phrase-based method produced word alignments with high precision but low recall. In order to improve alignment recall, phrases were generalized by replacing words with part-of-speech categories. The generalization improved recall but at the expense of precision. Two filtering strategies were investigated to prune the large set of generalized phrases. Finally, the phrase-based method was compared to statistical word alignment with Giza++ and we found that although statistical alignments based on large datasets will outperform phrase-based word alignment, a combination of phrase-based and statistical word alignment outperformed pure statistical alignment in terms of Alignment Error Rate (AER).
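Illustration The matching step described in the abstract can be sketched roughly as follows. This is a minimal sketch and not the paper's implementation: the phrase-table layout, the function names (find_matches, align_with_phrases, aer) and the toy English-Swedish sentence pair are all assumptions made for illustration. The AER function uses the commonly cited definition with sure and possible gold links; the abstract does not say which variant the paper uses.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)
# Hypothetical phrase entry: (source words, target words, links inside the phrase)
Phrase = Tuple[List[str], List[str], Set[Link]]


def find_matches(sentence: List[str], phrase: List[str]) -> List[int]:
    """Return the start offsets where `phrase` occurs contiguously in `sentence`."""
    n, m = len(sentence), len(phrase)
    return [i for i in range(n - m + 1) if sentence[i:i + m] == phrase]


def align_with_phrases(src: List[str], tgt: List[str],
                       phrase_table: List[Phrase]) -> Set[Link]:
    """Copy word links from every parallel phrase that matches both sentences."""
    links: Set[Link] = set()
    for p_src, p_tgt, p_links in phrase_table:
        for i in find_matches(src, p_src):
            for j in find_matches(tgt, p_tgt):
                # Shift phrase-internal links to sentence positions.
                links.update((i + a, j + b) for a, b in p_links)
    return links


def aer(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Alignment Error Rate: 1 - (|A & S| + |A & P|) / (|A| + |S|), with S a subset of P."""
    possible = possible | sure
    if not predicted and not sure:
        return 0.0
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / (len(predicted) + len(sure))


# Toy data (assumed, not taken from the paper).
src = ["the", "red", "house"]
tgt = ["det", "röda", "huset"]
phrase_table = [(["red", "house"], ["röda", "huset"], {(0, 0), (1, 1)})]
gold_sure = {(0, 0), (1, 1), (2, 2)}

links = align_with_phrases(src, tgt, phrase_table)
error = aer(links, gold_sure, set())
```

In this toy case both proposed links are correct but the link for "the" is missing, which mirrors the high-precision, low-recall behaviour reported in the abstract. Generalizing phrases with part-of-speech categories would amount to relaxing the equality test in find_matches.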
Language Multilinguality
Topics Machine Translation, SpeechToSpeech Translation, Corpus (creation, annotation, etc.), Multilinguality
Full paper Heuristic Word Alignment with Parallel Phrases
Bibtex @InProceedings{HOLMQVIST10.508,
  author = {Maria Holmqvist},
  title = {Heuristic Word Alignment with Parallel Phrases},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }