Title |
Enriching Word Alignment with Linguistic Tags |
Authors |
Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel and Kazuaki Maeda |
Abstract |
Incorporating linguistic knowledge into word alignment is becoming increasingly important for current approaches in statistical machine translation research. To improve automatic word alignment and ultimately machine translation quality, an annotation framework is jointly proposed by LDC (Linguistic Data Consortium) and IBM. The framework enriches word alignment corpora to capture contextual, syntactic and language-specific features by introducing linguistic tags to the alignment annotation. Two annotation schemes constitute the framework: alignment and tagging. The alignment scheme aims to identify minimum translation units and translation relations by using minimum-match and attachment annotation approaches. A set of word tags and alignment link tags are designed in the tagging scheme to describe these translation units and relations. The framework produces a solid ground-level alignment base upon which larger translation unit alignment can be automatically induced. To test the soundness of this work, evaluation is performed on a pilot annotation, resulting in inter- and intra-annotator agreement of above 90%. To date LDC has produced manual word alignment and tagging on 32,823 Chinese-English sentences following this framework. |
Language |
Parsing |
Topics |
Corpus (creation, annotation, etc.), Machine Translation, SpeechToSpeech Translation, Parsing |
Full paper  |
Enriching Word Alignment with Linguistic Tags |
Bibtex |
@InProceedings{LI10.670,
  author = {Xuansong Li and Niyu Ge and Stephen Grimes and Stephanie M. Strassel and Kazuaki Maeda},
  title = {Enriching Word Alignment with Linguistic Tags},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |