Summary of the paper

Title An Annotation Scheme and Gold Standard for Dutch-English Word Alignment
Authors Lieve Macken
Abstract The importance of sentence-aligned parallel corpora has been widely acknowledged. Reference corpora in which sub-sentential translational correspondences are indicated manually are more labour-intensive to create, and hence less widespread. Such manually created reference alignments -- also called Gold Standards -- have been used in research projects to develop or test automatic word alignment systems. In most translations, translational correspondences are rather complex; for example, word-by-word correspondences can be found only for a limited number of words. A reference corpus in which those complex translational correspondences are aligned manually is therefore also a useful resource for the development of translation tools and for translation studies. In this paper, we describe how we created a Gold Standard for the Dutch-English language pair. We present the annotation scheme, annotation guidelines, annotation tool and inter-annotator results. To cover a wide range of syntactic and stylistic phenomena that emerge from different writing and translation styles, our Gold Standard data set contains texts from different text types. The Gold Standard will be publicly available as part of the Dutch Parallel Corpus.
Language Machine Translation, SpeechToSpeech Translation
Topics Corpus (creation, annotation, etc.), Multilinguality, Machine Translation, SpeechToSpeech Translation
Full paper An Annotation Scheme and Gold Standard for Dutch-English Word Alignment
Bibtex @InProceedings{MACKEN10.100,
  author = {Lieve Macken},
  title = {An Annotation Scheme and Gold Standard for Dutch-English Word Alignment},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }