Summary of the paper

Title Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus
Authors Lars Ahrenberg
Abstract This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison, which is based on manually reviewed word alignments. We investigate relative frequencies of different types of correspondence, including null alignments, many-to-one correspondences and crossings. In addition, both halves of the parallel corpus have been annotated with morpho-syntactic information. The syntactic annotation uses labelled dependency relations. Thus, we can see how different types of correspondences are distributed on different parts-of-speech and compute correspondences at the structural level. In spite of the fact that two of the other subcorpora contain fiction, it is found that the Europarl part is the one having the highest proportion of many types of restructurings, including additions, deletions, long distance reorderings and dependency reversals. We explain this by the fact that the majority of Europarl segments are parallel translations rather than source texts and their translations.
Language Profiling
Topics Corpus (creation, annotation, etc.), Machine Translation, SpeechToSpeech Translation, Profiling
Full paper Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus
Bibtex @InProceedings{AHRENBERG10.193,
  author = {Lars Ahrenberg},
  title = {Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
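
The abstract mentions relative frequencies of null alignments, many-to-one correspondences and crossings computed from word alignments. The following is a minimal sketch of how such counts could be derived for one sentence pair. It is not the paper's implementation: the function name `profile_alignment`, the representation of links as (source index, target index) pairs, and the exact definitions of each correspondence type are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def profile_alignment(n_src, n_tgt, links):
    """Count simple correspondence types for one aligned sentence pair.

    Hypothetical helper: `links` is an iterable of (source_index,
    target_index) pairs, and n_src / n_tgt are the token counts.
    The definitions below are illustrative assumptions only.
    """
    links = set(links)
    src_degree = Counter(i for i, _ in links)
    tgt_degree = Counter(j for _, j in links)

    # Null alignments: tokens that take part in no link at all.
    null_source = sum(1 for i in range(n_src) if src_degree[i] == 0)
    null_target = sum(1 for j in range(n_tgt) if tgt_degree[j] == 0)

    # Many-to-one: target tokens linked to more than one source token.
    many_to_one = sum(1 for j in range(n_tgt) if tgt_degree[j] > 1)

    # Crossings: pairs of links whose source order and target order disagree.
    crossings = sum(
        1
        for (i, j), (k, l) in combinations(sorted(links), 2)
        if i < k and j > l
    )

    return {
        "null_source": null_source,
        "null_target": null_target,
        "many_to_one": many_to_one,
        "crossings": crossings,
        # Relative frequencies, normalised by sentence length.
        "null_source_rate": null_source / n_src if n_src else 0.0,
        "null_target_rate": null_target / n_tgt if n_tgt else 0.0,
    }


if __name__ == "__main__":
    # Toy example: 6 source tokens, 5 target tokens, invented links.
    example_links = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 3)]
    print(profile_alignment(6, 5, example_links))
```

Summing such counts over all segments of a subcorpus and dividing by the relevant token or link totals would give relative frequencies that can be compared across subcorpora.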