Summary of the paper

Title Building a Bilingual ValLex Using Treebank Token Alignment: First Observations
Authors Jana Šindlerová and Ondřej Bojar
Abstract We explore the potential and limitations of a concept of building a bilingual valency lexicon based on the alignment of nodes in a parallel treebank. Our aim is to build an electronic Czech->English Valency Lexicon by collecting equivalences from bilingual treebank data and storing them in two already existing electronic valency lexicons, PDT-VALLEX and Engvallex. For this task a special annotation interface has been built upon the TrEd editor, allowing quick and easy collecting of frame equivalences in either of the source lexicons. The issues encountered so far include limitations of technical character, theory-dependent limitations and limitations concerning the achievable degree of quality of human annotation. The issues of special interest for both linguists and MT specialists involved in the project include linguistically motivated non-balance between the frame equivalents, either in number or in type of valency participants. The first phases of annotation so far attest the assumption that there is a unique correspondence between the functors of the translation-equivalent frames. Also, hardly any linguistically significant non-balance between the frames has been found, which is partly promising considering the linguistic theory used and partly caused by little stylistic variety of the annotated corpus texts.
Language Validation of LRs
Topics Corpus (creation, annotation, etc.), Lexicon, lexical database, Validation of LRs
Full paper Building a Bilingual ValLex Using Treebank Token Alignment: First Observations
Bibtex @InProceedings{INDLEROV10.568,
  author = {Jana Šindlerová and Ondřej Bojar},
  title = {Building a Bilingual ValLex Using Treebank Token Alignment: First Observations},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }