Title |
Building a Bilingual ValLex Using Treebank Token Alignment: First Observations |
Authors |
Jana Šindlerová and Ondřej Bojar |
Abstract |
We explore the potential and limitations of a concept of building a bilingual valency lexicon based on the alignment of nodes in a parallel treebank. Our aim is to build an electronic Czech->English Valency Lexicon by collecting equivalences from bilingual treebank data and storing them in two already existing electronic valency lexicons, PDT-VALLEX and Engvallex. For this task, a special annotation interface has been built upon the TrEd editor, allowing quick and easy collecting of frame equivalences in either of the source lexicons. The issues encountered so far include limitations of technical character, theory-dependent limitations and limitations concerning the achievable degree of quality of human annotation. The issues of special interest for both linguists and MT specialists involved in the project include linguistically motivated non-balance between the frame equivalents, either in number or in type of valency participants. The first phases of annotation so far attest the assumption that there is a unique correspondence between the functors of the translation-equivalent frames. Also, hardly any linguistically significant non-balance between the frames has been found, which is partly promising considering the linguistic theory used and partly caused by little stylistic variety of the annotated corpus texts. |
Language |
Validation of LRs |
Topics |
Corpus (creation, annotation, etc.), Lexicon, lexical database, Validation of LRs |
Full paper  |
Building a Bilingual ValLex Using Treebank Token Alignment: First Observations |
Bibtex |
@InProceedings{INDLEROV10.568,
  author = {Jana Šindlerová and Ondřej Bojar},
  title = {Building a Bilingual ValLex Using Treebank Token Alignment: First Observations},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |