Summary of the paper

Title Querying Diverse Treebanks in a Uniform Way
Authors Jan Štěpánek and Petr Pajas
Abstract This paper presents a system for querying treebanks in a uniform way. The system is able to work with both dependency and constituency based treebanks in any language. We demonstrate its abilities on 11 different treebanks. The query language used by the system provides many features not available in other existing systems while still keeping the performance efficient. The paper also describes the conversion of ten treebanks into a common XML-based format used by the system, touching the question of standards and formats. The paper then shows several examples of linguistically interesting questions that the system is able to answer, for example browsing verbal clauses without subjects or extraposed relative clauses, generating the underlying grammar in a constituency treebank, searching for non-projective edges in a dependency treebank, or word-order typology of a language based on the treebank. The performance of several implementations of the system is also discussed by measuring the time requirements of some of the queries.
Language LR Infrastructures and Architectures
Topics Tools, systems, applications, Corpus (creation, annotation, etc.), LR Infrastructures and Architectures
Full paper Querying Diverse Treebanks in a Uniform Way
Bibtex @InProceedings{TPNEK10.381,
  author = {Jan Štěpánek and Petr Pajas},
  title = {Querying Diverse Treebanks in a Uniform Way},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }