Title |
Querying Diverse Treebanks in a Uniform Way |
Authors |
Jan Štěpánek and Petr Pajas |
Abstract |
This paper presents a system for querying treebanks in a uniform way. The system is able to work with both dependency and constituency based treebanks in any language. We demonstrate its abilities on 11 different treebanks. The query language used by the system provides many features not available in other existing systems while still keeping the performance efficient. The paper also describes the conversion of ten treebanks into a common XML-based format used by the system, touching the question of standards and formats. The paper then shows several examples of linguistically interesting questions that the system is able to answer, for example browsing verbal clauses without subjects or extraposed relative clauses, generating the underlying grammar in a constituency treebank, searching for non-projective edges in a dependency treebank, or word-order typology of a language based on the treebank. The performance of several implementations of the system is also discussed by measuring the time requirements of some of the queries. |
Language |
LR Infrastructures and Architectures |
Topics |
Tools, systems, applications, Corpus (creation, annotation, etc.), LR Infrastructures and Architectures |
Full paper  |
Querying Diverse Treebanks in a Uniform Way |
Bibtex |
@InProceedings{TPNEK10.381,
  author = {Jan Štěpánek and Petr Pajas},
  title = {Querying Diverse Treebanks in a Uniform Way},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |