Summary of the paper

Title PDTB XML: the XMLization of the Penn Discourse TreeBank 2.0
Authors Xuchen Yao, Irina Borisova and Mehwish Alam
Abstract The current study presents a conversion and unification of the Penn Discourse TreeBank 2.0 (PDTB) and the Penn TreeBank (PTB) under XML format. The main goal of the PDTB XML is to create a tool for efficient and broad querying of the syntax and discourse information simultaneously. The key stages of the project are developing proper cross-references between different data types and their representation in the modified TIGER-XML format, and then writing the required declarative languages (XML Schema). PTB XML is compatible with TIGER-XML format. The PDTB XML is developed as a unified format for the convenience of XQuery users; it integrates discourse relations and XML structures into one unified hierarchy and builds the cross references between the syntactic trees and the discourse relations. The syntactic and discourse elements are assigned with unique IDs in order to build cross-references between them. The converted corpus allows for a simultaneous search for syntactically specified discourse information based on the XQuery standard, which is illustrated with a simple example in the article.
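The ID-based cross-referencing described in the abstract can be illustrated with a minimal Python sketch. The element and attribute names used here (`node`, `relation`, `arg1`, `arg2`, `idref`, `cat`) and the sample data are invented for illustration; they are assumptions, not the actual PDTB XML schema, and the paper itself queries the corpus with XQuery rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: element and attribute names are invented
# for illustration and are NOT the actual PDTB XML schema.
SAMPLE = """
<corpus>
  <syntax>
    <node id="s1_n1" cat="S"/>
    <node id="s1_n2" cat="SBAR"/>
    <node id="s2_n1" cat="S"/>
  </syntax>
  <discourse>
    <relation id="r1" type="Explicit" connective="because">
      <arg1 idref="s1_n1"/>
      <arg2 idref="s1_n2"/>
    </relation>
    <relation id="r2" type="Implicit" connective="then">
      <arg1 idref="s1_n1"/>
      <arg2 idref="s2_n1"/>
    </relation>
  </discourse>
</corpus>
"""


def relations_with_arg2_category(xml_text, category):
    """Return IDs of relations whose Arg2 refers to a syntactic node of `category`."""
    root = ET.fromstring(xml_text)
    # Index syntactic nodes by their unique ID so references can be resolved.
    nodes_by_id = {node.get("id"): node for node in root.iter("node")}
    matches = []
    for relation in root.iter("relation"):
        arg2 = relation.find("arg2")
        if arg2 is None:
            continue
        # Follow the cross-reference from the discourse layer to the syntax layer.
        target = nodes_by_id.get(arg2.get("idref"))
        if target is not None and target.get("cat") == category:
            matches.append(relation.get("id"))
    return matches


print(relations_with_arg2_category(SAMPLE, "SBAR"))
```

The point of the sketch is only that unique IDs let a single query combine a discourse constraint (which argument of a relation) with a syntactic one (the category of the referenced node).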
Language Tools, systems, applications
Topics Corpus (creation, annotation, etc.), Discourse annotation, representation and processing, Tools, systems, applications
Full paper PDTB XML: the XMLization of the Penn Discourse TreeBank 2.0
Bibtex @InProceedings{YAO10.336,
  author = {Xuchen Yao and Irina Borisova and Mehwish Alam},
  title = {PDTB XML: the XMLization of the Penn Discourse TreeBank 2.0},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }