Summary of the paper

Title: A Corpus Representation Format for Linguistic Web Services: The D-SPIN Text Corpus Format and its Relationship with ISO Standards
Authors: Ulrich Heid, Helmut Schmid, Kerstin Eckart and Erhard Hinrichs
Abstract: In the framework of preparing linguistic web services for corpus processing, the need arose for a representation format that supports interoperability between the different web services in a corpus processing pipeline, and that also provides a well-defined interface both to legacy tools and their data formats and to upcoming international standards. We present the D-SPIN text corpus format, TCF, which was designed for this purpose. It is a stand-off XML format, inspired by the philosophy of the emerging standards LAF (Linguistic Annotation Framework) and its "instances" MAF for morpho-syntactic annotation and SynAF for syntactic annotation. Tools for exchanging data with existing (best-practice) formats are available, and a converter from MAF to TCF is being tested in spring 2010. We describe the usage scenario in which TCF is embedded, as well as the properties and architecture of TCF. We also give examples of TCF-encoded data and describe the aspects of syntactic and semantic interoperability that are already addressed.
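To make "stand-off" concrete, here is a minimal sketch of the general idea, assuming hypothetical element and attribute names (`corpus`, `tokens`, `token`, `ID`, `POStags`, `tag`, `tokenIDs`, `lemmas`, `lemma`) and helper functions (`pos_by_token`, `add_lemma_layer`) chosen only for illustration; they are not taken from the normative TCF schema. The token layer assigns an ID to each token, and every annotation layer points at those IDs instead of wrapping the text inline.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-off document. Element and attribute names are
# illustrative assumptions, not the normative TCF schema.
DOC = """<corpus>
  <text>Peter schläft .</text>
  <tokens>
    <token ID="t1">Peter</token>
    <token ID="t2">schläft</token>
    <token ID="t3">.</token>
  </tokens>
  <POStags tagset="STTS">
    <tag tokenIDs="t1">NE</tag>
    <tag tokenIDs="t2">VVFIN</tag>
    <tag tokenIDs="t3">$.</tag>
  </POStags>
</corpus>"""


def pos_by_token(root):
    """Resolve the POS layer against the token layer via ID references."""
    # Token layer: map each token ID to its surface form.
    tokens = {t.get("ID"): t.text for t in root.iter("token")}
    pairs = []
    for tag in root.iter("tag"):
        # A tag may refer to several tokens (space-separated IDs).
        for tid in tag.get("tokenIDs").split():
            pairs.append((tokens[tid], tag.text))
    return pairs


def add_lemma_layer(root, lemmas):
    """Append a new annotation layer; existing layers are left unchanged."""
    layer = ET.SubElement(root, "lemmas")
    for tid, lemma in lemmas.items():
        el = ET.SubElement(layer, "lemma", tokenIDs=tid)
        el.text = lemma
    return layer


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for form, pos in pos_by_token(root):
        print(form, pos)
    add_lemma_layer(root, {"t1": "Peter", "t2": "schlafen", "t3": "."})
    print(ET.tostring(root.find("lemmas"), encoding="unicode"))
```

Because each layer only refers to token IDs, a later service in a pipeline can append its own layer without rewriting the layers produced before it. This is the general stand-off principle the abstract attributes to LAF, shown here as a sketch, not as a description of how TCF-based services are actually implemented.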
Language Standards for LRs
Topics: Web Services, LR Infrastructures and Architectures, Standards for LRs
Bibtex @InProceedings{HEID10.503,
  author = {Ulrich Heid and Helmut Schmid and Kerstin Eckart and Erhard Hinrichs},
  title = {A Corpus Representation Format for Linguistic Web Services: The D-SPIN Text Corpus Format and its Relationship with ISO Standards},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA