Summary of the paper

Title Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues
Authors Svetlana Stoyanchev and Paul Piwek
Abstract We describe the construction of the CODA corpus, a parallel corpus of monologues and expository dialogues. The dialogue part of the corpus consists of expository, i.e., information-delivering rather than dramatic, dialogues written by several acclaimed authors. The monologue part of the corpus is a paraphrase in monologue form of these dialogues by a human annotator. The annotator-written monologue preserves all information present in the original dialogue and does not introduce any new information that is not present in the original dialogue. The corpus was constructed as a resource for extracting rules for automated generation of dialogue from monologue. Using authored dialogues allows us to analyse the techniques used by accomplished writers for presenting information in the form of dialogue. The dialogues are annotated with dialogue acts and the monologues with rhetorical structure. We developed annotation and translation guidelines together with a custom-developed tool for carrying out translation, alignment and annotation of the dialogues. The final parallel CODA corpus consists of 1000 dialogue turns that are tagged with dialogue acts and aligned with monologue that expresses the same information and has been annotated with rhetorical structure relations.
Language Natural Language Generation
Topics Corpus (creation, annotation, etc.), Dialogue, Natural Language Generation
Full paper Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues
Bibtex @InProceedings{STOYANCHEV10.127,
  author = {Svetlana Stoyanchev and Paul Piwek},
  title = {Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }