Summary of the paper

Title Domain-related Annotation of Polish Spoken Dialogue Corpus LUNA.PL
Authors Agnieszka Mykowiecka, Katarzyna Głowińska and Joanna Rabiega-Wiśniewska
Abstract The paper presents a corpus of Polish spoken dialogues annotated on several levels, from transcription of dialogues and their morphosyntactic analysis to semantic annotation. The LUNA.PL corpus is the first semantically annotated corpus of Polish spontaneous speech. It contains 500 dialogues recorded at the Warsaw Transport Authority call centre. For each dialogue, the corpus contains the recorded audio signal, its transcription and five XML files with annotations on subsequent levels. Speech transcription was done manually. Text annotation was constructed using a combination of rule-based programmes and computer-aided manual work. For morphological annotation we used an existing analyzer and manually disambiguated the results. Morphologically annotated texts of dialogues were automatically segmented into elementary syntactic chunks. Semantic annotation was done by a set of specially designed rules and then manually corrected. The paper describes details of the domain-related semantic annotation, which consists of two levels: a concept level, at which around 200 attributes and their values are annotated, and a predicate level, at which 47 frame types are recognized. We describe the domain model adopted and the statistics over the entire annotated set of dialogues.
Language Semantics
Topics Dialogue, Speech Recognition/Understanding, Semantics
Full paper Domain-related Annotation of Polish Spoken Dialogue Corpus LUNA.PL
Bibtex @InProceedings{MYKOWIECKA10.337,
  author = {Agnieszka Mykowiecka and Katarzyna Głowińska and Joanna Rabiega-Wiśniewska},
  title = {Domain-related Annotation of Polish Spoken Dialogue Corpus LUNA.PL},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }