Summary of the paper

Title DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain
Authors Doris Baum, Daniel Schneider, Rolf Bardeli, Jochen Schwenninger, Barbara Samlowski, Thomas Winkler and Joachim Köhler
Abstract Typical broadcast material contains not only studio-recorded texts read by trained speakers, but also spontaneous and dialect speech, debates with cross-talk, voice-overs, and on-site reports with difficult acoustic environments. Standard approaches to speech and speaker recognition usually deteriorate under such conditions. This paper reports on the design, construction, and experimental analysis of DiSCo, a German corpus for the evaluation of speech and speaker recognition on challenging material from the broadcast domain. One of the key requirements for the design of this corpus was a good coverage of different types of serious programmes beyond clean speech and planned speech broadcast news. Corpus annotation encompasses manual segmentation, an orthographic transcription, and labelling with speech mode, dialect, and noise type. We indicate typical use cases for the corpus by reporting results from ASR, speech search, and speaker recognition on the new corpus, thereby obtaining insights into the difficulty of audio recognition on the various classes.
Language Speech resource/database
Topics Corpus (creation, annotation, etc.), Speech Recognition/Understanding, Speech resource/database
Full paper DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain
Bibtex @InProceedings{BAUM10.355,
  author = {Doris Baum and Daniel Schneider and Rolf Bardeli and Jochen Schwenninger and Barbara Samlowski and Thomas Winkler and Joachim Köhler},
  title = {DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }