Summary of the paper

Title KALAKA: A TV Broadcast Speech Database for the Evaluation of Language Recognition Systems
Authors Luis Javier Rodríguez-Fuentes, Mikel Penagarikano, Germán Bordel, Amparo Varona and Mireia Díez
Abstract A speech database, named KALAKA, was created to support the Albayzin 2008 Evaluation of Language Recognition Systems, organized by the Spanish Network on Speech Technologies from May to November 2008. This evaluation, designed according to the criteria and methodology applied in the NIST Language Recognition Evaluations, involved four target languages: Basque, Catalan, Galician and Spanish (official languages in Spain), and included speech signals in other (unknown) languages to allow open-set verification trials. In this paper, the process of designing, collecting data and building the train, development and evaluation datasets of KALAKA is described. Results attained in the Albayzin 2008 LRE are presented as a means of evaluating the database. The performance of a state-of-the-art language recognition system on a closed-set evaluation task is also presented for reference. Future work includes extending KALAKA by adding Portuguese and English as target languages and renewing the set of unknown languages needed to carry out open-set evaluations.
Language Corpus (creation, annotation, etc.)
Topics Speech resource/database, Language Identification, Corpus (creation, annotation, etc.)
Full paper KALAKA: A TV Broadcast Speech Database for the Evaluation of Language Recognition Systems
Bibtex @InProceedings{RODRGUEZFUENTES10.394,
  author = {Luis Javier Rodríguez-Fuentes and Mikel Penagarikano and Germán Bordel and Amparo Varona and Mireia Díez},
  title = {KALAKA: A TV Broadcast Speech Database for the Evaluation of Language Recognition Systems},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }