Summary of the paper

Title Vergina: A Modern Greek Speech Database for Speech Synthesis
Authors Alexandros Lazaridis, Theodoros Kostoulas, Todor Ganchev, Iosif Mporas and Nikos Fakotakis
Abstract The present paper outlines the Vergina speech database, which was developed in support of research and development of corpus-based unit selection and statistical parametric speech synthesis systems for the Modern Greek language. In the following, we describe the design, development and implementation of the recording campaign, as well as the annotation of the database. Specifically, a text corpus of approximately 5 million words, collected from newspaper articles, periodicals, and paragraphs of literature, was processed in order to select the utterances-sentences needed for producing the speech database and to achieve a reasonable phonetic coverage. The broad coverage and contents of the selected utterances-sentences of the database (a text corpus collected from different domains and writing styles) make this database appropriate for various application domains. The database, recorded in an audio studio, consists of approximately 3,000 phonetically balanced Modern Greek utterances corresponding to approximately four hours of speech. Annotation of the Vergina speech database was performed using task-specific tools, which are based on a hidden Markov model (HMM) segmentation method, and then manual inspection and corrections were performed.
Language Speech resource/database
Topics Corpus (creation, annotation, etc.), Speech Synthesis, Speech resource/database
Full paper Vergina: A Modern Greek Speech Database for Speech Synthesis
Bibtex @InProceedings{LAZARIDIS10.614,
  author = {Alexandros Lazaridis and Theodoros Kostoulas and Todor Ganchev and Iosif Mporas and Nikos Fakotakis},
  title = {Vergina: A Modern Greek Speech Database for Speech Synthesis},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }