LREC 2010 Proceedings

Summary of the paper

Title	Vergina: A Modern Greek Speech Database for Speech Synthesis
Authors	Alexandros Lazaridis, Theodoros Kostoulas, Todor Ganchev, Iosif Mporas and Nikos Fakotakis
Abstract	The present paper outlines the Vergina speech database, which was developed in support of research and development of corpus-based unit selection and statistical parametric speech synthesis systems for the Modern Greek language. In the following, we describe the design, development and implementation of the recording campaign, as well as the annotation of the database. Specifically, a text corpus of approximately 5 million words, collected from newspaper articles, periodicals, and paragraphs of literature, was processed in order to select the utterances-sentences needed for producing the speech database and to achieve a reasonable phonetic coverage. The broad coverage and contents of the selected utterances-sentences of the database (a text corpus collected from different domains and writing styles) make this database appropriate for various application domains. The database, recorded in an audio studio, consists of approximately 3,000 phonetically balanced Modern Greek utterances corresponding to approximately four hours of speech. Annotation of the Vergina speech database was performed using task-specific tools, which are based on a hidden Markov model (HMM) segmentation method, and then manual inspection and corrections were performed.
Language	Speech resource/database
Topics	Corpus (creation, annotation, etc.), Speech Synthesis, Speech resource/database
Full paper	Vergina: A Modern Greek Speech Database for Speech Synthesis
Bibtex	@InProceedings{LAZARIDIS10.614,
  author = {Alexandros Lazaridis and Theodoros Kostoulas and Todor Ganchev and Iosif Mporas and Nikos Fakotakis},
  title = {Vergina: A Modern Greek Speech Database for Speech Synthesis},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}