LREC 2010 Proceedings

Summary of the paper

Title	A Fully Annotated Corpus of Russian Speech
Authors	Pavel Skrelin, Nina Volskaya, Daniil Kocharov, Karina Evgrafova, Olga Glotova and Vera Evdokimova
Abstract	The paper introduces CORPRES, a fully annotated Russian speech corpus developed at the Department of Phonetics, St. Petersburg State University, as a result of a three-year project. The corpus includes samples of different speaking styles produced by 4 male and 4 female speakers. Six levels of annotation cover all phonetic and prosodic information about the recorded speech data, including labels for pitch marks, phonetic events, narrow and wide phonetic transcription, orthographic and prosodic transcription. Precise phonetic transcription of the data provides an especially valuable resource for both research and development purposes. The corpus totals 528 458 running words and contains 60 hours of speech, made up of 7.5 hours from each speaker. 40% of the corpus was manually segmented and fully annotated on all six levels. 60% of the corpus was partly annotated, with labels for pitch periods and phonetic events. Orthographic, prosodic and ideal phonetic transcription for this part was generated and stored as text files. The fully annotated part of the corpus covers all speaking styles included in the corpus and all speakers. The paper contains information about CORPRES design and annotation principles, overall data description and some speculation about possible use of the corpus.
Language	Speech Synthesis
Topics	Corpus (creation, annotation, etc.), Phonetic Databases, Phonology, Speech Synthesis
Full paper	A Fully Annotated Corpus of Russian Speech
Bibtex	@InProceedings{SKRELIN10.274, author = {Pavel Skrelin and Nina Volskaya and Daniil Kocharov and Karina Evgrafova and Olga Glotova and Vera Evdokimova}, title = {A Fully Annotated Corpus of Russian Speech}, booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} }