LREC 2010 Proceedings

Summary of the paper

Title	A Speech Corpus for Modeling Language Acquisition: CAREGIVER
Authors	Toomas Altosaar, Louis ten Bosch, Guillaume Aimetti, Christos Koniaris, Kris Demuynck and Henk van den Heuvel
Abstract	A multi-lingual speech corpus used for modeling language acquisition, called CAREGIVER, has been designed and recorded within the framework of the EU-funded Acquisition of Communication and Recognition Skills (ACORNS) project. The paper describes the motivation behind the corpus and its design by relying on current knowledge regarding infant language acquisition. Instead of recording infants and children, the voices of their primary and secondary caregivers were captured in both infant-directed and adult-directed speech modes over four languages in a read speech manner. The challenges and methods applied to obtain similar prompts in terms of complexity and semantics across different languages, as well as the normalized recording procedures employed at different locations, are covered. The corpus contains nearly 66000 utterance-based audio files spoken over a two-year period by 17 male and 17 female native speakers of Dutch, English, Finnish, and Swedish. An orthographical transcription is available for every utterance. Time-aligned word and phone annotations also exist for many of the sub-corpora. The CAREGIVER corpus will be published via ELRA.
Language	Cognitive methods
Topics	Speech resource/database, Acquisition, Cognitive methods
Full paper	A Speech Corpus for Modeling Language Acquisition: CAREGIVER
Bibtex	@InProceedings{ALTOSAAR10.597, author = {Toomas Altosaar and Louis ten Bosch and Guillaume Aimetti and Christos Koniaris and Kris Demuynck and Henk van den Heuvel}, title = {A Speech Corpus for Modeling Language Acquisition: CAREGIVER}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} }