Summary of the paper

Title Lexicon Design for Transcription of Spontaneous Voice Messages
Authors Michal Gishri, Vered Silber-Varod and Ami Moyal
Abstract Building a comprehensive pronunciation lexicon is a crucial element in the success of any speech recognition engine. The first stage of lexicon design involves the compilation of a comprehensive word list that keeps the Out-Of-Vocabulary (OOV) word rate to a minimum. The second stage involves providing optimized phonemic representations for all lexical items on the list. The research presented here focuses on the first stage of lexicon design, word list compilation, and describes the methodologies employed in the collection of a pronunciation lexicon designed for the purpose of American English voice message transcription using speech recognition. The lexicon design used is based on a topic domain structure with a target of 90% word coverage for each domain. This differs somewhat from standard approaches where probable words from textual corpora are extracted. This paper raises four issues involved in lexicon design for the transcription of spontaneous voice messages: the inclusion of interjections and other characteristics common to spontaneous speech; the identification of unique messaging terminology; the relative ratio of proper nouns to common words; and the overall size of the lexicon.
Language
Topics Lexicon, lexical database, Speech Recognition/Understanding
Full paper Lexicon Design for Transcription of Spontaneous Voice Messages
Bibtex @InProceedings{GISHRI10.953,
  author = {Michal Gishri and Vered Silber-Varod and Ami Moyal},
  title = {Lexicon Design for Transcription of Spontaneous Voice Messages},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }