LREC 2010 Proceedings

Summary of the paper

Title	Language Modeling Approach for Retrieving Passages in Lecture Audio Data
Authors	Koichiro Honda and Tomoyosi Akiba
Abstract	Spoken Document Retrieval (SDR) is a promising technology for enhancing the utility of spoken materials. After the spoken documents have been transcribed by using a Large Vocabulary Continuous Speech Recognition (LVCSR) decoder, a text-based ad hoc retrieval method can be applied directly to the transcribed documents. However, recognition errors will significantly degrade the retrieval performance. To address this problem, we have previously proposed a method that aimed to fill the gap between automatically transcribed text and correctly transcribed text by using a statistical translation technique. In this paper, we extend the method by (1) using neighboring context to index the target passage, and (2) applying a language modeling approach for document retrieval. Our experimental evaluation shows that context information can improve retrieval performance, and that the language modeling approach is effective in incorporating context information into the proposed SDR method, which uses a translation model.
Language	Machine Translation, SpeechToSpeech Translation
Topics	Speech resource/database, Information Extraction, Information Retrieval, Machine Translation, SpeechToSpeech Translation
Full paper	Language Modeling Approach for Retrieving Passages in Lecture Audio Data
Bibtex	@InProceedings{HONDA10.462,
  author = {Koichiro Honda and Tomoyosi Akiba},
  title = {Language Modeling Approach for Retrieving Passages in Lecture Audio Data},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}