Summary of the paper

Title Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System
Authors Kwanchiva Saykham, Ananlada Chotimongkol and Chai Wutiwiwatchai
Abstract This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follows: first, an initial language model is trained with the broadcast news transcriptions available during the development period. Then the language model is adapted over time with more recent broadcast news transcriptions and online news articles available during deployment, especially data from the same time period as the broadcast news speech being recognized. We found that data that are closer in time are more similar in terms of perplexity and are more suitable for language model adaptation. The LMs that are adapted over time with more recent news data are better, in terms of both perplexity and WER, than the static LM trained only on the initial set of broadcast news data. Adaptation data from broadcast news transcriptions improved perplexity by 38.3% and WER by 7.1%, relative. Although online news articles achieved less improvement, they are still a useful resource because they can be obtained automatically. Better data pre-processing techniques and data selection techniques based on text similarity could be applied to the news articles to obtain further improvement on this promising result.
Language Tools, systems, applications
Topics Language modelling, Speech Recognition/Understanding, Tools, systems, applications
Full paper Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System
Bibtex @InProceedings{SAYKHAM10.249,
  author = {Kwanchiva Saykham and Ananlada Chotimongkol and Chai Wutiwiwatchai},
  title = {Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
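The adaptation scheme and the relative figures quoted in the abstract can be illustrated with a small sketch. The abstract does not say which language model type or adaptation method the authors used, so everything here is an assumption made for illustration: add-one smoothed unigram models, linear interpolation as the adaptation step, an interpolation weight of 0.5, and a toy English word list standing in for Thai news text. All function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(base, recent, weight):
    """Linear interpolation of two models; `weight` goes to the recent one.

    This is an assumed stand-in for the paper's adaptation step.
    """
    return {w: (1 - weight) * base[w] + weight * recent[w] for w in base}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


def relative_improvement(baseline, adapted):
    """Relative reduction in percent, for a lower-is-better metric."""
    return 100.0 * (baseline - adapted) / baseline


# Toy data (hypothetical): older news, recent news, and the text to recognize.
initial_data = "flood flood flood relief budget".split()
recent_data = "election election vote budget".split()
test_data = "election vote election budget".split()
vocab = set(initial_data) | set(recent_data) | set(test_data)

# Static LM trained on the initial data only, then adapted with recent data.
static_lm = train_unigram(initial_data, vocab)
recent_lm = train_unigram(recent_data, vocab)
adapted_lm = interpolate(static_lm, recent_lm, weight=0.5)

ppl_static = perplexity(static_lm, test_data)
ppl_adapted = perplexity(adapted_lm, test_data)

print(f"static perplexity:  {ppl_static:.2f}")
print(f"adapted perplexity: {ppl_adapted:.2f}")
print(f"relative improvement: {relative_improvement(ppl_static, ppl_adapted):.1f}%")
```

In this toy setting the adapted model assigns more probability to words from the recent period, so its perplexity on the test text is lower than that of the static model. The same `relative_improvement` formula applies to WER: a relative reduction of 7.1% means the adapted WER is 92.9% of the baseline WER, not that WER dropped by 7.1 percentage points.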