LREC 2010 Proceedings

Summary of the paper

Title	Enhanced Infrastructure for Creation and Collection of Translation Resources
Authors	Zhiyi Song, Stephanie Strassel, Gary Krug and Kazuaki Maeda
Abstract	Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at the Linguistic Data Consortium (LDC) to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged approach to parallel text corpus development (acquisition of existing parallel text from known repositories, harvesting and alignment of potential parallel documents from the web, and manual creation of parallel text by professional translators), and describe recent adaptations that have enabled significant expansions in the scope, variety, quality, efficiency and cost-effectiveness of translation resource creation at LDC.
Language	LR Infrastructures and Architectures
Topics	Machine Translation, SpeechToSpeech Translation, Corpus (creation, annotation, etc.), LR Infrastructures and Architectures
Full paper	Enhanced Infrastructure for Creation and Collection of Translation Resources
Bibtex	@InProceedings{SONG10.798, author = {Zhiyi Song and Stephanie Strassel and Gary Krug and Kazuaki Maeda}, title = {Enhanced Infrastructure for Creation and Collection of Translation Resources}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} }