Title |
Enhanced Infrastructure for Creation and Collection of Translation Resources |
Authors |
Zhiyi Song, Stephanie Strassel, Gary Krug and Kazuaki Maeda |
Abstract |
Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at Linguistic Data Consortium to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged approach to parallel text corpus development (acquisition of existing parallel text from known repositories, harvesting and aligning of potential parallel documents from the web, and manual creation of parallel text by professional translators), and describe recent adaptations that have enabled significant expansions in the scope, variety, quality, efficiency and cost-effectiveness of translation resource creation at LDC. |
Language |
LR Infrastructures and Architectures |
Topics |
Machine Translation, Speech-to-Speech Translation, Corpus (creation, annotation, etc.), LR Infrastructures and Architectures |
Full paper  |
Enhanced Infrastructure for Creation and Collection of Translation Resources |
Bibtex |
@InProceedings{SONG10.798,
  author = {Zhiyi Song and Stephanie Strassel and Gary Krug and Kazuaki Maeda},
  title = {Enhanced Infrastructure for Creation and Collection of Translation Resources},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |