Title |
Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation |
Authors |
Masaki Murata, Tomohiro Ohno, Shigeki Matsubara and Yasuyoshi Inagaki |
Abstract |
With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there are a few studies on simultaneous interpreting, and recently language resources for promoting simultaneous interpreting research, such as a large-scale analytical corpus, have been prepared. Going forward, it is necessary to make such corpora more practical toward the realization of a simultaneous interpreting system. In this paper, we describe the construction of a bilingual corpus that can be used for simultaneous lecture interpreting research. Simultaneous lecture interpreting systems are required to recognize translation units in the middle of a sentence and generate their translations at the proper timing. We constructed the bilingual lecture corpus in the following steps. First, we segmented sentences in the lecture data into semantically meaningful units for simultaneous interpreting. Then, we assigned translations to these units from the viewpoint of simultaneous interpreting. In addition, we investigated the possibility of automatically detecting the simultaneous interpreting timing from our corpus. |
Language |
Speech resource/database |
Topics |
Machine Translation, SpeechToSpeech Translation, Corpus (creation, annotation, etc.), Speech resource/database |
Full paper  |
Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation |
Bibtex |
@InProceedings{MURATA10.581,
  author = {Masaki Murata and Tomohiro Ohno and Shigeki Matsubara and Yasuyoshi Inagaki},
  title = {Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |