Summary of the paper

Title Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
Authors Hercules Dalianis, Hao-chun Xing and Xin Zhang
Abstract This paper describes an experiment in which we first construct an English-Chinese parallel corpus, then apply the Uplug word alignment tool to the corpus, and finally produce and evaluate an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually translated from Chinese to English. The parallel corpus contains 104 563 Chinese characters, equivalent to 59 918 Chinese words, and the corresponding English corpus contains 75 766 English words. However, Chinese writing does not use delimiters to mark word boundaries, so we had to carry out word segmentation as a preprocessing step on the Chinese corpus. Moreover, since the parallel corpus was downloaded from the Internet, it is noisy with regard to the alignment between corresponding translated sentences. We therefore spent 60 hours of manual work aligning the sentences in the English and Chinese parallel corpus before performing automatic word alignment using Uplug. The word alignment with Uplug was carried out from English to Chinese. Nine respondents evaluated the resulting English-Chinese word list, restricted to entries with a frequency of three or above, and we obtained an accuracy of 73.1 percent.
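The last step in the abstract, keeping only word pairs with a frequency of three or above and then scoring them, can be illustrated with a minimal sketch. This is not the authors' code: the function names `frequent_pairs` and `accuracy`, the toy word pairs, and the single True/False judgment per pair are all assumptions made for illustration. The actual Uplug output format and the way the nine respondents' judgments were combined are not given in this summary.

```python
from collections import Counter


def frequent_pairs(aligned_pairs, min_freq=3):
    """Count aligned (English, Chinese) word pairs and keep those seen
    at least min_freq times. The threshold of 3 follows the abstract."""
    counts = Counter(aligned_pairs)
    return {pair: n for pair, n in counts.items() if n >= min_freq}


def accuracy(judgments):
    """Percentage of judged pairs marked correct.
    `judgments` maps a word pair to True (correct) or False (incorrect)."""
    if not judgments:
        raise ValueError("no judgments to score")
    return 100.0 * sum(judgments.values()) / len(judgments)


# Hypothetical word-alignment links, one tuple per aligned token pair.
links = (
    [("law", "法律")] * 4
    + [("court", "法院")] * 3
    + [("article", "法律")] * 3
    + [("state", "国家")] * 2  # below the threshold, so it is dropped
)

word_list = frequent_pairs(links)

# Hypothetical human judgments for the retained pairs.
judged = {
    ("law", "法律"): True,
    ("court", "法院"): True,
    ("article", "法律"): False,
}

print(sorted(word_list.items()))
print(round(accuracy(judged), 1))
```

Filtering on frequency before evaluation trades coverage for reliability: rare alignment links are more likely to be noise, so they are excluded from the word list that is scored.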
Language Multilinguality
Topics Corpus (creation, annotation, etc.), Lexicon, lexical database, Multilinguality
Full paper Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
Bibtex @InProceedings{DALIANIS10.13,
  author = {Hercules Dalianis and Hao-chun Xing and Xin Zhang},
  title = {Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }