Summary of the paper

Title: A Game-based Approach to Transcribing Images of Text
Authors: Khalil Dahab and Anja Belz
Abstract: Creating language resources is expensive and time-consuming, and this forms a bottleneck in the development of language technology, for less-studied non-European languages in particular. The recent internet phenomenon of crowd-sourcing offers a cost-effective and potentially fast way of overcoming such language resource acquisition bottlenecks. We present a methodology that takes as its input scanned documents of typed or hand-written text, and produces transcriptions of the text as its output. Instead of using Optical Character Recognition (OCR) technology, the methodology is game-based and produces such transcriptions as a by-product. The approach is intended particularly for languages for which language technology and resources are scarce and reliable OCR technology may not exist. It can be used in place of OCR for transcribing individual documents, or to create corpora of paired images and transcriptions required to train OCR tools. We present Minefield, a prototype implementation of the approach which is currently collecting Arabic transcriptions.
Language: Tools, systems, applications
Topics: Corpus (creation, annotation, etc.), LR Infrastructures and Architectures, Tools, systems, applications
Full paper: A Game-based Approach to Transcribing Images of Text
Bibtex @InProceedings{DAHAB10.476,
  author = {Khalil Dahab and Anja Belz},
  title = {A Game-based Approach to Transcribing Images of Text},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA