Summary of the paper

Title Resource Creation for Training and Testing of Transliteration Systems for Indian Languages
Authors Sowmya V. B., Monojit Choudhury, Kalika Bali, Tirthankar Dasgupta and Anupam Basu
Abstract Machine transliteration is used in a number of NLP applications ranging from machine translation and information retrieval to input mechanisms for non-Roman scripts. Many popular Input Method Editors for Indian languages, such as Baraha, Akshara and Quillpad, use back-transliteration as a mechanism to allow users to input text in a number of Indian languages. The lack of a standard dataset to evaluate these systems makes it difficult to make any meaningful comparisons of their relative accuracies. In this paper, we describe the methodology for the creation of a dataset of ~2500 transliterated sentence pairs each in Bangla, Hindi and Telugu. The data was collected across three different modes from a total of 60 users. We believe that this dataset will prove useful not only for the evaluation and training of back-transliteration systems but also for the linguistic analysis of the process of transliterating Indian languages from native scripts to Roman.
Language
Topics Corpus (creation, annotation, etc.), Other
Full paper Resource Creation for Training and Testing of Transliteration Systems for Indian Languages
Bibtex @InProceedings{VB10.182,
  author = {Sowmya V. B. and Monojit Choudhury and Kalika Bali and Tirthankar Dasgupta and Anupam Basu},
  title = {Resource Creation for Training and Testing of Transliteration Systems for Indian Languages},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA