Summary of the paper

Title English-Hindi Transliteration using Multiple Similarity Metrics
Authors Niraj Aswani and Robert Gaizauskas
Abstract In this paper, we present an approach to measure the transliteration similarity of English-Hindi word pairs. Our approach has two components. First, we propose a bi-directional mapping between one or more characters in the Devanagari script and one or more characters in the Roman script (pronounced as in English). This allows a given Hindi word written in Devanagari to be transliterated into the Roman script and vice-versa. Second, we present an algorithm for computing a similarity measure that is a variant of Dice’s coefficient measure and the LCSR measure and which also takes into account the constraints needed to match English-Hindi transliterated words. Finally, by evaluating various similarity metrics individually and together under a multiple measure agreement scenario, we show that it is possible to achieve a 0.92 f-measure in identifying English-Hindi word pairs that are transliterations. In order to assess the portability of our approach to other similar languages, we adapt our system to the Gujarati language.
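For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of the textbook forms of Dice's coefficient (over character bigrams) and LCSR (longest common subsequence ratio), plus a simple agreement vote between them. It is not the authors' variant: the paper's transliteration-specific constraints and its Devanagari-Roman mapping are not reproduced here. The function names, the 0.5 threshold, and the min_votes parameter are assumptions chosen for illustration only, and the inputs are assumed to be already romanized strings.

```python
from collections import Counter


def bigrams(word):
    """Return the list of adjacent character pairs in word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice(a, b):
    """Standard Dice coefficient over character bigrams (multiset form)."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both words are shorter than two characters: fall back to equality.
        return 1.0 if a == b else 0.0
    shared = sum((x & y).values())  # multiset intersection size
    return 2 * shared / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """LCS length divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def agree(a, b, threshold=0.5, min_votes=2):
    """Hypothetical agreement rule: accept the pair if at least
    min_votes of the measures reach the threshold."""
    scores = [dice(a, b), lcsr(a, b)]
    return sum(s >= threshold for s in scores) >= min_votes
```

With these definitions, a pair such as "night" and "nacht" shares one bigram out of eight and a common subsequence of three characters out of five, so the two measures disagree at the assumed threshold and the pair is rejected under a two-vote rule.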
Language Tools, systems, applications
Topics Phonetic Databases, Phonology, Machine Translation, SpeechToSpeech Translation, Tools, systems, applications
Full paper English-Hindi Transliteration using Multiple Similarity Metrics
Bibtex @InProceedings{ASWANI10.694,
  author = {Niraj Aswani and Robert Gaizauskas},
  title = {English-Hindi Transliteration using Multiple Similarity Metrics},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }