Title |
English-Hindi Transliteration using Multiple Similarity Metrics |
Authors |
Niraj Aswani and Robert Gaizauskas |
Abstract |
In this paper, we present an approach to measure the transliteration similarity of English-Hindi word pairs. Our approach has two components. First, we propose a bi-directional mapping between one or more characters in the Devanagari script and one or more characters in the Roman script (pronounced as in English). This allows a given Hindi word written in Devanagari to be transliterated into the Roman script and vice-versa. Second, we present an algorithm for computing a similarity measure that is a variant of Dice's coefficient measure and the LCSR measure and which also takes into account the constraints needed to match English-Hindi transliterated words. Finally, by evaluating various similarity metrics individually and together under a multiple measure agreement scenario, we show that it is possible to achieve a 0.92 f-measure in identifying English-Hindi word pairs that are transliterations. In order to assess the portability of our approach to other similar languages, we adapt our system to the Gujarati language. |
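For readers unfamiliar with the two base measures named in the abstract, the following is a minimal Python sketch of the textbook definitions of Dice's coefficient (over character bigrams) and the longest common subsequence ratio (LCSR). It is not the variant proposed in the paper, which adds constraints specific to English-Hindi transliteration that the abstract does not spell out. The function names, the 0.5 threshold, and the two-vote agreement rule in `measures_agree` are assumptions made purely for illustration.

```python
from collections import Counter


def bigrams(word):
    """Return the list of adjacent character pairs in a word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice_coefficient(a, b):
    """Textbook Dice coefficient over character bigrams (multiset overlap)."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Neither word has a bigram (length 0 or 1): fall back to equality.
        return 1.0 if a == b else 0.0
    overlap = sum((ba & bb).values())
    return 2.0 * overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """Longest common subsequence ratio: LCS length over the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return lcs_length(a, b) / longest


def measures_agree(a, b, threshold=0.5, min_votes=2):
    """Hypothetical agreement rule: accept a pair only if at least
    `min_votes` measures reach `threshold`. Both values are illustrative
    assumptions, not figures taken from the paper."""
    scores = [dice_coefficient(a, b), lcsr(a, b)]
    return sum(score >= threshold for score in scores) >= min_votes
```

Both functions operate on strings in a single script, so a Devanagari word would first have to be romanised (or the English word mapped to Devanagari) using a character mapping such as the one the abstract describes.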
Language |
Tools, systems, applications |
Topics |
Phonetic Databases, Phonology, Machine Translation, SpeechToSpeech Translation, Tools, systems, applications |
Full paper  |
English-Hindi Transliteration using Multiple Similarity Metrics |
Bibtex |
@InProceedings{ASWANI10.694,
  author = {Niraj Aswani and Robert Gaizauskas},
  title = {English-Hindi Transliteration using Multiple Similarity Metrics},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |