Summary of the paper

Title English-Spanish Large Statistical Dictionary of Inflectional Forms
Authors Grigori Sidorov, Alberto Barrón-Cedeño and Paolo Rosso
Abstract The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms using as input data a traditional bilingual dictionary, and not parallel corpora. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on distribution of corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, verb in past tense in language L normally is expected to be translated by verb in past tense in Language L'. We consider that the developed method is universal, i.e. can be applied to any pair of languages. The obtained dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.
Language Machine Translation, SpeechToSpeech Translation
Topics Lexicon, lexical database, Morphology, Machine Translation, SpeechToSpeech Translation
Full paper English-Spanish Large Statistical Dictionary of Inflectional Forms
Bibtex @InProceedings{SIDOROV10.229,
  author = {Grigori Sidorov and Alberto Barrón-Cedeño and Paolo Rosso},
  title = {English-Spanish Large Statistical Dictionary of Inflectional Forms},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
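
The weighting idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy paradigm tables, the grammar-set labels, the frequency values, the compatibility function, and the choice to multiply the three factors and normalize per dictionary entry. The paper's actual formula may differ.

```python
# Hypothetical toy paradigms: lemma -> {grammar set: inflected form}.
EN_FORMS = {"walk": {"V.pres": "walk", "V.past": "walked"}}
ES_FORMS = {"caminar": {"V.pres": "camina", "V.past": "caminó"}}

# Hypothetical relative frequencies of grammar sets in a corpus of each language.
EN_FREQ = {"V.pres": 0.6, "V.past": 0.4}
ES_FREQ = {"V.pres": 0.7, "V.past": 0.3}


def compatibility(g_en, g_es):
    """Assumed compatibility score: matching grammar sets are favoured."""
    return 1.0 if g_en == g_es else 0.1


def weighted_form_pairs(en_lemma, es_lemma):
    """Generate all form pairs for one dictionary entry and weight them.

    The score of a pair is the product of both grammar-set frequencies
    and the compatibility score; scores are normalized to sum to 1.
    """
    scores = {}
    for g_en, f_en in EN_FORMS[en_lemma].items():
        for g_es, f_es in ES_FORMS[es_lemma].items():
            score = EN_FREQ[g_en] * ES_FREQ[g_es] * compatibility(g_en, g_es)
            scores[(f_en, f_es)] = scores.get((f_en, f_es), 0.0) + score
    total = sum(scores.values())
    return {pair: s / total for pair, s in scores.items()}


weights = weighted_form_pairs("walk", "caminar")
for pair, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(pair, round(w, 3))
```

With these made-up numbers, the past-tense pair ("walked", "caminó") receives a higher weight than the mixed-tense pair ("walked", "camina"), which is the behaviour the abstract describes for compatible grammar sets.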