Summary of the paper

Title Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages
Authors Niraj Aswani and Robert Gaizauskas
Abstract A considerable amount of work has been put into development of stemmers and morphological analysers. The majority of these approaches use hand-crafted suffix-replacement rules but a few try to discover such rules from corpora. While most of the approaches remove or replace suffixes, there are examples of derivational stemmers which are based on prefixes as well. In this paper we present a rule-based morphological analyser. We propose an approach that takes both prefixes as well as suffixes into account. Given a corpus and a dictionary, our method can be used to obtain a set of suffix-replacement rules for deriving an inflected word’s root form. We developed an approach for the Hindi language but show that the approach is portable, at least to related languages, by adapting it to the Gujarati language. Given that the entire process of developing such a ruleset is simple and fast, our approach can be used for rapid development of morphological analysers and yet it can obtain competitive results with analysers built relying on human authored rules.
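As an illustration only, the general idea of applying suffix-replacement rules to recover a root form can be sketched as follows. This is not the authors' implementation: the rule list, the dictionary, and the function name `find_root` are hypothetical, and English toy words stand in for Hindi or Gujarati data. The sketch assumes a rule is accepted only when the resulting candidate appears in the dictionary.

```python
# Hypothetical toy data (not taken from the paper): each rule is a pair
# (suffix to strip, replacement string).
RULES = [("ies", "y"), ("es", ""), ("s", "")]
DICTIONARY = {"city", "box", "cat"}


def find_root(word, rules=RULES, dictionary=DICTIONARY):
    """Return a dictionary root for `word`, or None if no rule yields one."""
    if word in dictionary:
        return word
    # Try longer suffixes first so that "ies" is tested before "s".
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept the candidate only if the dictionary confirms it.
            if candidate in dictionary:
                return candidate
    return None


for w in ["cities", "boxes", "cats", "dogs"]:
    print(w, "->", find_root(w))
```

Validating candidates against a dictionary keeps the toy rules from producing roots that are not real words; how the paper derives and ranks its rules from a corpus is not shown here.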
Language Tools, systems, applications
Topics Morphology, Multilinguality, Tools, systems, applications
Full paper Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages
Bibtex @InProceedings{ASWANI10.616,
  author = {Niraj Aswani and Robert Gaizauskas},
  title = {Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }