Summary of the paper

Title Expanding the Lexicon for a Resource-Poor Language Using a Morphological Analyzer and a Web Crawler
Authors Michael Gasser
Abstract Resource-poor languages may suffer from a lack of any of the basic resources that are fundamental to computational linguistics, including an adequate digital lexicon. Given the relatively small corpus of texts that exists for such languages, extending the lexicon presents a challenge. Languages with complex morphology present a special case, however, because individual words in these languages provide a great deal of information about the grammatical properties of the roots that they are based on. Given a morphological analyzer, it is even possible to extract novel roots from words. In this paper, we look at the case of Tigrinya, a Semitic language with limited lexical resources for which a morphological analyzer is available. It is shown that this analyzer applied to the list of more than 200,000 Tigrinya words that is extracted by a web crawler can extend the lexicon in two ways, by adding new roots and by inferring some of the derivational constraints that apply to known roots.
Language Grammar and Syntax
Topics Lexicon, lexical database, Morphology, Grammar and Syntax
Full paper Expanding the Lexicon for a Resource-Poor Language Using a Morphological Analyzer and a Web Crawler
Bibtex @InProceedings{GASSER10.926,
  author = {Michael Gasser},
  title = {Expanding the Lexicon for a Resource-Poor Language Using a Morphological Analyzer and a Web Crawler},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }