Summary of the paper

Title Grammar Extraction from Treebanks for Hindi and Telugu
Authors Prasanth Kolachina, Sudheer Kolachina, Anil Kumar Singh, Samar Husain, Viswanath Naidu, Rajeev Sangal and Akshar Bharati
Abstract Grammars play an important role in many Natural Language Processing (NLP) applications. The traditional approach to creating grammars manually, besides being labor-intensive, has several limitations. With the availability of large-scale syntactically annotated treebanks, it is now possible to automatically extract an approximate grammar of a language in any of the existing formalisms from a corresponding treebank. In this paper, we present a basic approach to extract grammars from dependency treebanks of two Indian languages, Hindi and Telugu. The process of grammar extraction requires a generalization mechanism. Towards this end, we explore an approach which relies on generalization of argument structure over the verbs based on their syntactic similarity. Such a generalization counters the effect of data sparseness in the treebanks. A grammar extracted using this system can not only expand already existing knowledge bases for NLP tasks such as parsing, but also aid in the creation of grammars for languages where none exist. Further, we show that the grammar extraction process can help in identifying annotation errors and thus aid in the task of the treebank validation.
Language Grammar and Syntax
Topics Lexicon, lexical database, Knowledge Discovery/Representation, Grammar and Syntax
Full paper Grammar Extraction from Treebanks for Hindi and Telugu
Bibtex @InProceedings{KOLACHINA10.854,
  author = {Prasanth Kolachina and Sudheer Kolachina and Anil Kumar Singh and Samar Husain and Viswanath Naidu and Rajeev Sangal and Akshar Bharati},
  title = {Grammar Extraction from Treebanks for Hindi and Telugu},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }