Summary of the paper

Title A High Recall Error Identification Tool for Hindi Treebank Validation
Authors Bharat Ram Ambati, Mridul Gupta, Samar Husain and Dipti Misra Sharma
Abstract This paper describes the development of a hybrid tool for a semi-automated process for validation of treebank annotation at various levels. The tool is developed for error detection at the part-of-speech, chunk and dependency levels of a Hindi treebank, currently under development. The tool aims to identify as many errors as possible at these levels to achieve consistency in the task of annotation. Consistency in treebank annotation is a must for making data as error-free as possible and for providing quality assurance. The tool is aimed at ensuring consistency and to make manual validation cost effective. We discuss a rule based and a hybrid approach (statistical methods combined with rule-based methods) by which a high-recall system can be developed and used to identify errors in the treebank. We report some results of using the tool on a sample of data extracted from the Hindi treebank. We also argue how the tool can prove useful in improving the annotation guidelines which would in turn, better the quality of annotation in subsequent iterations.
Language Standards for LRs
Topics Validation of LRs, Corpus (creation, annotation, etc.), Standards for LRs
Full paper A High Recall Error Identification Tool for Hindi Treebank Validation
Bibtex @InProceedings{AMBATI10.673,
  author = {Bharat Ram Ambati and Mridul Gupta and Samar Husain and Dipti Misra Sharma},
  title = {A High Recall Error Identification Tool for Hindi Treebank Validation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }