Summary of the paper

Title The Design of Syntactic Annotation Levels in the National Corpus of Polish
Authors Katarzyna Głowińska and Adam Przepiórkowski
Abstract The paper presents the procedure of syntactic annotation of the National Corpus of Polish. The paper concentrates on the delimitation of syntactic words (analytical forms, reflexive verbs, discontinuous conjunctions, etc.) and syntactic groups, as well as on problems encountered during the annotation process: syntactic group boundaries, multiword entities, abbreviations, discontinuous phrases and syntactic words. It includes the complete tagset for syntactic words and the list of syntactic groups recognized in NKJP. The tagset defines grammatical classes and categories according to morphosyntactic and syntactic criteria only. Syntactic annotation in the National Corpus of Polish is limited to making constituents of combinations of words. Annotation depends on shallow parsing and manual post-editing of the results by annotators. Manual annotation is performed by two independent annotators, with a referee in cases of disagreement. The manually constructed grammar, both for syntactic words and for syntactic groups, is encoded in the shallow parsing system Spejd.
Language Part of speech tagging
Topics Corpus (creation, annotation, etc.), Grammar and Syntax, Part of speech tagging
Full paper The Design of Syntactic Annotation Levels in the National Corpus of Polish
Bibtex @InProceedings{GOWISKA10.259,
  author = {Katarzyna Głowińska and Adam Przepiórkowski},
  title = {The Design of Syntactic Annotation Levels in the National Corpus of Polish},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }