Summary of the paper

Title Towards the Annotation of Named Entities in the National Corpus of Polish
Authors Agata Savary, Jakub Waszczuk and Adam Przepiórkowski
Abstract We present the named entity annotation task within the on-going project of the National Corpus of Polish. To the best of our knowledge, this is the first attempt at a large-scale corpus annotation of Polish named entities. We describe the scope and the TEI-inspired hierarchy of named entities admitted for this task, as well as the TEI-conformant multi-level stand-off annotation format. We also discuss some methodological strategies including the annotation of embedded, coordinated and discontinuous names. Our annotation platform consists of two main tools interconnected by converting facilities. A rule-based natural language processing platform SProUT is used for the automatic pre-annotation of named entities, due to the previously created Polish extraction grammars adapted to the annotation task. A customizable graphical tree editor TrEd, extended to our needs, provides an ergonomic environment for manual correction of annotations. Despite some difficult cases encountered in the early annotation phase, about 2,600 named entities in 1,800 corpus sentences have presently been annotated, which allowed to validate the project methodology and tools.
Language Standards for LRs
Topics Corpus (creation, annotation, etc.), Named Entity recognition, Standards for LRs
Full paper Towards the Annotation of Named Entities in the National Corpus of Polish
Bibtex @InProceedings{SAVARY10.879,
  author = {Agata Savary and Jakub Waszczuk and Adam Przepiórkowski},
  title = {Towards the Annotation of Named Entities in the National Corpus of Polish},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }