Summary of the paper

Title Recent Developments in the National Corpus of Polish
Authors Adam Przepiórkowski, Rafał L. Górski, Marek Łaziński and Piotr Pęzik
Abstract The aim of the paper is to present recent developments (as of March 2010) in the construction of the National Corpus of Polish (NKJP). The NKJP project was launched at the very end of 2007 and aims to compile a large, linguistically annotated corpus of contemporary Polish by the end of 2010. Out of the total pool of 1 billion words of text data collected in the project, a 300 million word balanced corpus will be selected to match a set of predefined representativeness criteria. This paper outlines a number of recent developments in the NKJP project, including: 1) the design of text encoding XML schemata for various levels of linguistic information, 2) a new tool for manual annotation at various levels, 3) numerous improvements in search tools. As the work on NKJP progresses, it becomes clear that this project serves as an important testbed for linguistic annotation and interoperability standards. We believe that our recent experiences will prove relevant to future large-scale language resource compilation efforts.
Language Standards for LRs
Topics Corpus (creation, annotation, etc.), LR national/international projects, organizational/policy issues, Standards for LRs
Full paper Recent Developments in the National Corpus of Polish
Bibtex @InProceedings{PRZEPIRKOWSKI10.152,
  author = {Adam Przepiórkowski and Rafał L. Górski and Marek Łaziński and Piotr Pęzik},
  title = {Recent Developments in the National Corpus of Polish},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }