Summary of the paper

Title Bulgarian National Corpus Project
Authors Svetla Koeva, Diana Blagoeva and Siya Kolkovska
Abstract The paper presents the Bulgarian National Corpus project (BulNC), a large-scale, representative corpus of Bulgarian that is available online. The BulNC is a monolingual general corpus, fully morpho-syntactically (and partially semantically) annotated, and manually provided with detailed meta-data descriptions. At present the Bulgarian National Corpus consists of about 320 000 000 graphical words and includes more than 10 000 samples. The corpus structure and the accepted criteria for representativeness and balance are briefly presented. The query language for advanced search of collocations and concordances is demonstrated with some examples: it allows the retrieval of word combinations, ordered queries, inflexionally and semantically related words, and part-of-speech tags, and it supports Boolean operations and grouping. The BulNC already plays a significant role in natural language processing of Bulgarian, contributing to scientific advances in spelling and grammar checking, word sense disambiguation, speech recognition, text categorisation, topic extraction and machine translation. The BulNC can also be used in investigations that go beyond linguistics: library studies, social sciences research, teaching methods studies, etc.
Language Tools, systems, applications
Topics Corpus (creation, annotation, etc.), LR national/international projects, organizational/policy issues, Tools, systems, applications
Full paper Bulgarian National Corpus Project
Bibtex @InProceedings{KOEVA10.316,
  author = {Svetla Koeva and Diana Blagoeva and Siya Kolkovska},
  title = {Bulgarian National Corpus Project},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }