Title |
Bulgarian National Corpus Project |
Authors |
Svetla Koeva, Diana Blagoeva and Siya Kolkovska |
Abstract |
The paper presents the Bulgarian National Corpus project (BulNC), a large-scale, representative corpus of Bulgarian that is available online. The BulNC is also a monolingual general corpus, fully morpho-syntactically (and partially semantically) annotated, and manually provided with detailed metadata descriptions. At present the Bulgarian National Corpus consists of about 320 000 000 graphical words and includes more than 10 000 samples. The corpus structure and the accepted criteria for representativeness and balance are briefly presented. The query language for advanced search of collocations and concordances is demonstrated with some examples: it allows retrieval of word combinations, ordered queries, inflectionally and semantically related words, and part-of-speech tags, and it supports Boolean operations and grouping. The BulNC already plays a significant role in natural language processing of Bulgarian, contributing to scientific advances in spelling and grammar checking, word sense disambiguation, speech recognition, text categorisation, topic extraction and machine translation. The BulNC can also be used in investigations that go beyond linguistics: library studies, social sciences research, teaching methods studies, etc. |
Language |
Tools, systems, applications |
Topics |
Corpus (creation, annotation, etc.), LR national/international projects, organizational/policy issues, Tools, systems, applications |
Full paper  |
Bulgarian National Corpus Project |
Bibtex |
@InProceedings{KOEVA10.316,
  author = {Svetla Koeva and Diana Blagoeva and Siya Kolkovska},
  title = {Bulgarian National Corpus Project},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |