Summary of the paper

Title Design and Data Collection for the Accentological Corpus of the Russian Language
Authors Elena Grishina, Svetlana Savchuk and Alexej Poljakov
Abstract The Accentological corpus gives researchers an opportunity to study word stress and stress variation, which are very important for the Russian language. Moreover, the Accentological corpus allows studying the history of Russian stress development. The paper presents the main characteristics of the Accentological corpus available at ruscorpora.ru. Corpus size, the type and sources of text material, the way it is represented in the corpus, the types of linguistic annotation, corpus composition, and ways of using the corpus effectively for different purposes are described. There are two zones in the Accentological corpus. 1) The zone of prose includes oral texts and film transcripts, in which stressed syllables are marked according to the real pronunciation. 2) The zone of poetry contains texts with marked accented syllables, so it is possible to define the exact word stress using special rules. The Accentological corpus has four types of annotation (metatextual, morphological, semantic and sociological) and also has its own accentological mark-up. Thanks to the accentological annotation, each word is supplied with stress marks, so a user can make queries and retrieve stressed or unstressed word forms in combination with grammatical and semantic features.
Language Morphology
Topics Corpus (creation, annotation, etc.), Grammar and Syntax, Morphology
Full paper Design and Data Collection for the Accentological Corpus of the Russian Language
Bibtex @InProceedings{GRISHINA10.358,
  author = {Elena Grishina and Svetlana Savchuk and Alexej Poljakov},
  title = {Design and Data Collection for the Accentological Corpus of the Russian Language},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }