Summary of the paper

Title MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
Authors Tomaž Erjavec
Abstract The paper presents the fourth, "Mondilex" edition of the MULTEXT-East language resources, a multilingual dataset for language engineering research and development, focused on the morphosyntactic level of linguistic description. This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages and includes the EAGLES-based morphosyntactic specifications; morphosyntactic lexica; and annotated parallel, comparable, and speech corpora. The fourth release of these resources introduces XML-encoded morphosyntactic specifications and adds six new languages, bringing the total to 16: to Bulgarian, Croatian, Czech, Estonian, English, Hungarian, Romanian, Serbian, Slovene, and the Resian dialect of Slovene it adds Macedonian, Persian, Polish, Russian, Slovak, and Ukrainian. This dataset, unique in terms of languages covered and the wealth of encoding, is extensively documented, and freely available for research purposes at http://nl.ijs.si/ME/V4/.
Language Standards for LRs
Topics Part of speech tagging, Morphology, Standards for LRs
Full paper MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
Bibtex @InProceedings{ERJAVEC10.138,
  author = {Tomaž Erjavec},
  title = {MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }