Summary of the paper

Title The Development of a Morphosyntactic Tagset for Afrikaans and its Use with Statistical Tagging
Authors Boris Haselbach and Ulrich Heid
Abstract In this paper, we present a morphosyntactic tagset for Afrikaans based on the guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES). We compare our slim yet expressive tagset, MAATS (Morphosyntactic AfrikAans TagSet), with an existing one which primarily focuses on a detailed morphosyntactic and semantic description of word forms. MAATS will primarily be used for the extraction of lexical data from large POS-tagged corpora. We focus not only on morphosyntactic properties but also on processability with statistical tagging. We discuss the tagset design and motivate our classification of Afrikaans word forms; in particular, we focus on the categorization of verbs and conjunctions. The complete tagset is presented and we briefly discuss each word class. In a case study with an Afrikaans newspaper corpus, we evaluate our tagset with four different statistical taggers. Despite a relatively small amount of training data, but with a large tagger lexicon, TnT-Tagger scores 97.05 % accuracy. Additionally, we present some error sources and discuss future work.
Language Validation of LRs
Topics Part of speech tagging, Corpus (creation, annotation, etc.), Validation of LRs
Full paper The Development of a Morphosyntactic Tagset for Afrikaans and its Use with Statistical Tagging
Bibtex @InProceedings{HASELBACH10.318,
  author = {Boris Haselbach and Ulrich Heid},
  title = {The Development of a Morphosyntactic Tagset for Afrikaans and its Use with Statistical Tagging},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }