Summary of the paper

Title Design and Development of Part-of-Speech-Tagging Resources for Wolof (Niger-Congo, spoken in Senegal)
Authors Cheikh M. Bamba Dione, Jonas Kuhn and Sina Zarrieß
Abstract In this paper, we report on the design of a part-of-speech tagset for Wolof and on the creation of a semi-automatically annotated gold standard. In order to achieve high-quality annotation relatively fast, we first generated an accurate lexicon that draws on existing word and name lists and takes into account inflectional and derivational morphology. The main motivation for the tagged corpus is to obtain data for training automatic taggers with machine learning approaches. Hence, we took machine learning considerations into account during tagset design, and we present training experiments as part of this paper. The best automatic tagger achieves an accuracy of 95.2% in cross-validation experiments. We also wanted to create a basis for experimenting with annotation projection techniques, which exploit parallel corpora. For this reason, it was useful to use a part of the Bible as the gold standard corpus, for which sentence-aligned parallel versions in many languages are easy to obtain. We also report on preliminary experiments exploiting a statistical word alignment of the parallel text.
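The abstract mentions annotation projection over a statistical word alignment of the parallel Bible text. As a rough illustration of what such projection involves, here is a minimal Python sketch of direct tag projection. It assumes the alignment is given as (source index, target index) pairs and resolves many-to-one links by majority vote; the function names, tag labels and tie-breaking rule are illustrative assumptions, not the procedure reported in the paper.

```python
from collections import Counter
from typing import Optional


def project_tags(src_tags: list[str], n_tgt: int,
                 alignment: list[tuple[int, int]]) -> list[Optional[str]]:
    """Copy POS tags from source tokens onto aligned target tokens.

    Each pair (i, j) links source token i to target token j (an assumed
    input format). When several source tokens align to the same target
    token, the most frequent tag wins; ties go to the tag seen first.
    Unaligned target tokens receive None.
    """
    votes: list[Counter] = [Counter() for _ in range(n_tgt)]
    for i, j in alignment:
        votes[j][src_tags[i]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def projected_accuracy(pred: list[Optional[str]], gold: list[str]) -> float:
    """Accuracy of projected tags against gold tags, over tagged tokens only."""
    pairs = [(p, g) for p, g in zip(pred, gold) if p is not None]
    if not pairs:
        return 0.0
    return sum(p == g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical source tags and alignment for a 3-token source, 4-token target.
    projected = project_tags(["DET", "NOUN", "VERB"], 4, [(0, 1), (1, 0), (2, 2)])
    print(projected)  # ['NOUN', 'DET', 'VERB', None]
    print(projected_accuracy(projected, ["NOUN", "DET", "VERB", "ADV"]))  # 1.0
```

Direct projection of this kind leaves unaligned tokens untagged and copies alignment errors straight into the target, which is one reason projected tags are usually treated as noisy training data rather than as a gold standard.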
Language Statistical and machine learning methods
Topics Part of speech tagging, Endangered languages, Statistical and machine learning methods
Full paper Design and Development of Part-of-Speech-Tagging Resources for Wolof (Niger-Congo, spoken in Senegal)
Bibtex @InProceedings{DIONE10.333,
  author = {Cheikh M. Bamba Dione and Jonas Kuhn and Sina Zarrieß},
  title = {Design and Development of Part-of-Speech-Tagging Resources for Wolof (Niger-Congo, spoken in Senegal)},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA