Summary of the paper

Title Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank
Authors Seth Kulick, Ann Bies and Mohamed Maamouri
Abstract Complications arise for standoff annotation when the annotation is not on the source text itself, but on a more abstract representation. This is particularly the case in a language such as Arabic with morphological and orthographic challenges, and we discuss various aspects of these issues in the context of the Arabic Treebank. The Standard Arabic Morphological Analyzer (SAMA) is closely integrated into the annotation workflow, as the basis for the abstraction between the explicit source text and the more abstract token representation. However, this integration with SAMA gives rise to various problems for the annotation workflow and for maintaining the link between the Treebank and SAMA. In this paper we discuss how we have overcome these problems with consistent and more precise categorization of all of the tokens for their relationship with SAMA. We also discuss how we have improved the creation of several distinct alternative forms of the tokens used in the syntactic trees. As a result, the Treebank provides a resource relating the different forms of the same underlying token with varying degrees of vocalization, in terms of how they relate (1) to each other, (2) to the syntactic structure, and (3) to the morphological analyzer.
Language Grammar and Syntax
Topics Corpus (creation, annotation, etc.), Morphology, Grammar and Syntax
Full paper Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank
Bibtex @InProceedings{KULICK10.566,
  author = {Seth Kulick and Ann Bies and Mohamed Maamouri},
  title = {Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }