Summary of the paper

Title Morphological Annotation of Quranic Arabic
Authors Kais Dukes and Nizar Habash
Abstract The Quranic Arabic Corpus (http://corpus.quran.com) is an annotated linguistic resource with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400 year old central religious text of Islam. This paper describes a new approach to morphological annotation of Quranic Arabic, a genre difficult to compare with other forms of Arabic. Processing Quranic Arabic is a unique challenge from a computational point of view, since the vocabulary and spelling differ from Modern Standard Arabic. The Quranic Arabic Corpus differs from other Arabic computational resources in adopting a tagset that closely follows traditional Arabic grammar. We made this decision in order to leverage a large body of existing historical grammatical analysis, and to encourage online collaborative annotation. In this paper, we discuss how the unique challenge of morphological annotation of Quranic Arabic is solved using a multi-stage approach. The different stages include automatic morphological tagging using diacritic edit-distance, two-pass manual verification, and online collaborative annotation. This process is evaluated to validate the appropriateness of the chosen methodology.
Language Part of speech tagging
Topics Corpus (creation, annotation, etc.), Morphology, Part of speech tagging
Full paper Morphological Annotation of Quranic Arabic
Bibtex @InProceedings{DUKES10.276,
  author = {Kais Dukes and Nizar Habash},
  title = {Morphological Annotation of Quranic Arabic},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }