Title |
Morphological Annotation of Quranic Arabic |
Authors |
Kais Dukes and Nizar Habash |
Abstract |
The Quranic Arabic Corpus (http://corpus.quran.com) is an annotated linguistic resource with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400 year old central religious text of Islam. This paper describes a new approach to morphological annotation of Quranic Arabic, a genre difficult to compare with other forms of Arabic. Processing Quranic Arabic is a unique challenge from a computational point of view, since the vocabulary and spelling differ from Modern Standard Arabic. The Quranic Arabic Corpus differs from other Arabic computational resources in adopting a tagset that closely follows traditional Arabic grammar. We made this decision in order to leverage a large body of existing historical grammatical analysis, and to encourage online collaborative annotation. In this paper, we discuss how the unique challenge of morphological annotation of Quranic Arabic is solved using a multi-stage approach. The different stages include automatic morphological tagging using diacritic edit-distance, two-pass manual verification, and online collaborative annotation. This process is evaluated to validate the appropriateness of the chosen methodology. |
Language |
Part of speech tagging |
Topics |
Corpus (creation, annotation, etc.), Morphology, Part of speech tagging |
Full paper  |
Morphological Annotation of Quranic Arabic |
Bibtex |
@InProceedings{DUKES10.276,
  author = {Kais Dukes and Nizar Habash},
  title = {Morphological Annotation of Quranic Arabic},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |