Summary of the paper

Title Fine-Grain Morphological Analyzer and Part-of-Speech Tagger for Arabic Text
Authors Majdi Sawalha and Eric Atwell
Abstract Morphological analyzers and part-of-speech taggers are key technologies for most text analysis applications. Our aim is to develop a part-of-speech tagger for annotating a wide range of Arabic text formats, domains and genres, including both vowelized and non-vowelized text. Enriching the text with linguistic analysis will maximize the potential for corpus re-use in a wide range of applications. We foresee the advantage of enriching the text with part-of-speech tags of very fine-grained grammatical distinctions, which reflect expert interest in syntax and morphology, but not specific needs of end-users, because end-user applications are not known in advance. In this paper we review existing Arabic part-of-speech taggers and tag sets, and illustrate four different Arabic PoS tag sets for a sample of Arabic text from the Quran. We describe the detailed fine-grained morphological feature tag set of Arabic, and the fine-grained Arabic morphological analyzer algorithm. We faced practical challenges in applying the morphological analyzer to the 100-million-word Web Arabic Corpus: we had to port the software to the National Grid Service, adapt the analyzer to cope with spelling variations and errors, and utilise a Broad-Coverage Lexical Resource combining 23 traditional Arabic lexicons. Finally, we outline the construction of a Gold Standard for comparative evaluation.
Language Tools, systems, applications
Topics Part of speech tagging, Morphology, Tools, systems, applications
Full paper Fine-Grain Morphological Analyzer and Part-of-Speech Tagger for Arabic Text
Bibtex @InProceedings{SAWALHA10.282,
  author = {Majdi Sawalha and Eric Atwell},
  title = {Fine-Grain Morphological Analyzer and Part-of-Speech Tagger for Arabic Text},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }