Summary of the paper

Title A Persian Part-Of-Speech Tagger Based on Morphological Analysis
Authors Mahdi Mohseni and Behrouz Minaei-bidgoli
Abstract This paper describes a method based on morphological analysis of words for a Persian Part-Of-Speech (POS) tagging system. It is a main part of a process for expanding a large Persian corpus called Peykare (or Textual Corpus of Persian Language). Peykare is arranged into two parts: annotated and unannotated. We use the annotated part to create an automatic morphological analyzer, a main component of the system. Morphosyntactic features of Persian words cause two problems: the number of tags in the corpus increases (586 tags) and the forms of words change. The high number of tags prevents any tagger from working efficiently. In addition, the change of word forms reduces the frequency of words with the same lemma, and the number of words belonging to a specific tag decreases as well. This also has a bad effect on statistical taggers. By removing these problems, the morphological analyzer helps the tagger cover a large number of tags in the corpus. The method is evaluated on the corpus using a Markov tagger. The experiments show the efficiency of the method in Persian POS tagging.
Language Morphology
Topics Part of speech tagging, Corpus (creation, annotation, etc.), Morphology
Full paper A Persian Part-Of-Speech Tagger Based on Morphological Analysis
Bibtex @InProceedings{MOHSENI10.107,
  author = {Mahdi Mohseni and Behrouz Minaei-bidgoli},
  title = {A Persian Part-Of-Speech Tagger Based on Morphological Analysis},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }