Summary of the paper

Title Mining the Web for the Induction of a Dialectical Arabic Lexicon
Authors Rania Al-Sabbagh and Roxana Girju
Abstract This paper describes the first phase of building a lexicon of Egyptian Cairene Arabic (ECA), one of the most widely understood dialects in the Arab World, and Modern Standard Arabic (MSA). Each ECA entry is mapped to its MSA synonym, Part-of-Speech (POS) tag and top-ranked contexts based on Web queries; thus each entry is provided with basic syntactic and semantic information for a generic lexicon compatible with multiple NLP applications. Moreover, through their MSA synonyms, ECA entries gain access to the considerable NLP tools and resources available for MSA. Using an associationist approach based on the correlations between word co-occurrence patterns in both dialects, we change the direction of the acquisition process from parallel to circular to overcome a bottleneck of current research on Arabic dialects, namely the lack of parallel corpora, and to alleviate accuracy rates for using unrelated Web documents which are more frequently available. Manually evaluated for 1,000 word entries by two native speakers of the ECA-MSA varieties, the proposed approach achieves a promising F-measured performance rate of 70.9%. In discussing the proposed algorithm, different semantic issues are highlighted for upcoming phases of the induction of a more comprehensive ECA-MSA lexicon.
Language Semantics
Topics Lexicon, lexical database, Information Extraction, Information Retrieval, Semantics
Full paper Mining the Web for the Induction of a Dialectical Arabic Lexicon
Bibtex @InProceedings{ALSABBAGH10.344,
  author = {Rania Al-Sabbagh and Roxana Girju},
  title = {Mining the Web for the Induction of a Dialectical Arabic Lexicon},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }