Summary of the paper

Title Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic
Authors Wajdi Zaghouani, Bruno Pouliquen, Mohamed Ebrahim and Ralf Steinberger
Abstract We present a fully functional Arabic information extraction (IE) system that is used to analyze large volumes of news texts every day to extract the named entity (NE) types person, organization, location, date and number, as well as quotations (direct reported speech) by and about people. The Named Entity Recognition (NER) system was not developed for Arabic, but - instead - a highly multilingual, almost language-independent NER system was adapted to also cover Arabic. The Semitic language Arabic substantially differs from the Indo-European and Finno-Ugric languages currently covered. This paper thus describes what Arabic language-specific resources had to be developed and what changes needed to be made to the otherwise language-independent rule set in order to be applicable to the Arabic language. The achieved evaluation results are generally satisfactory, but could be improved for certain entity types. The results of the IE tools can be seen on the Arabic pages of the freely accessible Europe Media Monitor (EMM) application NewsExplorer, which can be found at http://press.jrc.it/overview.html.
Language Multilinguality
Topics Named Entity recognition, Information Extraction, Information Retrieval, Multilinguality
Full paper Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic
Bibtex @InProceedings{ZAGHOUANI10.669,
  author = {Wajdi Zaghouani and Bruno Pouliquen and Mohamed Ebrahim and Ralf Steinberger},
  title = {Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }