Title
Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic
Authors
Wajdi Zaghouani, Bruno Pouliquen, Mohamed Ebrahim and Ralf Steinberger
Abstract
We present a fully functional Arabic information extraction (IE) system that is used every day to analyze large volumes of news texts. It extracts the named entity (NE) types person, organization, location, date and number, as well as quotations (direct reported speech) by and about people. The Named Entity Recognition (NER) system was not developed specifically for Arabic; instead, a highly multilingual, almost language-independent NER system was adapted to also cover Arabic. Arabic, a Semitic language, differs substantially from the Indo-European and Finno-Ugric languages the system already covered. This paper therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the otherwise language-independent rule set so that it could be applied to Arabic. The evaluation results are generally satisfactory, but could be improved for certain entity types. The output of the IE tools can be seen on the Arabic pages of NewsExplorer, a freely accessible application of the Europe Media Monitor (EMM), at http://press.jrc.it/overview.html.
Language
Multilinguality
Topics
Named Entity recognition, Information Extraction, Information Retrieval, Multilinguality
Bibtex
@InProceedings{ZAGHOUANI10.669,
  author    = {Wajdi Zaghouani and Bruno Pouliquen and Mohamed Ebrahim and Ralf Steinberger},
  title     = {Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}