Summary of the paper

Title NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems
Authors Eric Charton and Juan-Manuel Torres-Moreno
Abstract Availability of labeled language resources, such as annotated corpora and domain dependent labeled language resources, is crucial for experiments in the field of Natural Language Processing. Most often, due to lack of resources, manual verification and annotation of electronic text material is a prerequisite for the development of NLP tools. In the context of under-resourced languages, the lack of corpora becomes a crucial problem because most of the research efforts are supported by organizations with limited funds. Using free, multilingual and highly structured corpora like Wikipedia to produce automatically labeled language resources can be an answer to those needs. This paper introduces NLGbAse, a multilingual linguistic resource built from the Wikipedia encyclopedic content. This system produces structured metadata which makes possible the automatic annotation of corpora with syntactic and semantic labels. Each metadata record contains semantic and statistical information related to an encyclopedic document. To validate our approach, we built and evaluated a Named Entity Recognition tool, trained with Wikipedia corpora annotated by our system.
Language Named Entity recognition
Topics Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval, Named Entity recognition
Full paper NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems
Bibtex @InProceedings{CHARTON10.900,
  author = {Eric Charton and Juan-Manuel Torres-Moreno},
  title = {NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }