Title |
NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems |
Authors |
Eric Charton and Juan-Manuel Torres-Moreno |
Abstract |
The availability of labeled language resources, such as annotated corpora and domain-dependent labeled language resources, is crucial for experiments in the field of Natural Language Processing. Most often, due to a lack of resources, manual verification and annotation of electronic text material is a prerequisite for the development of NLP tools. In the context of under-resourced languages, the lack of corpora becomes a crucial problem because most of the research efforts are supported by organizations with limited funds. Using free, multilingual and highly structured corpora like Wikipedia to produce automatically labeled language resources can be an answer to those needs. This paper introduces NLGbAse, a multilingual linguistic resource built from the Wikipedia encyclopedic content. This system produces structured metadata that makes possible the automatic annotation of corpora with syntactic and semantic labels. Each metadata record contains semantic and statistical information related to an encyclopedic document. To validate our approach, we built and evaluated a Named Entity Recognition tool, trained with Wikipedia corpora annotated by our system. |
Language |
Named Entity recognition |
Topics |
Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval, Named Entity recognition |
Full paper  |
NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems |
Bibtex |
@InProceedings{CHARTON10.900,
  author = {Eric Charton and Juan-Manuel Torres-Moreno},
  title = {NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |