Title |
Hybrid Citation Extraction from Patents |
Authors |
Olivier Galibert, Sophie Rosset, Xavier Tannier and Fanny Grandry |
Abstract |
The Quaero project organized a set of evaluations of Named Entity recognition systems in 2009. One of the sub-tasks consists in extracting citations from English-language patents, i.e. references to other documents, either other patents or general literature. We present in this paper the participation of LIMSI in this evaluation, with a complete system description and the evaluation results. The corpus showed that patent and non-patent citations are very different in nature. We therefore handled references to other patents and references to general literature separately, and built a hybrid system. For patent citations, the system used rule-based expert knowledge in the form of regular expressions. The system for detecting non-patent citations, on the other hand, is purely stochastic (machine learning with CRF++). We then merged both approaches to provide a single output. Four teams participated in this task and our system obtained the best results of this evaluation campaign, even if the difference between the first two systems is barely significant. |
Language |
Tools, systems, applications |
Topics |
Named Entity recognition, Information Extraction, Information Retrieval, Tools, systems, applications |
Full paper  |
Hybrid Citation Extraction from Patents |
Bibtex |
@InProceedings{GALIBERT10.81,
  author    = {Olivier Galibert and Sophie Rosset and Xavier Tannier and Fanny Grandry},
  title     = {Hybrid Citation Extraction from Patents},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
} |