LREC 2010 Proceedings

Summary of the paper

Title	Hybrid Citation Extraction from Patents
Authors	Olivier Galibert, Sophie Rosset, Xavier Tannier and Fanny Grandry
Abstract	The Quaero project organized a set of evaluations of Named Entity recognition systems in 2009. One of the sub-tasks consists of extracting citations from English-language patents, i.e. references to other documents, either other patents or general literature. In this paper we present LIMSI's participation in this evaluation, with a complete system description and the evaluation results. The corpus showed that patent and non-patent citations are very different in nature. We therefore separated references to other patents from references to general literature papers and built a hybrid system. For patent citations, the system uses rule-based expert knowledge in the form of regular expressions. The system for detecting non-patent citations, on the other hand, is purely stochastic (machine learning with CRF++). Both approaches are then combined to provide a single output. Four teams participated in this task, and our system obtained the best results of this evaluation campaign, although the difference between the first two systems is only weakly significant.
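The hybrid architecture described in the abstract (rules for patent citations, a stochastic tagger for non-patent citations, then a merge into one output) can be sketched as below. This is a minimal illustration under stated assumptions, not the LIMSI system: the regular expressions in `PATENT_PATTERNS` are hypothetical examples, the CRF++ tagger is stubbed out as a precomputed list of character spans, and the merge policy (keep every rule-based span, add a stochastic span only when it does not overlap one) is an assumed choice.

```python
import re
from typing import List, Tuple

# A span is (start, end, label) with end-exclusive character offsets.
Span = Tuple[int, int, str]

# Hypothetical patterns for illustration only; NOT the paper's actual rules.
PATENT_PATTERNS = [
    re.compile(r"\bUS\s?\d{1,2},?\d{3},?\d{3}\b"),  # e.g. "US 5,123,456"
    re.compile(r"\bEP\s?\d(?:\s?\d{3}){2}\b"),       # e.g. "EP 1 234 567"
    re.compile(r"\bWO\s?\d{4}/\d{5,6}\b"),           # e.g. "WO 2005/123456"
]


def find_patent_citations(text: str) -> List[Span]:
    """Rule-based component: every match of any pattern becomes a span."""
    spans = set()
    for pattern in PATENT_PATTERNS:
        for m in pattern.finditer(text):
            spans.add((m.start(), m.end(), "patent"))
    return sorted(spans)


def overlaps(a: Span, b: Span) -> bool:
    """True if two end-exclusive spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def merge(rule_spans: List[Span], stochastic_spans: List[Span]) -> List[Span]:
    """Hypothetical merge policy: rule-based spans always win; a
    stochastic span is kept only if it overlaps no rule-based span."""
    merged = list(rule_spans)
    for s in stochastic_spans:
        if not any(overlaps(s, r) for r in rule_spans):
            merged.append(s)
    return sorted(merged)


if __name__ == "__main__":
    text = "See US 5,123,456 and Smith et al., J. Chem. 12 (2001) 34-56."
    # Stand-in for a CRF tagger's output (hypothetical offsets).
    crf_spans = [(21, 60, "non-patent")]
    print(merge(find_patent_citations(text), crf_spans))
    # prints [(4, 16, 'patent'), (21, 60, 'non-patent')]
```

Letting the rules take precedence reflects the abstract's observation that patent citations are regular enough for expert-written patterns, while the free-form literature references are left to the learned model.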
Language	Tools, systems, applications
Topics	Named Entity recognition, Information Extraction, Information Retrieval, Tools, systems, applications
Full paper	Hybrid Citation Extraction from Patents
Bibtex	@InProceedings{GALIBERT10.81,
  author    = {Olivier Galibert and Sophie Rosset and Xavier Tannier and Fanny Grandry},
  title     = {Hybrid Citation Extraction from Patents},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}