Title |
Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain |
Authors |
Nelleke Oostdijk, Suzan Verberne and Cornelis Koster |
Abstract |
For mining intellectual property texts (patents), a broad-coverage lexicon that covers general English words together with terminology from the patent domain is indispensable. The patent domain is very diffuse as it comprises a variety of technical domains (e.g. Human Necessities, Chemistry & Metallurgy and Physics in the International Patent Classification). As a result, collecting a lexicon that covers the language used in patent texts is not a straightforward task. In this paper we describe the approach that we have developed for the semi-automatic construction of a broad-coverage lexicon for classification and information retrieval in the patent domain and which combines information from multiple sources. Our contribution is twofold. First, we provide insight into the difficulties of developing lexical resources for information retrieval and text mining in the patent domain, a research and development field that is expanding quickly. Second, we create a broad coverage lexicon annotated with rich lexical information and containing both general English word forms and domain terminology for various technical domains. |
Language |
Morphology |
Topics |
Lexicon, lexical database, MultiWord Expressions & Collocations, Morphology |
Full paper  |
Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain |
Bibtex |
@InProceedings{OOSTDIJK10.378,
  author = {Nelleke Oostdijk and Suzan Verberne and Cornelis Koster},
  title = {Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |