Summary of the paper

Title Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain
Authors Nelleke Oostdijk, Suzan Verberne and Cornelis Koster
Abstract For mining intellectual property texts (patents), a broad-coverage lexicon that covers general English words together with terminology from the patent domain is indispensable. The patent domain is very diffuse, as it comprises a variety of technical domains (e.g. Human Necessities, Chemistry & Metallurgy and Physics in the International Patent Classification). As a result, collecting a lexicon that covers the language used in patent texts is not a straightforward task. In this paper we describe the approach that we have developed for the semi-automatic construction of a broad-coverage lexicon for classification and information retrieval in the patent domain, which combines information from multiple sources. Our contribution is twofold. First, we provide insight into the difficulties of developing lexical resources for information retrieval and text mining in the patent domain, a research and development field that is expanding quickly. Second, we create a broad-coverage lexicon annotated with rich lexical information and containing both general English word forms and domain terminology for various technical domains.
Language Morphology
Topics Lexicon, lexical database, MultiWord Expressions & Collocations, Morphology
Bibtex @InProceedings{OOSTDIJK10.378,
  author = {Nelleke Oostdijk and Suzan Verberne and Cornelis Koster},
  title = {Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA