Title |
The ConceptMapper Approach to Named Entity Recognition |
Authors |
Michael Tanenblatt, Anni Coden and Igor Sominsky |
Abstract |
ConceptMapper is an open source tool we created for classifying mentions in an unstructured text document based on concept terminologies (dictionaries) and yielding named entities as output. It is implemented as a UIMA (Unstructured Information Management Architecture) annotator and is highly configurable: concepts can come from standardised or proprietary terminologies; arbitrary attributes can be associated with dictionary entries, and those attributes can then be associated with the named entities in the output; numerous search strategies and search options can be specified; any tokenizer packaged as a UIMA annotator can be used to tokenize the dictionary, so the same tokenization can be guaranteed for the input and dictionary, minimising tokenization mismatch errors; and the types and features of UIMA annotations used as input and generated as output can also be controlled. We describe ConceptMapper and its configuration parameters and their trade-offs, then describe the results of an experiment wherein some of these parameters are varied and precision and recall are subsequently measured in the task of identifying concepts in a collection of English-language clinical reports (colon cancer pathology). ConceptMapper is available from the Apache UIMA Sandbox, covered by the Apache Open Source license. |
Language |
Lexicon, lexical database |
Topics |
Named Entity recognition, Tools, systems, applications, Lexicon, lexical database |
Full paper  |
The ConceptMapper Approach to Named Entity Recognition |
Bibtex |
@InProceedings{TANENBLATT10.448,
  author    = {Michael Tanenblatt and Anni Coden and Igor Sominsky},
  title     = {The ConceptMapper Approach to Named Entity Recognition},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
} |