Summary of the paper

Title The ConceptMapper Approach to Named Entity Recognition
Authors Michael Tanenblatt, Anni Coden and Igor Sominsky
Abstract ConceptMapper is an open source tool we created for classifying mentions in an unstructured text document based on concept terminologies (dictionaries) and yielding named entities as output. It is implemented as a UIMA (Unstructured Information Management Architecture) annotator and is highly configurable: concepts can come from standardised or proprietary terminologies; arbitrary attributes can be associated with dictionary entries, and those attributes can then be associated with the named entities in the output; numerous search strategies and search options can be specified; any tokenizer packaged as a UIMA annotator can be used to tokenize the dictionary, so the same tokenization can be guaranteed for the input and dictionary, minimising tokenization mismatch errors; and the types and features of UIMA annotations used as input and generated as output can also be controlled. We describe ConceptMapper and its configuration parameters and their trade-offs, then describe the results of an experiment wherein some of these parameters are varied and precision and recall are subsequently measured in the task of identifying concepts in a collection of English-language clinical reports (colon cancer pathology). ConceptMapper is available from the Apache UIMA Sandbox, covered by the Apache Open Source license.
Language Lexicon, lexical database
Topics Named Entity recognition, Tools, systems, applications, Lexicon, lexical database
Full paper The ConceptMapper Approach to Named Entity Recognition
Bibtex @InProceedings{TANENBLATT10.448,
  author = {Michael Tanenblatt and Anni Coden and Igor Sominsky},
  title = {The ConceptMapper Approach to Named Entity Recognition},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
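The dictionary lookup described in the abstract can be illustrated with a minimal Python sketch. It is not ConceptMapper's implementation and does not use the UIMA API. It assumes a longest-match search strategy as one example of the "search strategies" the abstract mentions, and all names (tokenize, build_index, find_concepts, precision_recall, DICTIONARY) and the concept codes are hypothetical. The sketch shows three ideas from the abstract: the same tokenizer is applied to the dictionary and to the input, attributes attached to dictionary entries are carried into the output, and results are scored by precision and recall.

```python
import re
from typing import Dict, List, Set, Tuple

Attrs = Dict[str, str]


def tokenize(text: str) -> List[str]:
    # A single tokenizer shared by the dictionary and the input text,
    # so both sides are split and case-folded in the same way.
    return re.findall(r"\w+", text.lower())


def build_index(entries: Dict[str, Attrs]) -> Dict[Tuple[str, ...], Attrs]:
    # Key each dictionary entry by its token sequence.
    return {tuple(tokenize(surface)): attrs for surface, attrs in entries.items()}


def find_concepts(text: str, index: Dict[Tuple[str, ...], Attrs]) -> List[Tuple[str, Attrs]]:
    # Assumed strategy: at each position, prefer the longest dictionary match.
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    found = []
    i = 0
    while i < len(tokens):
        matched = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                # The entry's attributes travel with the recognised entity.
                found.append((" ".join(key), index[key]))
                matched = n
                break
        i += matched if matched else 1
    return found


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    # Precision: share of predictions that are correct.
    # Recall: share of gold items that were found.
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical dictionary entries with made-up concept codes.
DICTIONARY = {
    "colon": {"code": "C1"},
    "sigmoid colon": {"code": "C2"},
    "adenocarcinoma": {"code": "C3"},
}

if __name__ == "__main__":
    index = build_index(DICTIONARY)
    for surface, attrs in find_concepts("Adenocarcinoma of the sigmoid colon.", index):
        print(surface, attrs)
```

In this sketch, "sigmoid colon" is matched as one concept rather than as the shorter entry "colon", because the longer token sequence is tried first. A different search strategy would change that result, which is the kind of trade-off the paper's experiment measures.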