Summary of the paper

Title A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters
Authors Grzegorz Chrupała and Dietrich Klakow
Abstract Named Entity Recognition is a relatively well-understood NLP task, with many publicly available training resources and software for processing English data. Other languages tend to be underserved in this area. For German, CoNLL-2003 Shared Task provided training data, but there are no publicly available, ready-to-use tools. We fill this gap and develop a German NER system with state-of-the-art performance. In addition to CoNLL 2003 labeled training data, we use two additional resources: (i) 32 million words of unlabeled news article text and (ii) infobox labels from German Wikipedia articles. From the unlabeled text we derive distributional word clusters. Then we use cluster membership features and Wikipedia infobox label features to train a supervised model on the labeled training data. This approach allows us to deal better with word-types unseen in the training data and achieve good performance on German with little engineering effort.
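The way cluster membership and infobox labels could enter a supervised tagger as features can be sketched as follows. This is only an illustration under stated assumptions: the lookup tables CLUSTERS and INFOBOX, their contents, the cluster IDs, and the function token_features are all hypothetical and are not taken from the paper, whose actual feature templates may differ.

```python
from typing import Dict, List

# Hypothetical lookup tables (assumptions, not the paper's resources):
# CLUSTERS maps a word to a distributional cluster ID derived from unlabeled text,
# INFOBOX maps a word to an infobox label taken from a Wikipedia article.
CLUSTERS: Dict[str, str] = {"Berlin": "c17", "Hamburg": "c17", "Merkel": "c42"}
INFOBOX: Dict[str, str] = {"Berlin": "Ort", "Merkel": "Politiker"}


def token_features(tokens: List[str], i: int) -> List[str]:
    """Return string-valued features for the token at position i."""
    word = tokens[i]
    feats = [
        f"word={word.lower()}",
        f"capitalized={word[:1].isupper()}",
        # Cluster and infobox features fall back to NONE for unknown words.
        f"cluster={CLUSTERS.get(word, 'NONE')}",
        f"infobox={INFOBOX.get(word, 'NONE')}",
    ]
    if i > 0:
        # Cluster of the preceding token as a simple context feature.
        feats.append(f"prev_cluster={CLUSTERS.get(tokens[i - 1], 'NONE')}")
    return feats


sentence = ["Angela", "Merkel", "besucht", "Hamburg"]
features = token_features(sentence, 3)
```

In this toy setup, "Hamburg" has no infobox entry but shares cluster c17 with "Berlin", so a model that saw only "Berlin" in the labeled data could still receive a useful signal for the unseen word.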
Language Tools, systems, applications
Topics Named Entity recognition, Multilinguality, Tools, systems, applications
Full paper A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters
Bibtex @InProceedings{CHRUPAA10.538,
  author = {Grzegorz Chrupała and Dietrich Klakow},
  title = {A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }