Summary of the paper

Title Annotation Time Stamps ― Temporal Metadata from the Linguistic Annotation Process
Authors Katrin Tomanek and Udo Hahn
Abstract We describe the re-annotation of selected types of named entities (persons, organizations, locations) from the MUC7 corpus. The focus of this annotation initiative is on recording the time needed for the linguistic process of named entity annotation. Annotation times are measured on two basic annotation units -- sentences vs. complex noun phrases. We gathered evidence that decision times are non-uniformly distributed over the annotation units, while they do not substantially deviate among annotators. This data seems to support the hypothesis that annotation times very much depend on the inherent "hardness" of each single annotation decision. We further show how such time-stamped information can be used for empirically grounded studies of selective sampling techniques, such as Active Learning. We directly compare Active Learning costs on the basis of token-based vs. time-based measurements. The data reveals that Active Learning keeps its competitive advantage over random sampling in both scenarios, though the difference is less marked for the time metric than for the token metric.
Language Information Extraction, Information Retrieval
Topics Metadata, Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval
Full paper Annotation Time Stamps ― Temporal Metadata from the Linguistic Annotation Process
Bibtex @InProceedings{TOMANEK10.652,
  author = {Katrin Tomanek and Udo Hahn},
  title = {Annotation Time Stamps ― Temporal Metadata from the Linguistic Annotation Process},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }