Summary of the paper

Title Building a Gold Standard for Event Detection in Croatian
Authors Nikola Ljubešić, Tomislava Lauc and Damir Boras
Abstract This paper describes the process of building a newspaper corpus annotated with the events that specific documents describe. The main difference from the corpora built as part of the TDT initiative is that documents are annotated not by topics but by the specific events they describe. Additionally, documents are gathered from sixteen sources, and every document in the corpus is annotated with its corresponding event. The annotation process consists of a browsing step and a searching step. Experiments with a threshold that could be used in the browsing step show that only 1% of document pairs need to be browsed, at the cost of losing 2% of relevant document pairs. A statistical analysis of the annotated corpus shows that most events are described by few documents, while only some events are reported by many documents. The inter-annotator agreement measures show high agreement on grouping documents into event clusters, but much lower agreement on the number of events into which the documents are organized. An initial experiment is described, providing a baseline for further research on this corpus.
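The abstract does not specify how the browsing-step threshold is applied. The following is a minimal sketch of the trade-off it describes, assuming that every candidate document pair already carries a similarity score (for example, a cosine similarity of TF-IDF vectors) and a gold label saying whether both documents report the same event. The functions `threshold_tradeoff` and `pick_threshold` and the toy data are hypothetical illustrations, not the authors' method or data.

```python
"""Hypothetical sketch: a similarity threshold trading browsing effort
against the loss of relevant (same-event) document pairs.

Assumption (not from the paper): each pair is (similarity_score, same_event).
"""
from typing import List, Optional, Tuple

Pair = Tuple[float, bool]  # (similarity score, True if both docs report the same event)


def threshold_tradeoff(pairs: List[Pair], threshold: float) -> Tuple[float, float]:
    """Return (fraction of pairs to browse, fraction of relevant pairs lost).

    Only pairs scoring >= threshold are shown to the annotator; relevant
    pairs scoring below the threshold are never browsed, so they are lost.
    """
    total = len(pairs)
    relevant = sum(1 for _, same in pairs if same)
    browsed = sum(1 for score, _ in pairs if score >= threshold)
    lost = sum(1 for score, same in pairs if same and score < threshold)
    browse_frac = browsed / total if total else 0.0
    loss_frac = lost / relevant if relevant else 0.0
    return browse_frac, loss_frac


def pick_threshold(pairs: List[Pair], max_loss: float) -> Optional[float]:
    """Return the highest observed score whose relevant-pair loss <= max_loss.

    Lowering the threshold can only reduce the loss, so the first candidate
    (scanning from the highest score down) that meets max_loss is the one
    that keeps the browsing effort smallest.
    """
    for t in sorted({score for score, _ in pairs}, reverse=True):
        _, loss = threshold_tradeoff(pairs, t)
        if loss <= max_loss:
            return t
    return None  # only reached when pairs is empty


if __name__ == "__main__":
    # Toy pairs for illustration only.
    toy = [(0.91, True), (0.85, True), (0.40, True), (0.30, False),
           (0.20, False), (0.15, False), (0.10, False), (0.05, False)]
    t = pick_threshold(toy, max_loss=0.0)
    print(t, threshold_tradeoff(toy, t))
```

On this toy data, a zero-loss requirement selects a threshold of 0.40, meaning 3 of the 8 pairs must be browsed. Relaxing `max_loss` raises the threshold and shrinks the browsing effort, which is the kind of trade-off the reported 1%/2% figures summarize.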
Language Document Classification, Text categorisation
Topics Topic detection & tracking, Evaluation methodologies, Document Classification, Text categorisation
Full paper Building a Gold Standard for Event Detection in Croatian
Bibtex
@InProceedings{LJUBEI10.213,
  author = {Nikola Ljubešić and Tomislava Lauc and Damir Boras},
  title = {Building a Gold Standard for Event Detection in Croatian},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA