Summary of the paper

Title Anaphoric Annotation of Wikipedia and Blogs in the Live Memories Corpus
Authors Kepa Joseba Rodríguez, Francesca Delogu, Yannick Versley, Egon W. Stemle and Massimo Poesio
Abstract The Live Memories corpus is an Italian corpus annotated for anaphoric relations. This annotation effort aims to contribute to two significant issues for CL research: the lack of annotated anaphoric resources for Italian and the increasing interest in the social Web. The Live Memories Corpus contains texts from the Italian Wikipedia about the region Trentino/Süd Tirol and from blog sites with users' comments. A set of articles from local newspapers is planned to be added. The corpus includes manually annotated information about morphosyntactic agreement, anaphoricity, and semantic class of the NPs. The anaphoric annotation includes discourse deixis and bridging relations, and marks cases of ambiguity with the annotation of alternative interpretations. For the annotation of the anaphoric links, the corpus takes into account specific phenomena of the Italian language, such as incorporated clitics and phonetically unrealized pronouns. Reliability studies for the annotation of the mentioned phenomena and for the annotation of anaphoric links in general offer satisfactory results. The Wikipedia and blogs dataset will be distributed under a Creative Commons Attribution licence.
Language Discourse annotation, representation and processing
Topics Anaphora, Coreference, Corpus (creation, annotation, etc.), Discourse annotation, representation and processing
Full paper Anaphoric Annotation of Wikipedia and Blogs in the Live Memories Corpus
Bibtex @InProceedings{RODRGUEZ10.431,
  author = {Kepa Joseba Rodríguez and Francesca Delogu and Yannick Versley and Egon W. Stemle and Massimo Poesio},
  title = {Anaphoric Annotation of Wikipedia and Blogs in the Live Memories Corpus},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }