Summary of the paper

Title A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression
Authors Claude de Loupy, Marie Guégan, Christelle Ayache, Somara Seng and Juan-Manuel Torres Moreno
Abstract This paper presents two corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. Both corpora are in French. The first one is the only one we know in this language. It contains 20 topics with 20 documents each. A first set of 10 documents per topic is summarized and then the second set is used to produce an update summarization (new information). 4 annotators were involved and produced a total of 160 abstracts. The second corpus contains all the sentences of the first one. 4 annotators were asked to compress the 8432 sentences. This is the biggest corpus of compressed sentences we know, whatever the language. The paper provides some figures in order to compare the different annotators: compression rates, number of tokens per sentence, percentage of tokens kept according to their POS, position of dropped tokens in the sentence compression phase, etc. These figures show important differences from an annotator to the other. Another point is the different strategies of compression used according to the length of the sentence.
Language
Topics Corpus (creation, annotation, etc.), Summarisation
Full paper A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression
Bibtex @InProceedings{DELOUPY10.919,
  author = {Claude de Loupy and Marie Guégan and Christelle Ayache and Somara Seng and Juan-Manuel Torres Moreno},
  title = {A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }