Summary of the paper

Title Mining Naturally-occurring Corrections and Paraphrases from Wikipedia’s Revision History
Authors Aurélien Max and Guillaume Wisniewski
Abstract Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also prove interesting material for linguistic studies. In this article, we present WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), a new freely-available resource built by automatically mining Wikipedia’s revision history. The WiCoPaCo corpus focuses on local modifications made by human revisors and includes various types of corrections (such as spelling error or typographical corrections) and rewritings, which can be categorized broadly into meaning-preserving and meaning-altering revisions. We present an initial hand-built typology of these revisions, but the resource allows for any possible annotation scheme. We discuss the main motivations for building such a resource and describe the main technical details guiding its construction. We also present applications and data analysis on French and report initial results on spelling error correction and morphosyntactic rewriting. The WiCoPaCo corpus can be freely downloaded from http://wicopaco.limsi.fr.
Language Authoring tools, proofing
Topics Corpus (creation, annotation, etc.), Textual Entailment and Paraphrasing, Authoring tools, proofing
Full paper Mining Naturally-occurring Corrections and Paraphrases from Wikipedia’s Revision History
Bibtex @InProceedings{MAX10.827,
  author = {Aurélien Max and Guillaume Wisniewski},
  title = {Mining Naturally-occurring Corrections and Paraphrases from Wikipedia’s Revision History},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }