Summary of the paper

Title A Large List of Confusion Sets for Spellchecking Assessed Against a Corpus of Real-word Errors
Authors Jennifer Pedler and Roger Mitton
Abstract One of the methods that has been proposed for dealing with real-word errors (errors that occur when a correctly spelled word is substituted for the one intended) is the "confusion-set" approach - a confusion set being a small group of words that are likely to be confused with one another. Using a list of confusion sets drawn up in advance, a spellchecker, on finding one of these words in a text, can assess whether one of the other members of its set would be a better fit and, if it appears to be so, propose that word as a correction. Much of the research using this approach has suffered from two weaknesses. The first is the small number of confusion sets used. The second is that systems have largely been tested on artificial errors. In this paper we address these two weaknesses. We describe the creation of a realistically sized list of confusion sets, then the assembling of a corpus of real-word errors, and then we assess the potential of that list in relation to that corpus.
Language Tools, systems, applications
Topics Corpus (creation, annotation, etc.), Grammar and Syntax, Tools, systems, applications
Full paper A Large List of Confusion Sets for Spellchecking Assessed Against a Corpus of Real-word Errors
Bibtex @InProceedings{PEDLER10.122,
  author = {Jennifer Pedler and Roger Mitton},
  title = {A Large List of Confusion Sets for Spellchecking Assessed Against a Corpus of Real-word Errors},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
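Illustration The confusion-set approach described in the abstract can be sketched roughly as follows. This is a minimal toy sketch, not the authors' system: the two confusion sets, the bigram counts, and the function name `suggest` are all invented for illustration, and scoring a candidate by the count of the (previous word, candidate) pair is only one assumed way of judging which member is "a better fit".

```python
from collections import Counter

# Hypothetical confusion sets; the paper's list is far larger.
CONFUSION_SETS = [
    {"their", "there"},
    {"peace", "piece"},
]

# Invented counts of (previous word, word) pairs, standing in for
# whatever context model a real spellchecker would use.
BIGRAM_COUNTS = Counter({
    ("over", "there"): 5,
    ("over", "their"): 1,
    ("a", "piece"): 4,
    ("a", "peace"): 1,
})


def suggest(tokens):
    """Return (index, word, proposed replacement) for each flagged token.

    A token is flagged when another member of its confusion set has a
    strictly higher count after the same previous word.
    """
    suggestions = []
    for i, word in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else "<s>"  # sentence-start marker
        for conf_set in CONFUSION_SETS:
            if word not in conf_set:
                continue
            current = BIGRAM_COUNTS[(prev, word)]  # 0 if the pair is unseen
            # sorted() keeps the choice deterministic when counts tie
            alternatives = sorted(conf_set - {word})
            best = max(alternatives, key=lambda w: BIGRAM_COUNTS[(prev, w)])
            if BIGRAM_COUNTS[(prev, best)] > current:
                suggestions.append((i, word, best))
    return suggestions
```

Calling `suggest("put it over their".split())` flags the final token and proposes the other member of its set, while a word that already fits its context at least as well as the alternatives is left alone.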