Summary of the paper

Title Towards an Improved Methodology for Automated Readability Prediction
Authors Philip van Oosten, Dries Tanghe and Véronique Hoste
Abstract Since the first half of the 20th century, readability formulas have been widely employed to automatically predict the readability of an unseen text. In this article, the formulas and the text characteristics they are composed of are evaluated in the context of large Dutch and English corpora. We describe the behaviour of the formulas and the text characteristics by means of correlation matrices and a principal component analysis, and test the methodological validity of the formulas by means of collinearity tests. Both the correlation matrices and the principal component analysis show that the formulas described in this paper strongly correspond, regardless of the language for which they were designed. Furthermore, the collinearity test reveals shortcomings in the methodology that was used to create some of the existing readability formulas. All of this leads us to conclude that a new readability prediction method is needed. We finally make suggestions to come to a cleaner methodology and present web applications that will help us collect data to compile a new gold standard for readability prediction.
Language Other
Topics Evaluation methodologies, Tools, systems, applications, Other
Full paper Towards an Improved Methodology for Automated Readability Prediction
Bibtex @InProceedings{VANOOSTEN10.286,
  author = {Philip van Oosten and Dries Tanghe and Véronique Hoste},
  title = {Towards an Improved Methodology for Automated Readability Prediction},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
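
The abstract's observation that readability formulas "strongly correspond" can be illustrated with a small sketch. The abstract does not name the formulas evaluated, so the two used here (Flesch Reading Ease and Flesch-Kincaid Grade Level, both well-known English formulas) are an assumption chosen for illustration, and the text counts are hypothetical, not data from the paper. Both formulas are built from the same two text characteristics (words per sentence and syllables per word), so their scores are expected to be almost perfectly (negatively) correlated.

```python
import math

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher score means easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: higher score means harder text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical (words, sentences, syllables) counts for five texts.
# These are made-up illustration values, not corpus data from the paper.
TEXTS = [
    (100, 10, 130),
    (120, 8, 180),
    (90, 5, 150),
    (200, 10, 340),
    (150, 15, 190),
]

ease_scores = [flesch_reading_ease(*t) for t in TEXTS]
grade_scores = [flesch_kincaid_grade(*t) for t in TEXTS]
r = pearson(ease_scores, grade_scores)

if __name__ == "__main__":
    print(f"Pearson r between the two formulas: {r:.3f}")
```

A correlation close to -1 on such data reflects shared inputs rather than independent evidence about readability, which is the kind of redundancy a correlation matrix or collinearity test is meant to expose.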