Summary of the paper

Title The RODRIGO Database
Authors Nicolas Serrano, Francisco Castro and Alfons Juan
Abstract Annotation of digitized pages from historical document collections is very important to research on automatic extraction of text blocks, lines, and handwriting recognition. We have recently introduced a new handwritten text database, GERMANA, which is based on a Spanish manuscript from 1891. To our knowledge, GERMANA is the first publicly available database mostly written in Spanish and comparable in size to standard databases. In this paper, we present another handwritten text database, RODRIGO, completely written in Spanish and comparable in size to GERMANA. However, RODRIGO comes from a much older manuscript, from 1545, where the typical difficult characteristics of historical documents are more evident. In particular, the writing style, which has clear Gothic influences, is significantly more complex than that of GERMANA. We also provide baseline results of handwriting recognition for reference in future studies, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
Language Handwriting recognition
Topics Corpus (creation, annotation, etc.), Digital libraries, Handwriting recognition
Full paper The RODRIGO Database
Bibtex @InProceedings{SERRANO10.477,
  author = {Nicolas Serrano and Francisco Castro and Alfons Juan},
  title = {The RODRIGO Database},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }