Summary of the paper

Title From XML to XML: The Why and How of Making the Biodiversity Literature Accessible to Researchers
Authors Alistair Willis, David King, David Morse, Anton Dil, Chris Lyal and Dave Roberts
Abstract We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These were developed during our ongoing work on automating the markup of scanned copies of the biodiversity literature. Such automation is required if historic literature is to be used to inform contemporary issues in biodiversity research. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to taXMLit, the target annotation schema. The intermediate representation allows additional information from external sources such as a taxonomic thesaurus to be incorporated before the final translation into taXMLit. We give an overview of the project workflow in automating the markup process, and consider what extensions to existing markup schema will be required to best support working taxonomists. Finally, we discuss some of the particular issues which were encountered in converting between different XML formats.
Language Corpus (creation, annotation, etc.)
Topics Digital libraries, Metadata, Corpus (creation, annotation, etc.)
Full paper From XML to XML: The Why and How of Making the Biodiversity Literature Accessible to Researchers
Bibtex @InProceedings{WILLIS10.787,
  author = {Alistair Willis and David King and David Morse and Anton Dil and Chris Lyal and Dave Roberts},
  title = {From XML to XML: The Why and How of Making the Biodiversity Literature Accessible to Researchers},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }