Title |
Corpora for the Conceptualisation and Zoning of Scientific Papers |
Authors |
Maria Liakata, Simone Teufel, Advaith Siddharthan and Colin Batchelor |
Abstract |
We present two complementary annotation schemes for sentence-based annotation of full scientific papers, CoreSC and AZ-II, applied to primary research articles in chemistry. AZ-II is the extension of AZ for chemistry papers. AZ has been shown to be reliably annotated by independent human coders and useful for various information access tasks. Like AZ, AZ-II follows the rhetorical structure of a scientific paper and the knowledge claims made by the authors. The CoreSC scheme takes a different view of scientific papers, treating them as the human-readable representations of scientific investigations. It seeks to retrieve the structure of the investigation from the paper as generic high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the two corpora produced using each scheme. There are 36 shared papers in the corpora, which allows us to quantitatively compare aspects of the annotation schemes. We show the correlation between the two schemes, their strengths and weaknesses, and discuss the benefits of combining a rhetorical-based analysis of the papers with a content-based one. |
Language |
Document Classification, Text categorisation |
Topics |
Corpus (creation, annotation, etc.), Discourse annotation, representation and processing, Document Classification, Text categorisation |
Full paper  |
Corpora for the Conceptualisation and Zoning of Scientific Papers |
Bibtex |
author = {Maria Liakata and Simone Teufel and Advaith Siddharthan and Colin Batchelor}, title = {Corpora for the Conceptualisation and Zoning of Scientific Papers}, booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} } |