Summary of the paper

Title A Fact-aligned Corpus of Numerical Expressions
Authors Sandra Williams and Richard Power
Abstract We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as ‘about’ or ‘a little under’ are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.
Language MultiWord Expressions & Collocations
Topics Corpus (creation, annotation, etc.), Natural Language Generation, MultiWord Expressions & Collocations
Full paper A Fact-aligned Corpus of Numerical Expressions
Bibtex @InProceedings{WILLIAMS10.185,
  author = {Sandra Williams and Richard Power},
  title = {A Fact-aligned Corpus of Numerical Expressions},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }