Title |
The Creation of a Large-Scale LFG-Based Gold Parsebank |
Authors |
Alexis Baird and Christopher R. Walker |
Abstract |
Systems for syntactically parsing sentences have long been recognized as a priority in Natural Language Processing. Statistics-based systems require large amounts of high quality syntactically parsed data. Using the XLE toolkit developed at PARC and the LFG Parsebanker interface developed at Bergen, the Parsebank Project at Powerset has generated a rapidly increasing volume of syntactically parsed data. By using these tools, we are able to leverage the LFG framework to provide richer analyses via both constituent (c-) and functional (f-) structures. Additionally, the Parsebanking Project uses source data from Wikipedia rather than source data limited to a specific genre, such as the Wall Street Journal. This paper outlines the process we used in creating a large-scale LFG-Based Parsebank to address many of the shortcomings of previously-created parse banks such as the Penn Treebank. While the Parsebank corpus is still in progress, preliminary results using the data in a variety of contexts already show promise. |
Language |
Grammar and Syntax |
Topics |
Parsing, Corpus (creation, annotation, etc.), Grammar and Syntax |
Full paper  |
The Creation of a Large-Scale LFG-Based Gold Parsebank |
Bibtex |
@InProceedings{BAIRD10.445,
  author = {Alexis Baird and Christopher R. Walker},
  title = {The Creation of a Large-Scale LFG-Based Gold Parsebank},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |