Title |
Identifying Sources of Weakness in Syntactic Lexicon Extraction |
Authors |
Claire Gardent and Alejandra Lorenzo |
Abstract |
Previous work has shown that large-scale subcategorisation lexicons could be extracted from parsed corpora with reasonably high precision. In this paper, we apply a standard extraction procedure to a 100-million-word parsed corpus of French and obtain rather poor results. We investigate different factors likely to improve performance, in particular the specific extraction procedure and the parser used; the size of the input corpus; and the type of frames learned. We try out different ways of interleaving the output of several parsers with the lexicon extraction process and show that none of them improves the results. Conversely, we show that increasing the size of the input corpus and modifying the extraction procedure to better differentiate prepositional arguments from prepositional modifiers improves performance. In conclusion, we suggest that a more sophisticated approach to parser combination and better probabilistic models of the various types of prepositional objects in French are likely ways to get better results. |
Language |
Validation of LRs |
Topics |
Acquisition, Lexicon, lexical database, Validation of LRs |
Full paper  |
Identifying Sources of Weakness in Syntactic Lexicon Extraction |
Bibtex |
@InProceedings{GARDENT10.177,
  author = {Claire Gardent and Alejandra Lorenzo},
  title = {Identifying Sources of Weakness in Syntactic Lexicon Extraction},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |