Title |
Evaluating Distributional Properties of Tagsets |
Authors |
Markus Dickinson and Charles Jochim |
Abstract |
We investigate which distributional properties should be present in a tagset by examining different mappings of various current part-of-speech tagsets, looking at English, German, and Italian corpora. Given the importance of distributional information, we present a simple model for evaluating how a tagset mapping captures distribution, specifically by utilizing a notion of frames to capture the local context. In addition to an accuracy metric capturing the internal quality of a tagset, we introduce a way to evaluate the external quality of tagset mappings so that we can ensure that the mapping retains linguistically important information from the original tagset. Although most of the mappings we evaluate are motivated by linguistic concerns, we also explore an automatic, bottom-up way to define mappings, to illustrate that better distributional mappings are possible. Comparing our initial evaluations to POS tagging results, we find that more distributional tagsets can sometimes result in worse accuracy, underscoring the need to carefully define the properties of a tagset. |
Language |
Grammar and Syntax |
Topics |
Part of speech tagging, Evaluation methodologies, Grammar and Syntax |
Full paper  |
Evaluating Distributional Properties of Tagsets |
Bibtex |
author = {Markus Dickinson and Charles Jochim}, title = {Evaluating Distributional Properties of Tagsets}, booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} } |