LREC 2010 Proceedings

Summary of the paper

Title	Computer Assisted Semantic Annotation in the DutchSemCor Project
Authors	Attila Görög and Piek Vossen
Abstract	The goal of this paper is to describe the annotation protocols and the Semantic Annotation Tool (SAT) used in the DutchSemCor project. The DutchSemCor project is aiming at aligning the Cornetto lexical database with the Dutch language corpus SoNaR. 250K corpus occurrences of the 3,000 most frequent and most ambiguous Dutch nouns, adjectives and verbs are being annotated manually using the SAT. This data is then used for bootstrapping 750K extra occurrences which in turn will be checked manually. Our main focus in this paper is the methodology applied in the project to attain the envisaged Inter-annotator Agreement (IA) of ≥80%. We will also discuss one of the main objectives of DutchSemCor, i.e. to provide semantically annotated language data with high scores for quantity, quality and diversity. Sample data with high scores for these three features can yield better results for co-training WSD systems. Finally, we will take a brief look at our annotation tool.
Language	Tools, systems, applications
Topics	Corpus (creation, annotation, etc.), Word Sense Disambiguation, Tools, systems, applications
Full paper	Computer Assisted Semantic Annotation in the DutchSemCor Project
Bibtex	@InProceedings{GRG10.269,
  author = {Attila Görög and Piek Vossen},
  title = {Computer Assisted Semantic Annotation in the DutchSemCor Project},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}