LREC 2010 Proceedings

Summary of the paper

Title	A Japanese Particle Corpus Built by Example-Based Annotation
Authors	Hiroki Hanaoka, Hideki Mima and Jun'ichi Tsujii
Abstract	This paper reports on an ongoing project to create a new corpus focusing on Japanese particles. The corpus will provide deeper syntactic/semantic information than existing resources. The initial target particle is "to", which occurs 22,006 times in the 38,400 sentences of an existing corpus, the Kyoto Text Corpus. The annotation adopts an "example-based" methodology, which differs from the traditional annotation style: instead of a linguistic category label, the annotators are given an example sentence. Because linguistic technical terms are avoided, it is expected that any native speaker, without special knowledge of linguistic analysis, can serve as an annotator without lengthy training, which should reduce the annotation cost. So far, 10,475 occurrences have already been annotated, with an inter-annotator agreement of 0.66 as measured by Cohen's kappa. The paper discusses an initial analysis of the disagreements and future directions.
Language	Other
Topics	Corpus (creation, annotation, etc.), Grammar and Syntax, Other
Bibtex	@InProceedings{HANAOKA10.617, author = {Hiroki Hanaoka and Hideki Mima and Jun'ichi Tsujii}, title = {A Japanese Particle Corpus Built by Example-Based Annotation}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} }
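The abstract reports inter-annotator agreement as Cohen's kappa (0.66). Kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance from each annotator's label distribution: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal Python sketch of that standard formula, not the authors' own tooling. It assumes that two annotators each assign exactly one label per occurrence of "to" (here, the ID of the chosen example sentence). The function name `cohens_kappa` and the labels "ex1" to "ex3" are hypothetical toy values, not data from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper: each label is assumed to be the ID of the example
    sentence an annotator picked for one occurrence of the particle.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of both marginals.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label throughout; kappa is
        # undefined, so treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations for six occurrences (not corpus data).
annotator_1 = ["ex1", "ex1", "ex2", "ex2", "ex1", "ex3"]
annotator_2 = ["ex1", "ex2", "ex2", "ex2", "ex1", "ex3"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # → 0.739
```

In this toy case the annotators agree on 5 of 6 items (p_o = 5/6), while chance agreement from the marginals is p_e = 13/36, giving kappa = 17/23. This shows why kappa sits below raw percent agreement: it discounts the agreement that label frequencies alone would produce.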