Title |
Collection of Usage Information for Language Resources from Academic Articles |
Authors |
Shunsuke Kozawa, Hitomi Tohyama, Kiyotaka Uchimoto and Shigeki Matsubara |
Abstract |
Language resources (LRs) have become indispensable for linguistic research. However, existing LRs are often not fully utilized because the variety of their uses is not well known, which suggests that their intrinsic value is not well recognized either. To address this issue, lists of usage information might improve LR searches and lead to more efficient use. In this research, therefore, we collect a list of usage information for each LR from academic articles to promote the efficient utilization of LRs. This paper proposes constructing a text corpus annotated with usage information (UI corpus). Specifically, we automatically extract sentences containing LR names from academic articles. The extracted sentences are then annotated with usage information by two annotators in a cascaded manner. We show that the UI corpus contributes to efficient LR searches by combining it with a metadata database of LRs and comparing the number of LRs retrieved with and without the UI corpus. |
Language |
Other |
Topics |
Metadata, Corpus (creation, annotation, etc.), Other |
Full paper  |
Collection of Usage Information for Language Resources from Academic Articles |
Bibtex |
@InProceedings{KOZAWA10.746,
  author    = {Shunsuke Kozawa and Hitomi Tohyama and Kiyotaka Uchimoto and Shigeki Matsubara},
  title     = {Collection of Usage Information for Language Resources from Academic Articles},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
} |