LREC 2010 Proceedings

Summary of the paper

Title	Collection of Usage Information for Language Resources from Academic Articles
Authors	Shunsuke Kozawa, Hitomi Tohyama, Kiyotaka Uchimoto and Shigeki Matsubara
Abstract	Recently, language resources (LRs) are becoming indispensable for linguistic researches. However, existing LRs are often not fully utilized because their variety of usage is not well known, indicating that their intrinsic value is not recognized very well either. Regarding this issue, lists of usage information might improve LR searches and lead to their efficient use. In this research, therefore, we collect a list of usage information for each LR from academic articles to promote the efficient utilization of LRs. This paper proposes to construct a text corpus annotated with usage information (UI corpus). In particular, we automatically extract sentences containing LR names from academic articles. Then, the extracted sentences are annotated with usage information by two annotators in a cascaded manner. We show that the UI corpus contributes to efficient LR searches by combining the UI corpus with a metadata database of LRs and comparing the number of LRs retrieved with and without the UI corpus.
Language	Other
Topics	Metadata, Corpus (creation, annotation, etc.), Other
Full paper	Collection of Usage Information for Language Resources from Academic Articles
Bibtex	@InProceedings{KOZAWA10.746, author = {Shunsuke Kozawa and Hitomi Tohyama and Kiyotaka Uchimoto and Shigeki Matsubara}, title = {Collection of Usage Information for Language Resources from Academic Articles}, booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} }