Summary of the paper

Title How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese.
Authors Jerid Francom, Amy LaCross and Adam Ussishkin
Abstract In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can be employed as an evaluation metric for resources for low-density languages, drawing on well-established psycholinguistic factors. Using the low-density language Maltese as a test case, we highlight the challenges that face researchers developing resources for languages with sparsely available data and identify a key empirical link between corpus and psycholinguistic research as a tool to evaluate corpus resources. Specifically, we compare two robust variables identified in the psycholinguistic literature: word frequency (as measured in a corpus) and word familiarity (as measured in a rating task). We then apply statistical methods to evaluate the extent to which familiarity ratings predict corpus frequency for verbs in the Maltese corpus from three angles: 1) token frequency, 2) frequency distributions and 3) morpho-syntactic type (binyan). This research provides a multidisciplinary approach to corpus development and evaluation, in particular for less-resourced languages that lack a wide access to diverse language data.
Language Corpus (creation, annotation, etc.)
Topics Validation of LRs, Cognitive methods, Corpus (creation, annotation, etc.)
Full paper How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese.
Bibtex @InProceedings{FRANCOM10.666,
  author = {Jerid Francom and Amy LaCross and Adam Ussishkin},
  title = {How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese.},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }