Summary of the paper

Title A Corpus for Evaluating Semantic Multilingual Web Retrieval Systems: The Sense Folder Corpus
Authors Ernesto William De Luca
Abstract In this paper, we present the multilingual Sense Folder Corpus. After the analysis of different corpora, we describe the requirements that have to be satisfied for evaluating semantic multilingual retrieval approaches. Justified by the unfulfilled requirements explained, we started creating a small bilingual hand-tagged corpus of 502 documents retrieved from Web searches. The documents contained in this collection have been collected using Google queries. A single ambiguous word has been searched and related documents (approx. the first 60 documents for every keyword) have been retrieved. The document collection has been extended at the query word level, using single ambiguous words for English (argument, bank, chair, network and rule) and for Italian (argomento, lingua, regola, rete and stampa). The search and annotation process has been carried out monolingually for both English and Italian. 252 English and 250 Italian documents have been retrieved from Google and saved in their original ranking order. Using this corpus, the performance of semantic multilingual retrieval systems has been evaluated against three baselines (“Random”, “First Sense” and “Most Frequent Sense”), which are formally presented and discussed. The fine-grained evaluation of the Sense Folder approach is discussed in detail.
Language Information Extraction, Information Retrieval
Topics Corpus (creation, annotation, etc.), Document Classification, Text categorisation, Information Extraction, Information Retrieval
Full paper A Corpus for Evaluating Semantic Multilingual Web Retrieval Systems: The Sense Folder Corpus
Bibtex @InProceedings{DELUCA10.816,
  author = {Ernesto William De Luca},
  title = {A Corpus for Evaluating Semantic Multilingual Web Retrieval Systems: The Sense Folder Corpus},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
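The abstract names three baselines ("Random", "First Sense" and "Most Frequent Sense") without giving their formal definitions. The sketch below shows one common reading of them. It assumes that each retrieved document is assigned exactly one sense of its query word and that baselines are scored by accuracy against the hand-tagged labels. It also assumes that the sense inventory is ordered, so the "first sense" is the first listed. The sense labels (`bank#1`, …) and all function names are hypothetical illustrations, not the paper's own notation or implementation. For simplicity, the most frequent sense is taken from the gold labels themselves. In standard practice it would come from an external sense-tagged corpus.

```python
import random
from collections import Counter


def random_baseline(senses, rng):
    """Pick one sense uniformly at random from the word's (hypothetical) sense inventory."""
    return rng.choice(senses)


def first_sense_baseline(senses):
    """Pick the sense listed first, assuming the inventory is ordered (e.g. WordNet-style)."""
    return senses[0]


def most_frequent_sense(gold_labels):
    """Return the sense occurring most often among the gold annotations (simplifying assumption)."""
    return Counter(gold_labels).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of documents whose predicted sense matches the gold sense."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def evaluate_baselines(senses, gold, seed=0):
    """Score the three baselines for one ambiguous query word."""
    rng = random.Random(seed)  # fixed seed so the Random baseline is reproducible
    n = len(gold)
    predictions = {
        "Random": [random_baseline(senses, rng) for _ in range(n)],
        "First Sense": [first_sense_baseline(senses)] * n,
        "Most Frequent Sense": [most_frequent_sense(gold)] * n,
    }
    return {name: accuracy(pred, gold) for name, pred in predictions.items()}


if __name__ == "__main__":
    # Toy, made-up annotations for five documents retrieved for "bank".
    inventory = ["bank#1", "bank#2", "bank#3"]
    gold = ["bank#2", "bank#2", "bank#1", "bank#2", "bank#3"]
    print(evaluate_baselines(inventory, gold))
```

On this toy data, "First Sense" always predicts `bank#1` and scores 0.2. "Most Frequent Sense" always predicts `bank#2` and scores 0.6. Because it is read off the same labels it is scored against, it acts as an optimistic reference point rather than a fair system.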
Powered by ELDA © 2010 ELDA/ELRA