Summary of the paper

Title An Annotated Dataset for Extracting Definitions and Hypernyms from the Web
Authors Roberto Navigli, Paola Velardi and Juana María Ruiz-Martínez
Abstract This paper presents and analyzes an annotated corpus of definitions, created to train an algorithm for the automatic extraction of definitions and hypernyms from web documents. As an additional resource, we also include a corpus of non-definitions with syntactic patterns similar to those of definition sentences, e.g.: "An android is a robot" vs. "Snowcap is unmistakable". Domain and style independence is obtained thanks to the annotation of a large and domain-balanced corpus and to a novel pattern generalization algorithm based on word-class lattices (WCL). A lattice is a directed acyclic graph (DAG), a subclass of nondeterministic finite state automata (NFA). The lattice structure has the purpose of preserving the salient differences among distinct sequences, while eliminating redundant information. The WCL algorithm will be integrated into an improved version of the GlossExtractor Web application (Velardi et al., 2008). This paper is mostly concerned with a description of the corpus, the annotation strategy, and a linguistic analysis of the data. A summary of the WCL algorithm is also provided for the sake of completeness.
Language Information Extraction, Information Retrieval
Topics Corpus (creation, annotation, etc.), Semantics, Information Extraction, Information Retrieval
Full paper An Annotated Dataset for Extracting Definitions and Hypernyms from the Web
Bibtex @InProceedings{NAVIGLI10.20,
  author = {Roberto Navigli and Paola Velardi and Juana María Ruiz-Martínez},
  title = {An Annotated Dataset for Extracting Definitions and Hypernyms from the Web},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }