Summary of the paper

Title Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure
Authors Lukas Michelbacher, Florian Laws, Beate Dorow, Ulrich Heid and Hinrich Schütze
Abstract The Internet is an ever growing source of information stored in documents of different languages. Hence, cross-lingual resources are needed for more and more NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word in one language, the thesaurus suggests words in a second language that are semantically related. The method requires two monolingual corpora and a basic dictionary. Our general approach is to build two monolingual word graphs, with nodes representing words and edges representing linguistic relations between words. A bilingual dictionary containing basic vocabulary provides seed translations relating nodes from both graphs. We then use an inter-graph node-similarity algorithm to discover related words. Evaluation with three human judges revealed that 49% of the English and 57% of the German words discovered by our method are semantically related to the target words. We publish two resources in conjunction with this paper. First, noun coordinations extracted from the German and English Wikipedias. Second, the cross-lingual relatedness thesaurus which can be used in experiments involving interactive cross-lingual query expansion.
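The general approach described in the abstract (two monolingual word graphs, seed translations from a dictionary, then an inter-graph node-similarity computation) can be illustrated with a small sketch. The abstract does not name the similarity algorithm, so a SimRank-style iteration is assumed here purely for illustration; it is not necessarily the authors' measure. The function names, the `decay` and `iterations` parameters, and the toy graphs are all hypothetical.

```python
from itertools import product

def cross_graph_similarity(graph_a, graph_b, seeds, decay=0.8, iterations=5):
    """SimRank-style similarity between nodes of two graphs (illustrative assumption).

    graph_a, graph_b: dict mapping each node to the set of its neighbours.
    seeds: set of (node_a, node_b) pairs taken from a bilingual dictionary.
    """
    nodes_a, nodes_b = list(graph_a), list(graph_b)
    # Seed translation pairs start (and stay) at similarity 1; all others at 0.
    sim = {(a, b): 1.0 if (a, b) in seeds else 0.0
           for a, b in product(nodes_a, nodes_b)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes_a, nodes_b):
            if (a, b) in seeds:
                new_sim[(a, b)] = 1.0
                continue
            neigh_a, neigh_b = graph_a[a], graph_b[b]
            if not neigh_a or not neigh_b:
                new_sim[(a, b)] = 0.0
                continue
            # Two words are similar if their neighbours are similar.
            total = sum(sim[(x, y)] for x in neigh_a for y in neigh_b)
            new_sim[(a, b)] = decay * total / (len(neigh_a) * len(neigh_b))
        sim = new_sim
    return sim

def related_words(sim, word, nodes_b, top_k=3):
    """Rank words of the second language by similarity to `word`."""
    ranked = sorted(nodes_b, key=lambda b: sim[(word, b)], reverse=True)
    return [(b, sim[(word, b)]) for b in ranked[:top_k]]

# Toy data (made up): edges stand for some linguistic relation between words.
english = {"car": {"truck", "road"}, "truck": {"car"}, "road": {"car"}}
german = {"Auto": {"Lastwagen", "Straße"}, "Lastwagen": {"Auto"}, "Straße": {"Auto"}}
seeds = {("truck", "Lastwagen"), ("road", "Straße")}

sim = cross_graph_similarity(english, german, seeds)
suggestions = related_words(sim, "car", list(german))
```

In this toy setup "car" has no dictionary entry, yet it ends up closest to "Auto" because its neighbours are linked to the neighbours of "Auto" through the seed translations.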
Language Corpus (creation, annotation, etc.)
Topics Information Extraction, Information Retrieval, Lexicon, lexical database, Corpus (creation, annotation, etc.)
Full paper Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure
Bibtex @InProceedings{MICHELBACHER10.499,
  author = {Lukas Michelbacher and Florian Laws and Beate Dorow and Ulrich Heid and Hinrich Schütze},
  title = {Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }