Summary of the paper

Title Czech Information Retrieval with Syntax-based Language Models
Authors Jana Straková and Pavel Pecina
Abstract In recent years, considerable attention has been dedicated to language modeling methods in information retrieval. Although these approaches generally allow exploitation of any type of language model, most of the published experiments were conducted with a classical n-gram model, usually limited only to unigrams. A few works exploiting syntax in information retrieval can be cited in this context, but significant contribution of syntax based language modeling for information retrieval is yet to be proved. In this paper, we propose, implement, and evaluate an enrichment of language model employing syntactic dependency information acquired automatically from both documents and queries. Our experiments are conducted on Czech which is a morphologically rich language and has a considerably free word order, therefore a syntactic language model is expected to contribute positively to the unigram and bigram language model based on surface word order. By testing our model on the Czech test collection from Cross Language Evaluation Forum 2007 Ad-Hoc track, we show positive contribution of using dependency syntax in this context.
Language Grammar and Syntax
Topics Information Extraction, Information Retrieval, Language modelling, Grammar and Syntax
Full paper Czech Information Retrieval with Syntax-based Language Models
Bibtex @InProceedings{STRAKOV10.215,
  author = {Jana Straková and Pavel Pecina},
  title = {Czech Information Retrieval with Syntax-based Language Models},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }