Summary of the paper

Title: Indexing Methods for Faster and More Effective Person Name Search
Authors: Mark Arehart
Abstract: This paper compares several indexing methods for person names extracted from text, developed for an information retrieval system with requirements for fast approximate matching of noisy and multicultural Romanized names. Such matching algorithms are computationally expensive and unacceptably slow when used without an indexing or blocking step. The goal is to create a small candidate pool containing all the true matches that can be exhaustively searched by a more effective but slower name comparison method. In addition to dramatically faster search, some of the methods evaluated here led to modest gains in effectiveness by eliminating false positives. Four indexing techniques using either phonetic keys or substrings of name segments, with and without name segment stopword lists, were combined with three name matching algorithms. On a test set of 700 queries run against 70K noisy and multicultural names, the best-performing technique took just 2.1% as long as a naive exhaustive search and increased F1 by 3 points, showing that an appropriate indexing technique can increase both speed and effectiveness.
Language: Tools, systems, applications
Topics: Information Extraction, Information Retrieval, Person Identification, Tools, systems, applications
Full paper: Indexing Methods for Faster and More Effective Person Name Search
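The abstract describes a two-stage pipeline: an index keyed on phonetic keys or on substrings of name segments (optionally skipping stopword segments) first narrows the 70K names to a small candidate pool, and only that pool is scored by a slower, more accurate comparator. The abstract does not name the specific phonetic algorithm, substring scheme, stopword list, or matching algorithms, so the sketch below is illustrative only. It assumes American Soundex as the phonetic key, character trigrams as the substring key, a small hypothetical stopword set (`SEGMENT_STOPWORDS`), and Python's `difflib.SequenceMatcher` as a stand-in for the slower name comparison method. `NameIndex` and its methods are hypothetical names, not the paper's code.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical name-segment stopword list (an assumption, not the paper's list):
# very common segments would otherwise produce huge candidate blocks.
SEGMENT_STOPWORDS = {"al", "el", "bin", "abd", "de", "van"}

_SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}

def soundex(segment: str) -> str:
    """American Soundex code of one name segment (stand-in phonetic key)."""
    letters = [c for c in segment.lower() if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0].upper(), _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # vowels reset the previous code; h and w do not
            prev = digit
    return (code + "000")[:4]

def segments(name: str) -> list[str]:
    """Split a name into lowercase segments, dropping stopword segments."""
    parts = name.lower().replace("-", " ").split()
    return [p for p in parts if p not in SEGMENT_STOPWORDS]

def phonetic_keys(name: str) -> set[str]:
    return {soundex(s) for s in segments(name)} - {""}

def substring_keys(name: str, n: int = 3) -> set[str]:
    """Character n-grams of each segment; segments shorter than n are kept whole."""
    return {s[i:i + n] for s in segments(name) for i in range(max(1, len(s) - n + 1))}

class NameIndex:
    """Inverted index from blocking keys to name ids (hypothetical helper)."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.postings = defaultdict(set)
        self.names: list[str] = []

    def add(self, name: str) -> None:
        name_id = len(self.names)
        self.names.append(name)
        for key in self.key_fn(name):
            self.postings[key].add(name_id)

    def candidates(self, query: str) -> set[int]:
        """Union of postings for the query's keys: the small candidate pool."""
        pool: set[int] = set()
        for key in self.key_fn(query):
            pool |= self.postings.get(key, set())
        return pool

    def search(self, query: str, threshold: float = 0.8) -> list[tuple[str, float]]:
        """Exhaustively score only the candidate pool with the slower comparator."""
        results = []
        for name_id in self.candidates(query):
            name = self.names[name_id]
            score = SequenceMatcher(None, query.lower(), name.lower()).ratio()
            if score >= threshold:
                results.append((name, score))
        return sorted(results, key=lambda r: -r[1])

if __name__ == "__main__":
    idx = NameIndex(phonetic_keys)
    for n in ["Mohammed al-Rashid", "Muhammad Rasheed", "Robert Smith", "Rupert Smyth", "Li Wei"]:
        idx.add(n)
    print(sorted(idx.names[i] for i in idx.candidates("Mohamed Rashid")))
    # → ['Mohammed al-Rashid', 'Muhammad Rasheed']
```

Only names sharing at least one key with the query reach `SequenceMatcher`, which is where the speedup comes from; swapping `phonetic_keys` for `substring_keys` changes the blocking strategy without touching the comparator. A candidate pool that drops a true match can never be recovered by the second stage, which is why the key choice matters for effectiveness as well as speed.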
Bibtex:
@InProceedings{AREHART10.166,
  author = {Mark Arehart},
  title = {Indexing Methods for Faster and More Effective Person Name Search},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }