Summary of the paper

Title Learning Language Identification Models: A Comparative Analysis of the Distinctive Features of Names and Common Words
Authors Stasinos Konstantopoulos
Abstract The intuition and basic hypothesis that this paper explores is that names are more characteristic of their language than common words are, and that a single name can have enough clues to confidently identify its language where random text of the same length wouldn't. To test this hypothesis, n-gram modelling is used to learn language models which identify the language of isolated names and equally short document fragments. As the empirical results corroborate the prior intuition, an explanation is sought for the higher accuracy at which the language of names can be identified. The results of the application of these models, as well as the models themselves, are quantitatively and qualitatively analysed and a hypothesis is formed about the explanation of this difference. The conclusions derived are both technologically useful in information extraction or text-to-speech tasks, and theoretically interesting as a tool for improving our understanding of the morphology and phonology of the languages involved in the experiments.
Language Information Extraction, Information Retrieval
Topics Language Identification, Named Entity recognition, Information Extraction, Information Retrieval
Full paper Learning Language Identification Models: A Comparative Analysis of the Distinctive Features of Names and Common Words
Bibtex @InProceedings{KONSTANTOPOULOS10.452,
  author = {Stasinos Konstantopoulos},
  title = {Learning Language Identification Models: A Comparative Analysis of the Distinctive Features of Names and Common Words},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
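
Illustration The abstract says that n-gram models are learned to identify the language of isolated names. The sketch below shows one way such a classifier could look; it is not the author's implementation. The trigram order, the add-one smoothing, the boundary markers, and the tiny name lists are all assumptions made for illustration, since the abstract specifies none of them.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of text, padded with boundary markers."""
    padded = "^" * (n - 1) + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageModel:
    """Character n-gram counts for one language (hypothetical helper class)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = Counter()
        self.total = 0

    def train(self, words):
        for word in words:
            grams = char_ngrams(word, self.n)
            self.counts.update(grams)
            self.total += len(grams)

    def log_prob(self, text, vocab_size):
        # Add-one smoothing (an assumption) so unseen n-grams get a small
        # non-zero probability instead of zeroing out the whole score.
        return sum(
            math.log((self.counts[g] + 1) / (self.total + vocab_size))
            for g in char_ngrams(text, self.n)
        )


def identify(text, models):
    """Return the label of the model that scores text highest."""
    vocab = set()
    for model in models.values():
        vocab.update(model.counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    return max(models, key=lambda lang: models[lang].log_prob(text, vocab_size))


# Toy training data, invented for this sketch; real experiments would need
# far larger name lists per language.
training_names = {
    "en": ["smith", "johnson", "williams", "brown"],
    "el": ["papadopoulos", "georgiou", "nikolaidis"],
}

models = {}
for lang, names in training_names.items():
    models[lang] = NgramLanguageModel(n=3)
    models[lang].train(names)

print(identify("williamson", models))
print(identify("nikolaou", models))
```

Each candidate name is scored by every language model and assigned to the highest-scoring one, so a single short string can be classified as long as it contains n-grams that are distinctive for one of the languages.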