Summary of the paper

Title Learning Morphology of Romance, Germanic and Slavic Languages with the Tool Linguistica
Authors Helena Blancafort
Abstract In this paper we present preliminary work on the semi-automatic induction of inflectional paradigms from non-annotated corpora using the open-source tool Linguistica (Goldsmith 2001), which can be used without any prior knowledge of the language. The aim is to induce morphological information from corpora so as to compare languages and to foresee the difficulty of developing morphosyntactic lexica. We report on a series of corpus-based experiments run with Linguistica on Romance languages (Catalan, French, Italian, Portuguese, and Spanish), Germanic languages (Dutch, English, and German), and the Slavic language Polish. For each language we obtained interesting clusters of stems sharing the same suffixes. They can be seen as mini inflectional paradigms that also include productive derivational suffixes. We ranked the results per language by paradigm size (maximum number of suffixes per stem). The results show that the tool is useful for getting a first idea of the role and complexity of inflection and derivation in a language and for comparing results across languages, and that it could be useful for building lexicographic resources from scratch. Still, special post-processing is needed to address the tool's two principal drawbacks: it draws no clear distinction between inflection and derivation, and it does not take allomorphy into account.
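The idea of clustering stems that share the same set of suffixes, then ranking those clusters by paradigm size, can be illustrated with a toy sketch. This is only an assumption-laden illustration, not Linguistica's actual procedure (Goldsmith 2001 selects analyses with a Minimum Description Length criterion): here every cut of a word that leaves a stem of at least `min_stem` characters and a suffix of at most `max_suffix` characters is accepted as a candidate, the empty suffix is written `NULL`, and the function name `induce_signatures` and its parameters are hypothetical.

```python
from collections import defaultdict


def induce_signatures(words, min_stem=3, max_suffix=5, min_stems=2):
    """Toy sketch: cluster stems that take the same set of suffixes.

    Assumptions (illustrative only, not Linguistica's MDL procedure):
    - every cut leaving a stem of >= min_stem characters and a suffix of
      <= max_suffix characters is a candidate stem/suffix analysis;
    - the empty suffix is written "NULL";
    - a cluster is kept only if it has >= 2 suffixes and >= min_stems stems.
    """
    # Collect, for every candidate stem, the suffixes it is seen with.
    stem_to_suffixes = defaultdict(set)
    for word in set(words):
        for cut in range(min_stem, len(word) + 1):
            stem, suffix = word[:cut], word[cut:]
            if len(suffix) <= max_suffix:
                stem_to_suffixes[stem].add(suffix or "NULL")

    # Invert: group stems under their suffix set (a "mini paradigm").
    suffixes_to_stems = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        if len(suffixes) >= 2:
            suffixes_to_stems[frozenset(suffixes)].add(stem)

    kept = [(sig, stems) for sig, stems in suffixes_to_stems.items()
            if len(stems) >= min_stems]
    # Rank by paradigm size (number of suffixes), then by number of stems,
    # then alphabetically so that ties come out in a stable order.
    kept.sort(key=lambda item: (-len(item[0]), -len(item[1]), sorted(item[0])))
    return [(sorted(sig), sorted(stems)) for sig, stems in kept]


if __name__ == "__main__":
    corpus = "habla hablas hablamos canta cantas cantamos".split()
    for suffixes, stems in induce_signatures(corpus):
        print(len(suffixes), suffixes, stems)
    # 3 ['NULL', 'mos', 's'] ['canta', 'habla']
    # 3 ['a', 'amos', 'as'] ['cant', 'habl']
```

Even on six Spanish verb forms the sketch returns two competing cuts (habl-/a, as, amos versus habla-/NULL, s, mos); choosing between such analyses is what an MDL criterion is meant to do. Like the tool described above, this sketch draws no distinction between inflectional and derivational suffixes and does not merge allomorphic stems.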
Language Lexicon, lexical database
Topics Multilinguality, Morphology, Lexicon, lexical database
Full paper Learning Morphology of Romance, Germanic and Slavic Languages with the Tool Linguistica
Bibtex @InProceedings{BLANCAFORT10.872,
  author = {Helena Blancafort},
  title = {Learning Morphology of Romance, Germanic and Slavic Languages with the Tool Linguistica},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA