Title |
Feasibility of Automatically Bootstrapping a Persian WordNet |
Authors |
Chris Irwin Davis and Dan Moldovan |
Abstract |
In this paper we describe a proof-of-concept for the bootstrapping of a Persian WordNet. This effort was motivated by previous work done at Stanford University on bootstrapping an Arabic WordNet using a parallel corpus and an English WordNet. The principle of that work is based on the premise that paradigmatic relations are by nature deeply semantic, and as such, are likely to remain intact between languages. We performed our task on a Persian-English bilingual corpus of George Orwell's Nineteen Eighty-Four. The corpus was neither aligned nor sense tagged, so it was necessary that these were undertaken first. A combination of manual and semiautomated methods was used to tag and sentence-align the corpus. Actual mapping of English word senses onto Persian was done using automated techniques. Although Persian is written in Arabic script, it is an Indo-European language, while Arabic is a Central Semitic language. Despite their linguistic differences, we endeavor to test the applicability of the Stanford strategy to our task. |
Language |
Word Sense Disambiguation |
Topics |
Ontologies, Semantics, Word Sense Disambiguation |
Full paper  |
Feasibility of Automatically Bootstrapping a Persian WordNet |
Bibtex |
author = {Chris Irwin Davis and Dan Moldovan}, title = {Feasibility of Automatically Bootstrapping a Persian WordNet}, booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Piperidis, Mike Rosner, Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} } |