Title |
Bootstrapping Language Neutral Term Extraction |
Authors |
Wauter Bosma and Piek Vossen |
Abstract |
A variety of methods exist for extracting terms and relations between terms from a corpus, each of them having strengths and weaknesses. Rather than just using the joint results, we apply different extraction methods in a way that the results of one method are input to another. This gives us the leverage to find terms and relations that otherwise would not be found. Our goal is to create a semantic model of a domain. To that end, we aim to find the complete terminology of the domain, consisting of terms and relations such as hyponymy and meronymy, and connected to generic wordnets and ontologies. Terms are ranked by domain-relevance only as a final step, after terminology extraction is completed. Because term relations are a large part of the semantics of a term, we estimate the relevance from its relation to other terms, in addition to occurrence and document frequencies. In the KYOTO project, we apply language-neutral terminology extraction from a parsed corpus for seven languages. |
Language |
Multilinguality |
Topics |
Lexicon, lexical database, MultiWord Expressions & Collocations, Multilinguality |
Full paper |
Bootstrapping Language Neutral Term Extraction |
Bibtex |
@InProceedings{BOSMA10.902,
  author = {Wauter Bosma and Piek Vossen},
  title = {Bootstrapping Language Neutral Term Extraction},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |