Summary of the paper

Title Automatic Acquisition of Chinese Novel Noun Compounds
Authors Meng Wang, Chu-Ren Huang, Shiwen Yu and Weiwei Sun
Abstract Automatic acquisition of novel compounds is notoriously difficult because most novel compounds have relatively low frequency in a corpus. The current study proposes a new method to deal with the novel compound acquisition challenge. We model this task as a two-class classification problem in which a candidate compound is either classified as a compound or a non-compound. A machine learning method using SVM, incorporating two types of linguistically motivated features: semantic features and character features, is applied to identify rare but valid noun compounds. We explore two kinds of training data: one is virtual training data which is obtained by three statistical scores, i.e. co-occurrence frequency, mutual information and dependent ratio, from the frequent compounds; the other is real training data which is randomly selected from the infrequent compounds. We conduct comparative experiments, and the experimental results show that even with limited direct evidence in the corpus for the novel compounds, we can make full use of the typical frequent compounds to help in the discovery of the novel compounds.
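Two of the statistical scores named in the abstract, co-occurrence frequency and mutual information, can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so this assumes the common pointwise mutual information definition, log2(p(x,y) / (p(x) p(y))), computed over adjacent word pairs. The function name `score_candidates` and the toy corpus are hypothetical, and the third score (dependent ratio) is left out because it is not defined here.

```python
import math
from collections import Counter

def score_candidates(sentences):
    """Return {(w1, w2): (cooccurrence_count, pmi)} for adjacent word pairs.

    Assumption: PMI = log2(p(w1, w2) / (p(w1) * p(w2))), with probabilities
    estimated by relative frequency. This is not necessarily the paper's formula.
    """
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # adjacent pairs only
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = (count, math.log2(p_xy / (p_x * p_y)))
    return scores

# Hypothetical toy corpus of pre-segmented noun sequences (illustration only).
corpus = [
    ["language", "resource"],
    ["language", "resource", "evaluation"],
    ["language", "model"],
    ["resource", "evaluation"],
]
scores = score_candidates(corpus)
```

In this toy data the pair seen only once receives a higher mutual information score than the pairs seen twice, which shows how unstable such estimates are at low counts and is consistent with the abstract's point that novel compounds offer little direct evidence in a corpus.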
Language Lexicon, lexical database
Topics MultiWord Expressions & Collocations, Acquisition, Lexicon, lexical database
Full paper Automatic Acquisition of Chinese Novel Noun Compounds
Bibtex @InProceedings{WANG10.377,
  author = {Meng Wang and Chu-Ren Huang and Shiwen Yu and Weiwei Sun},
  title = {Automatic Acquisition of Chinese Novel Noun Compounds},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }