Summary of the paper

Title Predicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches
Authors Ting-Hao Huang, Lun-Wei Ku and Hsin-Hsi Chen
Abstract This paper presents an overview of the morphological types of Chinese bi-character words and proposes a set of features that machine learning approaches can use to predict these types from information about the component characters. First, eight morphological types were defined, and 6,500 Chinese bi-character words were annotated with these types. After pre-processing, 6,178 words were selected to construct a corpus named Reduced Set. We analyzed Reduced Set and conducted an inter-annotator agreement test; the average kappa value of 0.67 indicates substantial agreement. Second, since this paper considers the morphological type of a bi-character word to be strongly related to the parts of speech of its component characters, we proposed a set of features, easily extracted from dictionaries, that indicate each character's “tendency” toward particular parts of speech. Finally, we used these features with three machine learning algorithms (SVM, CRF, and Naïve Bayes) to predict the morphological types. On average, the best algorithm, CRF, achieved 75% of the annotators’ performance.
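The abstract reports an average kappa of 0.67 for inter-annotator agreement. The sketch below shows how such a figure can be computed. It assumes the average is Cohen's kappa taken over every pair of annotators; the paper may have used a different kappa variant, and the labels "T1" and "T2" are hypothetical placeholders, not the paper's eight type names. On the commonly used interpretation scale, values from 0.61 to 0.80 count as substantial agreement.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labelled the same words in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    # Observed agreement: fraction of words given the same morphological type.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def average_pairwise_kappa(annotations):
    """Mean Cohen's kappa over all annotator pairs (an assumed averaging scheme)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy labels for four words from three annotators.
ann_1 = ["T1", "T1", "T2", "T2"]
ann_2 = ["T1", "T2", "T2", "T2"]
ann_3 = ["T1", "T1", "T2", "T2"]
print(cohen_kappa(ann_1, ann_2))                      # 0.5
print(average_pairwise_kappa([ann_1, ann_2, ann_3]))  # about 0.667
```

For the first pair, observed agreement is 3/4 and chance agreement is (2·1 + 2·3)/16 = 0.5, giving (0.75 − 0.5)/(1 − 0.5) = 0.5. Averaging pairwise values keeps each score interpretable on its own; a multi-rater statistic such as Fleiss' kappa would be an alternative design.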
Language Lexicon, lexical database
Topics Morphology, Corpus (creation, annotation, etc.), Lexicon, lexical database
Full paper Predicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches
Bibtex @InProceedings{HUANG10.397,
  author = {Ting-Hao Huang and Lun-Wei Ku and Hsin-Hsi Chen},
  title = {Predicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA