Summary of the paper

Title Online Japanese Unknown Morpheme Detection using Orthographic Variation
Authors Yugo Murawaki and Sadao Kurohashi
Abstract To solve the unknown morpheme problem in Japanese morphological analysis, we previously proposed a novel framework of online unknown morpheme acquisition and its implementation. This framework poses a previously unexplored problem, online unknown morpheme detection. Online unknown morpheme detection is a task of finding morphemes in each sentence that are not listed in a given lexicon. Unlike in English, it is a non-trivial task because Japanese does not delimit words by white space. We first present a baseline method that simply uses the output of the morphological analyzer. We then show that it fails to detect some unknown morphemes because they are over-segmented into shorter registered morphemes. To cope with this problem, we present a simple solution, the use of orthographic variation of Japanese. Under the assumption that orthographic variants behave similarly, each over-segmentation candidate is checked against its counterparts. Experiments show that the proposed method improves the recall of detection and contributes to improving unknown morpheme acquisition.
Language Tools, systems, applications
Topics Acquisition, Lexicon, lexical database, Tools, systems, applications
Full paper Online Japanese Unknown Morpheme Detection using Orthographic Variation
Bibtex @InProceedings{MURAWAKI10.171,
  author = {Yugo Murawaki and Sadao Kurohashi},
  title = {Online Japanese Unknown Morpheme Detection using Orthographic Variation},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }