Title
Online Japanese Unknown Morpheme Detection using Orthographic Variation
Authors
Yugo Murawaki and Sadao Kurohashi
Abstract
To solve the unknown morpheme problem in Japanese morphological analysis, we previously proposed a novel framework of online unknown morpheme acquisition and its implementation. This framework poses a previously unexplored problem, online unknown morpheme detection. Online unknown morpheme detection is the task of finding morphemes in each sentence that are not listed in a given lexicon. Unlike in English, this is a non-trivial task because Japanese does not delimit words by white space. We first present a baseline method that simply uses the output of the morphological analyzer. We then show that it fails to detect some unknown morphemes because they are over-segmented into shorter registered morphemes. To cope with this problem, we present a simple solution: the use of orthographic variation in Japanese. Under the assumption that orthographic variants behave similarly, each over-segmentation candidate is checked against its counterparts. Experiments show that the proposed method improves the recall of detection and contributes to improving unknown morpheme acquisition.
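The sketch below is a toy illustration of the general idea in the abstract, not the authors' implementation. Several pieces are assumptions: a greedy longest-match segmenter over a hypothetical mini-lexicon (`LEXICON`) stands in for the morphological analyzer, hiragana-to-katakana conversion stands in for orthographic variation, over-segmentation candidates are simply passed in (how the paper generates them is not described here), and the rule "flag a candidate whose katakana counterpart is segmented into a different number of tokens" is a hypothetical reading of "checked against its counterparts". The helper names `to_katakana`, `segment`, `baseline_unknowns`, and `is_oversegmented` are likewise hypothetical.

```python
# Toy illustration only: a hypothetical mini-lexicon and a greedy
# longest-match segmenter stand in for a real morphological analyzer.
LEXICON = {"ねこ", "ネコ", "は", "も", "ふ", "だ"}


def to_katakana(text):
    """Map hiragana (U+3041..U+3096) to katakana (offset 0x60); keep other chars."""
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text
    )


def segment(text, lexicon):
    """Greedy longest-match segmentation into (surface, is_known) tokens.

    Consecutive characters that start no lexicon entry are merged into a
    single unknown token.
    """
    max_len = max(map(len, lexicon))
    tokens = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append((text[i:j], True))
                i = j
                break
        else:
            # No lexicon entry starts at position i: extend or open an unknown token.
            if tokens and not tokens[-1][1]:
                tokens[-1] = (tokens[-1][0] + text[i], False)
            else:
                tokens.append((text[i], False))
            i += 1
    return tokens


def baseline_unknowns(sentence, lexicon):
    """Baseline: report only the tokens the segmenter itself marks as unknown."""
    return [s for s, known in segment(sentence, lexicon) if not known]


def is_oversegmented(candidate, lexicon):
    """Hypothetical check: the candidate splits into several registered
    morphemes, but its katakana counterpart is segmented differently."""
    original = segment(candidate, lexicon)
    variant = segment(to_katakana(candidate), lexicon)
    split_into_known = len(original) > 1 and all(k for _, k in original)
    return split_into_known and len(variant) != len(original)


if __name__ == "__main__":
    # The baseline misses "もふもふ": every piece is a registered morpheme.
    print(baseline_unknowns("ねこはもふもふだ", LEXICON))  # []
    print(is_oversegmented("もふもふ", LEXICON))  # True ("モフモフ" stays one unknown token)
    print(is_oversegmented("ねこは", LEXICON))  # False (both variants yield two tokens)
```

In this toy setting the baseline reports nothing for "ねこはもふもふだ" because the unknown word is fully covered by short registered entries, while comparing it against its katakana counterpart exposes the mismatch. Comparing token counts is only one simple way to operationalize "behave similarly"; a real system would compare richer analyzer output.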
Language
Tools, systems, applications
Topics
Acquisition, Lexicon, lexical database, Tools, systems, applications
Bibtex
@InProceedings{MURAWAKI10.171,
  author    = {Yugo Murawaki and Sadao Kurohashi},
  title     = {Online {Japanese} Unknown Morpheme Detection using Orthographic Variation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = may,
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}