In this paper we propose a framework for improving word segmentation accuracy using input method logs. An input method is software used to type sentences in languages that have far more characters than there are keys on a keyboard. The main contributions of this paper are: 1) an input method server that proposes word candidates not included in the vocabulary, 2) a publicly available input method that logs user behavior (such as keystrokes and the selection of word candidates), and 3) a method for improving word segmentation by using these logs. We conducted word segmentation experiments on tweets from Twitter and showed that our method improves accuracy in this domain. The method itself is domain-independent and requires only logs from the target domain.
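The abstract does not spell out how logs become segmentation training data. The following is a minimal sketch under stated assumptions, not the paper's method. It assumes that a log entry records the sequence of word candidates a user selected while composing one sentence. It then treats the gap between two consecutive selections as a known word boundary and leaves the gaps inside a single selection unannotated, which yields a partial annotation. The function name `log_to_partial_annotation` and the log format are hypothetical.

```python
from typing import List, Optional


def log_to_partial_annotation(segments: List[str]) -> List[Optional[int]]:
    """Turn a hypothetical log of selected candidates into per-gap labels.

    One label is produced for each gap between adjacent characters of the
    committed sentence: 1 marks a word boundary, None marks "unknown".
    Assumption (not from the paper): only gaps between consecutive
    selections are trusted as boundaries.
    """
    segments = [s for s in segments if s]  # ignore empty selections
    labels: List[Optional[int]] = []
    for i, seg in enumerate(segments):
        # Gaps inside one selected candidate are left unannotated.
        labels.extend([None] * (len(seg) - 1))
        if i < len(segments) - 1:
            # The gap between this selection and the next is a boundary.
            labels.append(1)
    return labels


if __name__ == "__main__":
    print(log_to_partial_annotation(["今日", "は", "晴れ"]))  # [None, 1, 1, None]
```

Leaving inner gaps as `None` rather than forcing them to "no boundary" reflects the assumption that a single conversion unit may span more than one word, so a model trained on such data would need to handle partially labeled sequences.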
Takahashi, F., & Mori, S. (2015). Keyboard logs as natural annotations for word segmentation. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1186–1196). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1140