We propose a new dataset for evaluating Japanese lexical simplification methods. Previous datasets have several deficiencies. All of them substitute only a single target word, and some of them extract sentences only from a newswire corpus. In addition, most of these datasets do not allow ties, and they integrate the simplification rankings of all annotators without considering annotation quality. In contrast, our dataset has the following advantages: (1) it is the first controlled and balanced dataset for Japanese lexical simplification with high correlation with human judgment, and (2) the consistency of the simplification ranking is improved by allowing candidates to have ties and by considering the reliability of annotators.
Kodaira, T., Kajiwara, T., & Komachi, M. (2016). Controlled and balanced dataset for Japanese lexical simplification. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Student Research Workshop (pp. 1–7). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-3001