Controlled and balanced dataset for Japanese lexical simplification

Tomonori Kodaira; Tomoyuki Kajiwara; Mamoru Komachi

Conference Proceedings, Open Access

54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Student Research Workshop (2016) 1-7

DOI: 10.18653/v1/p16-3001

17 Citations

7 Readers

Abstract

We propose a new dataset for evaluating Japanese lexical simplification methods. Previous datasets have several deficiencies: all of them substitute only a single target word, and some extract sentences only from a newswire corpus. In addition, most of them do not allow ties and integrate the simplification rankings from all annotators without considering annotation quality. In contrast, our dataset has the following advantages: (1) it is the first controlled and balanced dataset for Japanese lexical simplification with high correlation with human judgment, and (2) the consistency of the simplification ranking is improved by allowing candidates to have ties and by considering the reliability of annotators.

Cite

Kodaira, T., Kajiwara, T., & Komachi, M. (2016). Controlled and balanced dataset for Japanese lexical simplification. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Student Research Workshop (pp. 1–7). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-3001
