An Innovative BERT-Based Readability Model

Hou Chiang Tseng; Hsueh Chih Chen; Kuo En Chang; Yao Ting Sung; Berlin Chen

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11937 LNCS 301-308

DOI: 10.1007/978-3-030-35343-8_32

Abstract

Readability refers to how easily a given text (article) can be understood by its readers. Readers of a highly readable text achieve better comprehension and learning retention. However, developing effective prediction models that can automatically and accurately assess the readability of a given text has been a long-standing challenge. For Chinese, word segmentation ambiguity is a thorny problem that inevitably arises when texts are pre-processed. In view of this, we present a novel readability prediction approach for the Chinese language that builds on the recently proposed Bidirectional Encoder Representations from Transformers (BERT) model, which can capture both syntactic and semantic information of a text directly from its character-level representation. Because the BERT-based readability prediction model takes consecutive character-level representations as its input, it can assess the readability of a given text without performing error-prone word segmentation. We empirically evaluate the BERT-based model on a benchmark task, comparing it with a strong baseline that pairs a well-known classification model, fastText, with word-level representations. The results demonstrate that the BERT-based model with character-level representations performs on par with the fastText-based model with word-level representations, yielding an average accuracy of 78.45%. This finding suggests that readability assessment of Chinese text can be conducted directly on character-level representations.
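
The segmentation ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use BERT or fastText. The toy dictionary `TOY_VOCAB`, the sample phrase, and the helper names `forward_max_match`, `backward_max_match`, and `to_characters` are all assumptions introduced for illustration. Two common dictionary-based heuristics segment the same phrase differently, whereas a character-level split has only one possible outcome.

```python
# Hypothetical toy dictionary; a real word segmenter uses a far larger lexicon.
TOY_VOCAB = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, vocab, max_len=3):
    """Greedy left-to-right segmentation: longest dictionary word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len=3):
    """Greedy right-to-left segmentation: longest dictionary word first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def to_characters(text):
    """Character-level split: no dictionary needed, so no ambiguity."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    phrase = "研究生命起源"  # "study the origin of life"
    print(forward_max_match(phrase, TOY_VOCAB))   # ['研究生', '命', '起源']
    print(backward_max_match(phrase, TOY_VOCAB))  # ['研究', '生命', '起源']
    print(to_characters(phrase))                  # ['研', '究', '生', '命', '起', '源']
```

The two heuristics disagree on where the word boundaries fall, which is the kind of pre-processing error a word-level model inherits. The character sequence is the same regardless of any dictionary, which is the property the character-level input relies on.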

Cite

Tseng, H. C., Chen, H. C., Chang, K. E., Sung, Y. T., & Chen, B. (2019). An Innovative BERT-Based Readability Model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11937 LNCS, pp. 301–308). Springer. https://doi.org/10.1007/978-3-030-35343-8_32
