This paper considers various measures that converge to constant values as the length of a natural language text grows. Such measures offer hints for studying the complexity of natural language. Previously, they have been studied mainly on relatively small English texts. In this work, we examine these measures on texts in languages other than English and on large-scale texts. Among them, we consider Yule's K, Orlov's Z, and Golcher's VM, whose convergence has previously been argued empirically, and, in addition, the entropy H and r, a measure related to scale-free networks. Our experiments show that both K and VM converge for texts in various languages, whereas the other measures do not.
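As an illustration of the kind of measures studied, the sketch below computes Yule's K, using its standard definition K = 10^4 (Σ_m m^2 V(m, N) − N) / N^2 where V(m, N) is the number of word types occurring m times in a text of N tokens, together with a simple plug-in unigram entropy. This is a minimal sketch for orientation only; the paper's own entropy measure H and the other measures (Orlov's Z, Golcher's VM, r) are defined in the paper and are not reproduced here.

```python
from collections import Counter
import math

def yules_k(tokens):
    """Yule's K: 10^4 * (sum_m m^2 * V(m, N) - N) / N^2,
    where V(m, N) is the number of word types occurring m times."""
    N = len(tokens)
    freqs = Counter(tokens)                 # type -> frequency
    freq_of_freq = Counter(freqs.values())  # m -> V(m, N)
    s2 = sum(m * m * v for m, v in freq_of_freq.items())
    return 1e4 * (s2 - N) / (N * N)

def unigram_entropy(tokens):
    """Plug-in (maximum-likelihood) unigram entropy in bits per word.
    Note: this is only a simple illustrative estimate, not necessarily
    the entropy H examined in the paper."""
    N = len(tokens)
    freqs = Counter(tokens)
    return -sum((c / N) * math.log2(c / N) for c in freqs.values())

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog the fox".split()
    print(f"Yule's K: {yules_k(text):.1f}")
    print(f"Unigram entropy H: {unigram_entropy(text):.3f} bits/word")
```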
Kimura, D., & Tanaka-Ishii, K. (2014). Study on Constants of Natural Language Texts. Journal of Natural Language Processing, 21(4), 877–895. https://doi.org/10.5715/jnlp.21.877