Readability factors of Japanese text classification

Lukáš Pichl; Joe Narita

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4777 LNCS, pp. 132-138 (2007)

DOI: 10.1007/978-3-540-75512-8_10

Abstract

Languages with extensive character sets in written form, such as the ideographic system of Chinese adapted to Japanese, have specific combinatorial potential for text summarization and categorization. Modern Japanese text is composed of strings over the Roman alphabet, the two Japanese phonetic syllabaries (hiragana and katakana), and Chinese characters. Unlike in most other languages, this richness of expression facilitates the creation of synonyms and paraphrases, which may or may not be substantiated by context, depending not only on circumstance but also on the user of the text. Readability of Japanese text is therefore largely individual: it depends on education and reflects lifelong experience. This work presents a quantitative study of common readability factors of Japanese text, for which thirteen text markers were developed. Our statistical analysis, expressed as a numerical readability index, is accompanied by a categorization of text content, which is visualized as a specific location on a self-organizing map over a reference text corpus. © Springer-Verlag Berlin Heidelberg 2007.

Cite

Pichl, L., & Narita, J. (2007). Readability factors of Japanese text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4777 LNCS, pp. 132–138). Springer Verlag. https://doi.org/10.1007/978-3-540-75512-8_10
