A Practical Approach to Language Complexity: A Wikipedia Case Study

50 citations of this article
79 readers (Mendeley users who have this article in their library)

Abstract

In this paper we present a statistical analysis of English texts from Wikipedia. We address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is intended to use simplified language with a limited vocabulary, and editors are explicitly asked to follow this guideline, yet in practice the vocabulary richness of the two samples is at the same level. A detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily because it uses shorter sentences, rather than drastically simplified syntax or vocabulary. A comparison of the two language varieties using the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity: the language is more advanced in conceptual articles than in person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, and conclude that controversy has the effect of reducing language complexity. © 2012 Yasseri et al.
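The Gunning readability (fog) index combines average sentence length with the share of "complex" words, i.e. words of three or more syllables: fog = 0.4 × (words / sentences + 100 × complex words / words). The following is a minimal, hypothetical sketch of that formula, not the authors' implementation. The function names `count_syllables` and `gunning_fog`, the regex-based sentence and word splitting, and the vowel-group syllable heuristic are all assumptions; Gunning's exclusions (proper nouns, compound words, familiar jargon) are not implemented.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a rough heuristic, assumed here)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually adds no syllable ("make"), except in "-le" or "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    # A trailing "-ed" is usually silent unless it follows "t" or "d" ("approved" vs "wanted").
    elif word.endswith("ed") and not word.endswith(("ted", "ded")) and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Gunning fog index: 0.4 * (words per sentence + 100 * complex words / words)."""
    # Naive sentence split on runs of ".", "!" or "?" (assumption: no abbreviations like "e.g.").
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # "Complex" words have three or more syllables, per Gunning's definition.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + pct_complex)


one = gunning_fog("The committee approved the proposal; the council published the decision.")
two = gunning_fog("The committee approved the proposal. The council published the decision.")
print(round(one, 2), round(two, 2))  # 16.0 14.0
```

Both inputs contain the same ten words and, under this heuristic, the same three complex words (committee, proposal, decision), so only the sentence count differs. Splitting the text into two sentences halves the average sentence length and lowers the index from 16.0 to 14.0, a small illustration of how shorter sentences alone can reduce a readability score while the vocabulary stays unchanged.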

Cite (APA)

Yasseri, T., Kornai, A., & Kertész, J. (2012). A Practical Approach to Language Complexity: A Wikipedia Case Study. PLoS ONE, 7(11). https://doi.org/10.1371/journal.pone.0048386

Readers' Seniority

PhD / Post grad / Masters / Doc: 35 (57%)
Researcher: 15 (25%)
Professor / Associate Prof.: 6 (10%)
Lecturer / Post doc: 5 (8%)

Readers' Discipline

Computer Science: 20 (40%)
Linguistics: 13 (26%)
Physics and Astronomy: 10 (20%)
Social Sciences: 7 (14%)

Article Metrics

Mentions
References: 1