What makes a written text sound like the work of its author has remained an open problem for hundreds of years. The attributes of authorship are often lumped together in attempts to identify an unknown author, whereas investigating a single attribute while eliminating the effect of all others has received little attention. One debated attribute is the size of the text segments into which authors group words. Texts consist of these segments (sentences) of varying length, and the lengths are distributed in ways that are assumed to be characteristic of the author. By comparing the statistics of paired text samples, we show that differences in the statistics do indicate a difference in the authorship of the texts. However, certain choices of metrics and units easily lead to random and meaningless results.
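As a rough illustration of comparing sentence length distributions between two text samples, the following is a minimal sketch and not the method used in the paper. It assumes sentences end at `.`, `!` or `?`, measures length in words, and uses the two-sample Kolmogorov-Smirnov statistic as a stand-in distance; the function names `sentence_lengths` and `ks_statistic` are hypothetical.

```python
import re
from bisect import bisect_right


def sentence_lengths(text):
    """Return the length in words of each sentence in text.

    Assumption: a sentence ends at one or more of '.', '!' or '?'.
    Abbreviations and decimal numbers will be split incorrectly.
    """
    sentences = re.split(r"[.!?]+", text)
    return [len(s.split()) for s in sentences if s.strip()]


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b.

    Assumption: this distance is chosen for illustration only;
    the paper may rely on different metrics.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    # The gap can only change at observed values, so check each one.
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(
            bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted)
        )
        for x in points
    )


if __name__ == "__main__":
    sample_a = "Short one. Another short one. Brief again."
    sample_b = "This sentence is considerably longer than the ones in the other sample. So is this one, which keeps going for a while."
    print(ks_statistic(sentence_lengths(sample_a), sentence_lengths(sample_b)))
```

A value near 0 means the two samples have similar sentence length distributions, and a value near 1 means they barely overlap. Switching the unit from words to characters, or changing the distance, can alter the outcome, which is the sensitivity the abstract warns about.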
Lehtonen, M. (2015). On sentence length distribution as an authorship attribute. Lecture Notes in Electrical Engineering, 339, 811–818. https://doi.org/10.1007/978-3-662-46578-3_96