Writers tend to express their ideas in distinct styles, characterized by the so-called signature or stylome: an abstraction of the general constraints and specific word combinations that they choose to follow within their language. Although capturing this style has proven very difficult, some advances have been achieved. Here, we present a novel system that is trained on texts by a single author, is able to unveil some features of that author's style, and applies them to detect texts not written by the same author or, at least, not written with the previously learned features. The system is a hybrid model based on self-organizing maps and information-theoretic measures. In the model, the mutual information function of an unknown text is compared to the mutual information function of texts from a known author. If the distance between these two functions exceeds a certain threshold, the unknown text is attributed to a different author; otherwise, the authorship is taken to be the same. The decision threshold is obtained from the self-organizing map trained on the texts of the known author. We present authorship identification results in several contexts, including classic literature, journalism (political, economic, sports), and popular science writing. © 2010 Springer-Verlag.
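The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard definition of the mutual information function, I(k) = Σ p_ab(k) log2(p_ab(k) / (p_a p_b)), computed over characters separated by a lag k, and it assumes a Euclidean distance between the two functions; the abstract specifies neither the symbol unit nor the distance. The function names, the `max_lag` parameter, and the `threshold` argument are hypothetical. In the paper the threshold comes from a trained self-organizing map, which is not reproduced here.

```python
import math
from collections import Counter


def mutual_information_function(text, max_lag=10):
    """Return [I(1), ..., I(max_lag)] in bits for a character sequence.

    Assumption: symbols are single characters; the paper may use another unit.
    """
    values = []
    n = len(text)
    for k in range(1, max_lag + 1):
        pairs = n - k  # number of symbol pairs separated by lag k
        if pairs <= 0:
            values.append(0.0)
            continue
        # Empirical joint and marginal counts over the same set of pairs.
        joint = Counter(zip(text[:-k], text[k:]))
        left = Counter(text[:-k])
        right = Counter(text[k:])
        mi = 0.0
        for (a, b), count in joint.items():
            p_ab = count / pairs
            # p_ab / (p_a * p_b) simplifies to count * pairs / (left * right).
            mi += p_ab * math.log2(count * pairs / (left[a] * right[b]))
        values.append(mi)
    return values


def mif_distance(f, g):
    """Euclidean distance between two mutual information functions (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, g)))


def same_author(known_text, unknown_text, threshold, max_lag=10):
    """Apply the decision rule: same author unless the distance exceeds the threshold.

    Assumption: `threshold` is supplied by the caller; the paper derives it
    from a self-organizing map trained on the known author's texts.
    """
    known = mutual_information_function(known_text, max_lag)
    unknown = mutual_information_function(unknown_text, max_lag)
    return mif_distance(known, unknown) <= threshold
```

A call such as `same_author(reference_text, candidate_text, threshold=0.05)` returns `True` when the two functions are within the given distance. The value 0.05 is only a placeholder; a meaningful threshold depends on the training texts.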
CITATION STYLE
Neme, A., Lugo, B., & Cervera, A. (2010). Detection of different authorship of text sequences through self-organizing maps and mutual information function. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6438 LNAI, pp. 186–195). https://doi.org/10.1007/978-3-642-16773-7_16