Author clustering with an adaptive threshold

Mirco Kocher; Jacques Savoy

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017), Vol. 10456 LNCS, pp. 186–198

DOI: 10.1007/978-3-319-65813-1_19

Abstract

This paper describes and evaluates an unsupervised author clustering model called Spatium. The proposed strategy adapts readily to different natural languages (such as Dutch, English, and Greek) and can be applied to different text genres (newspaper articles, reviews, excerpts of novels, etc.). As features, we suggest using the m most frequent terms of each text (isolated words and punctuation symbols, with m set to at most 200). Applying a distance measure, we determine whether there is enough evidence that two texts were written by the same author. The evaluations are based on six test collections (PAN Author Clustering task at CLEF 2016). A more detailed analysis shows the strengths of our approach, indicates its problems, and provides reasons for some of the potential failures of the Spatium model.
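The pipeline outlined in the abstract (most frequent terms as features, a distance between two texts, a decision on whether the evidence is sufficient) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract names neither the distance measure nor the threshold rule, so the sketch assumes an L1 (Manhattan) distance over relative term frequencies and a threshold set at the mean pairwise distance minus `k` standard deviations. All function names and the parameter `k` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Isolated words and single punctuation symbols, as in the abstract.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Lowercase the text and split it into words and punctuation symbols."""
    return TOKEN_RE.findall(text.lower())


def relative_freqs(tokens):
    """Map each term to its relative frequency in the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def distance(text_a, text_b, m=200):
    """ASSUMPTION: L1 distance over the m most frequent terms of text_a."""
    tok_a, tok_b = tokenize(text_a), tokenize(text_b)
    terms = [t for t, _ in Counter(tok_a).most_common(m)]
    freq_a, freq_b = relative_freqs(tok_a), relative_freqs(tok_b)
    return sum(abs(freq_a[t] - freq_b.get(t, 0.0)) for t in terms)


def same_author_pairs(texts, m=200, k=1.0):
    """Return index pairs whose distance falls below an adaptive threshold.

    ASSUMPTION: the threshold is mean - k * standard deviation of all
    pairwise distances in the collection; the abstract does not specify it.
    """
    dists = {}
    for i, j in combinations(range(len(texts)), 2):
        # Average both directions, since each text selects its own terms.
        d_ij = distance(texts[i], texts[j], m)
        d_ji = distance(texts[j], texts[i], m)
        dists[(i, j)] = (d_ij + d_ji) / 2
    if len(dists) < 2:
        return []
    values = list(dists.values())
    threshold = mean(values) - k * pstdev(values)
    return [pair for pair, d in dists.items() if d < threshold]
```

Because the threshold is derived from the distances observed in the collection at hand rather than fixed in advance, the same code can be run on collections in different languages or genres without retuning; whether this matches the adaptive threshold of the paper would have to be checked against the full text.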

Cite

Kocher, M., & Savoy, J. (2017). Author clustering with an adaptive threshold. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10456 LNCS, pp. 186–198). Springer Verlag. https://doi.org/10.1007/978-3-319-65813-1_19
