In this paper, we present an authorship attribution method that uses complete (non-continuous, with bifurcations) syntactic n-grams as style markers. Syntactic n-grams are obtained by following paths in subtrees of a syntactic tree. We work with relatively short text fragments and build author profiles of various sizes using a tf-idf scheme. We train an SVM classifier to perform the task. We compare the method with one based on character n-grams and show that accuracy increases when complete syntactic n-grams are used.
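To make the idea of following paths in subtrees concrete, the sketch below enumerates every connected subtree with exactly n nodes in a toy dependency tree, including subtrees that branch (bifurcations). It is only an illustration under stated assumptions: the tree, the function names `subtrees` and `sn_grams`, and the bracket notation `head[dependent,dependent]` are hypothetical and are not taken from the paper. The tf-idf weighting and SVM training steps are not shown.

```python
# Toy dependency tree (assumed for illustration): head word -> list of dependents.
# Corresponds roughly to "John ate an apple"; words are assumed unique.
TREE = {"ate": ["John", "apple"], "apple": ["an"]}


def subtrees(tree, root, n):
    """Return bracketed strings for all connected subtrees with n nodes rooted at root."""
    if n == 1:
        return [root]
    children = tree.get(root, [])

    def combos(i, budget):
        # Ways to choose subtrees under children[i:] that use exactly `budget` nodes.
        if budget == 0:
            return [[]]
        if i == len(children):
            return []
        out = list(combos(i + 1, budget))  # option: skip child i entirely
        for size in range(1, budget + 1):  # option: give child i a subtree of `size` nodes
            for sub in subtrees(tree, children[i], size):
                for rest in combos(i + 1, budget - size):
                    out.append([sub] + rest)
        return out

    # More than one part inside the brackets means the n-gram bifurcates at root.
    return [f"{root}[{','.join(parts)}]" for parts in combos(0, n - 1)]


def sn_grams(tree, n):
    """Collect n-node subtrees rooted at every node of the tree."""
    nodes = set(tree) | {c for kids in tree.values() for c in kids}
    grams = []
    for node in sorted(nodes):
        grams.extend(subtrees(tree, node, n))
    return grams


if __name__ == "__main__":
    print(sn_grams(TREE, 2))
    print(sn_grams(TREE, 3))
```

For n = 3 the toy tree yields one non-branching path (`ate[apple[an]]`) and one bifurcating n-gram (`ate[John,apple]`); a method restricted to continuous paths would miss the second. In a full pipeline, counts of such strings per text fragment would presumably feed the tf-idf profile that the classifier is trained on.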
CITATION STYLE
Posadas-Duran, J. P., Sidorov, G., & Batyrshin, I. (2014). Complete syntactic N-grams as style markers for authorship attribution. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8856, 9–17. https://doi.org/10.1007/978-3-319-13647-9_2