Machine Learning Models

  • Savoy J

Abstract

As many stylometric applications must first learn and represent the distinctive style of different categories or authors, several machine learning algorithms have been suggested to solve authorship attribution and profiling problems. This sixth chapter presents four important models. Using a vector space representation, the k-nearest neighbors (k-NN) model relies on a distance (or similarity) measure computed between the doubtful text and either the different categories (profile-based) or all texts (instance-based). The closest instance, or the k closest instances, then determine the proposed decision. With the naïve Bayes model, probability theory is used to estimate the occurrence of each selected stylistic marker according to the different categories. Given the query text, the model computes the probability of each class to determine the most probable one. A more complex approach, the support vector machine (SVM), defines a linear border splitting the training set into two distinct regions, one for each category. Based on this representation, the doubtful text is projected into this space and its position defines its attribution. Finally, logistic regression is described as an approach to estimate the probability of a query text belonging to a given class. As the practical aspect is important to obtain a clear understanding of all these methods, examples written in R are provided, usually using the Federalist Papers as a testbed corpus.
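The two k-NN variants named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the R used in the chapter, and it is not the chapter's code. The training vectors, the labels "A" and "B", the choice of Euclidean distance, and the function names `knn_instance_based` and `nearest_profile` are all assumptions made for illustration; the numbers are invented and do not come from the Federalist Papers.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Hypothetical toy data: relative frequencies of three function words per text.
# The values are invented for illustration only.
TRAINING = [
    ("A", (0.060, 0.040, 0.003)),
    ("A", (0.058, 0.042, 0.004)),
    ("A", (0.062, 0.039, 0.003)),
    ("B", (0.070, 0.030, 0.000)),
    ("B", (0.072, 0.031, 0.001)),
    ("B", (0.069, 0.029, 0.000)),
]


def knn_instance_based(query, training, k=3):
    """Instance-based: majority vote among the k training texts closest to the query."""
    nearest = sorted(training, key=lambda item: dist(query, item[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


def nearest_profile(query, training):
    """Profile-based: compare the query with one mean vector per category."""
    grouped = {}
    for label, vector in training:
        grouped.setdefault(label, []).append(vector)
    # Average each feature over all texts of a category to build its profile.
    profiles = {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }
    return min(profiles, key=lambda label: dist(query, profiles[label]))


doubtful = (0.061, 0.041, 0.003)  # hypothetical query text
print(knn_instance_based(doubtful, TRAINING, k=3))
print(nearest_profile(doubtful, TRAINING))
```

With two categories, an odd k avoids tied votes. Other distance or similarity measures could be substituted for `dist` without changing the structure of either function.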

Cite

Savoy, J. (2020). Machine Learning Models. In Machine Learning Methods for Stylometry (pp. 109–151). Springer International Publishing. https://doi.org/10.1007/978-3-030-53360-1_6
