A support vector machine is trained to classify the Five Factor personality of writers of free text. For each of the five personality dimensions, writers are classified as high/low, with the mean personality score for that dimension used as the dividing point. Writers are also separately classified as high/medium/low, with division points at one standard deviation above and below the mean. With 5-fold cross-validation, the average two-class accuracy of 80.6% is much better than the baseline accuracy of 50% (always picking the most likely class), but the three-class accuracy is only slightly better than baseline (by 7.4%), because most writers fall into the medium class owing to the normal distribution of personality values. Features include bag of words, essay length, word sentiment, negation count, and part-of-speech (POS) n-grams. The consistently positive contribution of POS n-grams (averaging 4.8% and 5.8% for the two- and three-class cases, respectively) is analyzed in detail. The information gain of the most predictive features for each of the five personality dimensions is presented and discussed.
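The class divisions and the majority-class baseline described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the scores are synthetic draws from a normal distribution, the function names are hypothetical, and treating a score exactly at a dividing point as "low" or "medium" is an assumption the abstract does not specify.

```python
import random
import statistics
from collections import Counter


def two_class_labels(scores):
    """Label each score high/low, using the mean as the dividing point."""
    mean = statistics.fmean(scores)
    # Assumption: a score exactly equal to the mean is labeled "low".
    return ["high" if s > mean else "low" for s in scores]


def three_class_labels(scores):
    """Label each score high/medium/low, dividing at mean +/- one standard deviation."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    labels = []
    for s in scores:
        if s > mean + sd:
            labels.append("high")
        elif s < mean - sd:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Synthetic, hypothetical personality scores (not data from the paper).
rng = random.Random(0)
scores = [rng.gauss(3.0, 0.7) for _ in range(2000)]

labels_2 = two_class_labels(scores)
labels_3 = three_class_labels(scores)
baseline_2 = majority_baseline(labels_2)
baseline_3 = majority_baseline(labels_3)

print(f"two-class baseline:   {baseline_2:.3f}")
print(f"three-class baseline: {baseline_3:.3f}")
```

For normally distributed scores, roughly two thirds of writers land within one standard deviation of the mean, so the three-class baseline sits well above the two-class baseline of about 50%. This leaves a classifier less room to improve on it.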
Wright, W. R., & Chin, D. N. (2014). Personality profiling from text: Introducing part-of-speech N-grams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8538, pp. 243–253). Springer Verlag. https://doi.org/10.1007/978-3-319-08786-3_21