Personality profiling from text: Introducing part-of-speech N-grams

Abstract

A support vector machine is trained to classify writers of free text on the Five Factor personality dimensions. For each of the five dimensions, writers are classified as high or low, with the mean personality score for that dimension as the dividing point. Writers are also separately classified as high, medium, or low, with division points at one standard deviation above and below the mean. Under 5-fold cross-validation, the average two-class accuracy of 80.6% is much better than the baseline (pick the most likely class) accuracy of 50%. The three-class accuracy, however, is only slightly better than the baseline (by 7.4%), because personality scores are normally distributed and most writers therefore fall into the medium class. Features include bag of words, essay length, word sentiment, negation count, and part-of-speech (POS) n-grams. The consistently positive contribution of POS n-grams (an average of 4.8% and 5.8% for the two-class and three-class cases, respectively) is analyzed in detail. The information gain of the most predictive features for each of the five personality dimensions is presented and discussed.
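The labeling schemes and feature analysis described in the abstract can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how a score exactly at the mean is assigned (here it counts as low), whether the population or sample standard deviation is used (here population), which tagger or tag set produced the POS tags (the example assumes Penn Treebank tags from some external tagger), or which n values were used. All function names are hypothetical, and the SVM training and cross-validation steps are omitted.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def two_class_labels(scores):
    """High/low split at the mean (assumption: a score equal to the mean is 'low')."""
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def three_class_labels(scores):
    """High/medium/low split at one (population) standard deviation from the mean."""
    m, sd = mean(scores), pstdev(scores)
    labels = []
    for s in scores:
        if s > m + sd:
            labels.append("high")
        elif s < m - sd:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def pos_ngrams(tags, n):
    """Contiguous n-grams over a pre-tagged sequence (tags come from an external tagger)."""
    return [" ".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(has_feature, labels):
    """Information gain of a binary presence/absence feature with respect to the labels."""
    present = [lab for f, lab in zip(has_feature, labels) if f]
    absent = [lab for f, lab in zip(has_feature, labels) if not f]
    total = len(labels)
    conditional = sum(
        len(subset) / total * entropy(subset) for subset in (present, absent) if subset
    )
    return entropy(labels) - conditional


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5]
    print(two_class_labels(scores))    # ['low', 'low', 'low', 'high', 'high']
    three = three_class_labels(scores)
    print(three)                       # ['low', 'medium', 'medium', 'medium', 'high']
    print(majority_baseline(three))    # 0.6
    print(pos_ngrams(["PRP", "VBP", "DT", "NN"], 2))  # ['PRP VBP', 'VBP DT', 'DT NN']
```

With the toy scores 1 to 5, the three-class split already places three of five writers in the medium class, so the majority-class baseline is 60%. For normally distributed scores, about 68% of writers fall within one standard deviation of the mean, which leaves a three-class classifier little room to beat the baseline.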

Citation (APA)

Wright, W. R., & Chin, D. N. (2014). Personality profiling from text: Introducing part-of-speech N-grams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8538, pp. 243–253). Springer Verlag. https://doi.org/10.1007/978-3-319-08786-3_21