Personality profiling from text: Introducing part-of-speech N-grams

William R. Wright; David N. Chin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8538 243-253

DOI: 10.1007/978-3-319-08786-3_21

Abstract

A support vector machine is trained to classify the Five Factor personality of writers of free text. Writers are classified on each of the five personality dimensions as high/low, with the mean personality score for each dimension used as the dividing point. Writers are also separately classified as high/medium/low, with division points at one standard deviation above and below the mean. Under 5-fold cross-validation, the average two-class accuracy of 80.6% is much better than the baseline (pick the most likely class) accuracy of 50%, but the three-class accuracy is only slightly better (7.4%) than baseline because most writers fall into the medium class due to the normal distribution of personality values. Features include bag of words, essay length, word sentiment, negation count, and part-of-speech (POS) n-grams. The consistently positive contribution of POS n-grams (averaging 4.8% and 5.8% for the two- and three-class cases, respectively) is analyzed in detail. The information gain of the most predictive features for each of the five personality dimensions is presented and discussed.
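The labeling scheme and the POS n-gram features described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the made-up scores, the hand-assigned Penn Treebank-style tags, the tie handling (a score equal to the mean counts as low), and the use of the population standard deviation are all assumptions; the abstract does not specify any of them. The last lines compute the share of an idealized normal population lying within one standard deviation of the mean, which shows why a most-likely-class baseline is hard to beat in the three-class case.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev


def two_class(scores):
    """Label each score 'high' or 'low' relative to the sample mean.

    Assumption: a score exactly equal to the mean is labeled 'low'.
    """
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def three_class(scores):
    """Label scores 'high', 'medium', or 'low' with cut points at mean +/- 1 SD.

    Assumption: the population standard deviation (pstdev) is used.
    """
    m, sd = mean(scores), pstdev(scores)
    labels = []
    for s in scores:
        if s > m + sd:
            labels.append("high")
        elif s < m - sd:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# Made-up personality scores for five hypothetical writers.
scores = [2.0, 3.0, 3.5, 4.0, 5.0]
print(two_class(scores))
print(three_class(scores))

# Hand-assigned tags for a five-word sentence (illustrative only).
tags = ["PRP", "VBP", "DT", "JJ", "NN"]
print(pos_ngrams(tags, 2))

# Share of an ideal normal distribution within one SD of the mean.
medium_share = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(round(medium_share, 4))
```

In a full pipeline, counts like these would be combined with the other features the abstract lists and passed to a classifier. A real part-of-speech tagger would supply the tags in place of the hand-assigned list.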

Cite

Wright, W. R., & Chin, D. N. (2014). Personality profiling from text: Introducing part-of-speech N-grams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8538, pp. 243–253). Springer Verlag. https://doi.org/10.1007/978-3-319-08786-3_21
