There is growing interest in using social networking sites such as Twitter to gather real-time data on the reactions and opinions of a region's population, including locations in the developing world where social media has played an important role in recent events, such as the 2011 Arab Spring. However, opinions and reactions may differ significantly within a given region depending on the demographics of the subpopulation, including such categories as gender and ethnicity. Because this information may not be explicitly available in user content or metadata, automated methods are required to infer such hidden attributes. In this paper we describe a method to infer the gender of Twitter users from only the content of their tweets. Looking at Twitter users from the West African nation of Nigeria, we applied supervised machine learning to train a classifier on features derived from the content of user tweets. Using unigram features alone, we obtained an accuracy of 80% for predicting gender, suggesting that content alone can be a good predictor of gender. An analysis of the highest weighted features shows some interesting distinctions between men and women, both topically and emotionally. We argue that approaches such as the one described here can give us a clearer picture of who is utilizing social media when certain user attributes are unreliable or not available.
Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
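As a rough illustration of what training a classifier on unigram features can look like, the following minimal sketch builds bag-of-words counts and fits a multinomial naive Bayes model with add-one smoothing. The abstract does not name the learning algorithm, so naive Bayes is only a stand-in here, not the authors' method. The function names (`unigrams`, `train`, `predict`), the whitespace tokenizer, the placeholder labels `class_a` and `class_b`, and the toy examples are all hypothetical and invented for illustration; they are not data or results from the paper.

```python
import math
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count labels and per-label unigram frequencies from (text, label) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in unigrams(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, token_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior of the label.
        score = math.log(label_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in unigrams(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data: placeholder tokens and labels, not real tweets.
TOY = [
    ("alpha beta gamma", "class_a"),
    ("alpha alpha delta", "class_a"),
    ("omega sigma gamma", "class_b"),
    ("omega omega delta", "class_b"),
]
model = train(TOY)
print(predict(model, "alpha beta"))
print(predict(model, "omega sigma"))
```

In a real setting the labels would be the inferred attribute (here, gender), each example would aggregate a user's tweets, and accuracy would be estimated on held-out users rather than on the training examples.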
CITATION STYLE
Fink, C., Kopecky, J., & Morawski, M. (2012). Inferring gender from the content of tweets: A region specific example. In ICWSM 2012 - Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (pp. 459–462). https://doi.org/10.1609/icwsm.v6i1.14320