Demographics are widely used in marketing to characterize different types of customers. In practice, however, demographic information such as age, gender, and location is usually unavailable for privacy and other reasons. In this paper, we aim to harness the power of big data to automatically infer users’ demographics from their daily mobile communication patterns. Our study is based on a real-world large mobile network of more than 7,000,000 users and over 1,000,000,000 communication records (CALL and SMS). We discover several interesting social strategies that mobile users frequently use to maintain their social connections. First, young people are very active in broadening their social circles, while seniors tend to keep close but more stable connections. Second, female users pay more attention to cross-generation interactions than male users, though interactions between male and female users are frequent. Third, a persistent same-gender triadic pattern over one’s lifetime is discovered for the first time, while more complex opposite-gender triadic patterns are exhibited only among young people. We further study to what extent users’ demographics can be inferred from their mobile communications. As a special case, we formalize a problem of double dependent-variable prediction: inferring user gender and age simultaneously. To address this problem, we propose the WhoAmI method, a Double Dependent-Variable Factor Graph Model, which considers not only the effects of features on gender and age but also the interrelation between gender and age. Our experiments show that the proposed WhoAmI method significantly improves prediction accuracy, by up to 10% compared with several alternative methods.