Social scientists increasingly use (big) social media data to illuminate long-standing substantive questions in social science research. However, a key challenge of analyzing such data is their lower level of individual detail compared to highly detailed survey data. This limits the scope of substantive questions that can be addressed with these data. In this study, we provide a method to upgrade individual detail in terms of ethnicity in data gathered from social media via the use of register data. Our research aim is twofold: first, we predict the most likely value of ethnicity, given one's first name, and second, we show how one can test hypotheses with the predicted values for ethnicity as an independent variable while simultaneously accounting for the uncertainty in these predictions. We apply our method to social network data collected from Facebook. We illustrate our approach and provide an example of hypothesis testing using our procedure, i.e., estimating the relation between predicted network ethnic homogeneity on Facebook and trust in institutions. In a comparison of our method with two other methods, we find that our method provides the most conservative tests of hypotheses. We discuss the promise of our approach and pinpoint future research directions.
CITATION STYLE
Hofstra, B., & de Schipper, N. C. (2018). Predicting ethnicity with first names in online social media networks. Big Data and Society, 5(1). https://doi.org/10.1177/2053951718761141
Mendeley helps you to discover research relevant for your work.