Abstract
Computational Social Science (CSS), which aims to use computational methods to address social science problems, is a recently emerging and fast-developing field. The study of CSS is data-driven and benefits significantly from the availability of online user-generated content and social networks, which provide rich text and network data for investigation. However, these large-scale, multi-modal data also present researchers with a great challenge: how can data be represented effectively so that the meanings of interest in CSS can be mined? To explore the answer, we give a thorough review of data representations in CSS for both text and networks. Specifically, we summarize existing representations into two schemes, namely symbol-based and embedding-based representations, and introduce a series of typical methods for each scheme. Afterwards, we present the applications of these representations based on an investigation of more than 400 research articles from 6 top venues involved with CSS. From the statistics of these applications, we identify the strengths of each kind of representation and find that embedding-based representations have been emerging and attracting increasing attention over the last decade. Finally, we discuss several key challenges and open issues for future directions. This survey aims to give CSS researchers a deeper understanding of data representations and guidance for applying them more appropriately.
Cite
Chen, H., Yang, C., Zhang, X., Liu, Z., Sun, M., & Jin, J. (2021). From symbols to embeddings: A tale of two representations in computational social science. Journal of Social Computing, 2(2), 103–156. https://doi.org/10.23919/JSC.2021.0011