In this paper, we propose an approach to detecting semantic differences between texts represented as frequency dictionaries. The source text data was obtained by collecting records from various online communities. We implemented a specialized software module that downloads and analyzes both posts and comments from open communities of the social network VK. To build the frequency dictionary, we developed an algorithm that takes into account the peculiarities of data collected from social networks. The approach uses dimensionality reduction of the feature space to identify keywords from their frequency of usage; specifically, the algorithm we present applies principal component analysis. We show that the coefficients of the resulting linear transformation can be used to estimate the importance of words. With these estimates, we were able to identify not only keywords but also semantic differences between social network communities. The proposed approach can also be used to construct metrics and calculate the social distance between Internet communities.
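The idea of reading word importance from the coefficients of the principal component transformation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pca_word_importance`, the five-word vocabulary, and the count matrix are hypothetical toy values invented for illustration, and the use of relative frequencies and of the first component only are assumptions.

```python
import numpy as np


def pca_word_importance(counts, vocabulary, n_components=1):
    """Rank words by the magnitude of their PCA loadings (illustrative sketch).

    counts: communities x words matrix of raw word counts (assumed layout).
    Returns the ranked (word, importance) pairs and the community scores.
    """
    counts = np.asarray(counts, dtype=float)
    # Convert raw counts to relative frequencies so community size does not dominate.
    freqs = counts / counts.sum(axis=1, keepdims=True)
    # PCA requires centered data: subtract each word's mean frequency.
    centered = freqs - freqs.mean(axis=0)
    # Rows of vt are the principal axes, i.e. the linear transformation coefficients.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]
    weights = singular_values[:n_components] ** 2
    # Importance of a word: variance-weighted magnitude of its coefficients.
    importance = np.sqrt((weights[:, None] * loadings ** 2).sum(axis=0))
    order = np.argsort(-importance)
    ranked = [(vocabulary[i], float(importance[i])) for i in order]
    # Coordinates of each community in the reduced space.
    scores = centered @ loadings.T
    return ranked, scores


# Hypothetical toy data: two sports communities and one cooking community.
vocabulary = ["the", "game", "match", "recipe", "oven"]
counts = [
    [50, 30, 20, 0, 0],
    [48, 28, 22, 1, 1],
    [52, 0, 1, 27, 20],
]

ranked, scores = pca_word_importance(counts, vocabulary)
for word, value in ranked:
    print(f"{word}: {value:.4f}")

# Distance between communities in the reduced space, one possible community metric.
dist_sports_sports = float(np.linalg.norm(scores[0] - scores[1]))
dist_sports_cooking = float(np.linalg.norm(scores[0] - scores[2]))
print(dist_sports_sports, dist_sports_cooking)
```

In this toy setting, a word used at nearly the same rate everywhere (such as "the") receives a small coefficient, while topic-specific words receive large ones. The Euclidean distance between projected communities is only one possible choice of metric for the social distance mentioned above.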
Rytsarev, I. A., Kozlov, D. D., Kravtsova, N. S., Kupriyanov, A. V., Liseckiy, K. S., Liseckiy, S. K., … Samykina, N. Y. (2018). Application of the principal component analysis to detect semantic differences during the content analysis of social networks. In CEUR Workshop Proceedings (Vol. 2212, pp. 262–269). CEUR-WS. https://doi.org/10.18287/1613-0073-2018-2212-262-269