In this paper we outline a method for finding texts in minor languages of Russia on social networks, using VKontakte as an example. We first identify language-specific markers: tokens that contain letter combinations unique to a given language and that occur with high frequency in texts in that language. We then use Yandex.XML to generate lists of web pages that contain texts in these languages. Finally, we download data from pages in the https://vk.com domain through the VKontakte API.
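The marker-selection idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `char_ngrams`, `find_markers`), the choice of character bigrams, and the toy strings are all assumptions made for illustration. It keeps tokens from a target-language sample that contain at least one letter combination never seen in the comparison sample, then ranks them by frequency.

```python
import re
from collections import Counter

# Matches runs of letters in any script (no digits, no underscores).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def char_ngrams(token, n=2):
    """Return the set of character n-grams in a token."""
    return {token[i:i + n] for i in range(len(token) - n + 1)}


def find_markers(target_texts, other_texts, n=2, top_k=5):
    """Hypothetical helper: pick frequent target-language tokens that
    contain at least one n-gram absent from the comparison texts."""
    target_counts = Counter(
        tok for text in target_texts for tok in tokenize(text)
    )

    # Collect every n-gram observed in the comparison languages.
    other_ngrams = set()
    for text in other_texts:
        for tok in tokenize(text):
            other_ngrams |= char_ngrams(tok, n)

    # Keep tokens with at least one n-gram unseen elsewhere.
    candidates = [
        (tok, freq)
        for tok, freq in target_counts.items()
        if char_ngrams(tok, n) - other_ngrams
    ]
    # Most frequent first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tok for tok, _ in candidates[:top_k]]


# Toy, made-up strings (not real language data).
markers = find_markers(["qa zub qa mir", "qa dom"], ["zub mir dom kot"])
print(markers)
```

In this toy input only `qa` survives, because every bigram of the other target tokens also occurs in the comparison sample. With real corpora, the comparison sample would need to cover the dominant languages of the network for the "unique" combinations to be meaningful.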
Krylova, I., Orekhov, B., Stepanova, E., & Zaydelman, L. (2016). Languages of Russia: Using social networks to collect texts. In Communications in Computer and Information Science (Vol. 573, pp. 179–185). Springer Verlag. https://doi.org/10.1007/978-3-319-41718-9_11