Abstract
Twitter has become a rich source of linguistic data. This paper investigates the possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia. Such a corpus, once constructed, could serve multiple purposes, including training machine translation models, examining cross-lingual phenomena and studying the population of Riga. By collecting and analysing a pilot corpus, this study shows that building such a resource is feasible. The pilot corpus is publicly available and can be used to construct a large comparable corpus.
Cite
Milajevs, D. (2017). Toward a comparable corpus of Latvian, Russian and English tweets. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 26–30). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-2505