Toward a comparable corpus of Latvian, Russian and English tweets

Dmitrijs Milajevs

Conference Proceedings, Open Access

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2017) 26-30

DOI: 10.18653/v1/w17-2505

Abstract

Twitter has become a rich source of linguistic data. This paper investigates the possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia. Such a corpus, once constructed, could serve multiple purposes, including training machine translation models, examining cross-lingual phenomena and studying the population of Riga. By collecting and analysing a pilot corpus, this study shows that building such a resource is feasible. The pilot corpus is publicly available and can be used to construct a large comparable corpus.

Citation (APA)

Milajevs, D. (2017). Toward a comparable corpus of Latvian, Russian and English tweets. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 26–30). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-2505
