We present a new release of the Czech-English parallel corpus CzEng. CzEng 1.6 consists of about 0.5 billion words ("gigaword") in each language. The corpus is automatically annotated at a deep syntactic level of representation and, alternatively, in Universal Dependencies. Additionally, we release the complete annotation pipeline as a virtual machine for the Docker virtualization toolkit.
Bojar, O., Dušek, O., Kocmi, T., Libovický, J., Novák, M., Popel, M., … Variš, D. (2016). CzEng 1.6: Enlarged Czech-English parallel corpus with processing tools dockered. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9924 LNCS, pp. 231–238). Springer Verlag. https://doi.org/10.1007/978-3-319-45510-5_27