TEP: Tehran English-Persian parallel corpus

Abstract

Parallel corpora are a key resource in natural language processing. Despite their importance in many multilingual applications, no large-scale English-Persian corpus has been made available so far, owing to the difficulty of creating one and the intensive labor required. This paper describes the construction of the Tehran English-Persian parallel corpus (TEP) from movie subtitles, together with some of the difficulties we encountered during data extraction and sentence alignment. To the best of our knowledge, TEP is the first freely released large-scale (on the order of millions of words) English-Persian parallel corpus. © 2011 Springer-Verlag.

Cite

Pilevar, M. T., Faili, H., & Pilevar, A. H. (2011). TEP: Tehran English-Persian parallel corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6609 LNCS, pp. 68–79). https://doi.org/10.1007/978-3-642-19437-5_6
