We present a method for detecting alignment errors in parallel corpora. The method is intended to be language-independent and was tested on language pairs drawn from English, Polish and Spanish. It relies on automatically obtained dictionaries to perform the detection. The origin of the errors is discussed. An approach to correcting one class of errors is also described and tested. The proposed method proved effective in improving the quality of parallel corpora. The conclusions of this study may be useful when dealing with errors in existing parallel data sources, as well as when aligning new parallel corpora.
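The abstract does not spell out how the dictionaries are used, so the following is only a minimal sketch of one way dictionary-based detection could work, not the authors' published algorithm. It assumes a bilingual dictionary mapping each source word to a set of candidate translations, whitespace tokenization, and an arbitrary threshold; the names `coverage_score` and `flag_suspect_pairs` and the value 0.3 are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def coverage_score(
    src_tokens: List[str],
    tgt_tokens: List[str],
    dictionary: Dict[str, Set[str]],
) -> Optional[float]:
    """Share of dictionary-covered source tokens that have at least one
    candidate translation in the target sentence.

    Returns None when no source token is in the dictionary, because the
    pair cannot be judged in that case.
    """
    target_words = set(tgt_tokens)
    known = [tok for tok in src_tokens if tok in dictionary]
    if not known:
        return None
    hits = sum(1 for tok in known if dictionary[tok] & target_words)
    return hits / len(known)


def flag_suspect_pairs(
    pairs: List[Tuple[str, str]],
    dictionary: Dict[str, Set[str]],
    threshold: float = 0.3,  # assumed cut-off, chosen for illustration only
) -> List[int]:
    """Return indices of sentence pairs whose coverage falls below threshold."""
    suspects = []
    for index, (source, target) in enumerate(pairs):
        score = coverage_score(
            source.lower().split(), target.lower().split(), dictionary
        )
        if score is not None and score < threshold:
            suspects.append(index)
    return suspects


# Toy English-Polish data, invented for this example.
toy_dictionary = {"cat": {"kot"}, "dog": {"pies"}, "sleeps": {"śpi"}}
toy_pairs = [
    ("the cat sleeps", "kot śpi"),
    ("the dog sleeps", "dziękuję bardzo"),
]
suspect_indices = flag_suspect_pairs(toy_pairs, toy_dictionary)
```

In this toy data the second pair shares no dictionary translations with its target, so it is the only one flagged. A real system would need a dictionary induced from the corpus itself and a threshold tuned per language pair.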
CITATION STYLE
Niżałowska, K., & Markowska-Kaczmar, U. (2015). A language-independent method for detection and correction of alignment errors in parallel corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9103, pp. 335–346). Springer Verlag. https://doi.org/10.1007/978-3-319-19581-0_30