This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had previously been shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.
Azpeitia, A., Etchegoyhen, T., & Garcia, E. M. (2017). Weighted set-theoretic alignment of comparable sentences. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 41–45). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-2508