Abstract
We participated in the Bilingual Document Alignment shared task of WMT 2016 with the intent of testing a plain cross-lingual information retrieval platform built on top of the Apache Lucene framework. We devised a number of interesting variants, including one that considers only the URLs of the pages and that, without any heuristic, offers surprisingly high performance. We finally submitted the output of a system that combines two sources of information from the documents (text and URL) with a post-processing step, reaching an accuracy of 92% on the development dataset distributed for the shared task.
Cite
Jakubina, L., & Langlais, P. (2016). BAD LUC@WMT 2016: A Bilingual Document Alignment Platform Based on Lucene. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 703–709). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2370