BADLUC@WMT 2016: A Bilingual Document Alignment Platform Based on Lucene

Laurent Jakubina; Philippe Langlais

Conference Proceedings, Open Access

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2016) 2 703-709

DOI: 10.18653/v1/w16-2370

5 Citations

62 Readers

Abstract

We participated in the Bilingual Document Alignment shared task of WMT 2016 with the intent of testing a plain cross-lingual information retrieval platform built on top of the Apache Lucene framework. We devised a number of interesting variants, including one that considers only the URLs of the pages and that, without any heuristic, offers surprisingly high performance. We finally submitted the output of a system that combines two sources of information from the documents (text and URL) with a post-processing step, reaching an accuracy of 92% on the development dataset distributed for the shared task.

Cite

Jakubina, L., & Langlais, P. (2016). BADLUC@WMT 2016: A Bilingual Document Alignment Platform Based on Lucene. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 703–709). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2370
