Document alignment techniques based on multilingual sentence representations have recently shown state-of-the-art results. However, these techniques rely on unsupervised distance measures, which cannot be fine-tuned to the task at hand. In this paper, instead of these unsupervised distance measures, we employ Metric Learning to derive task-specific distance measures. These measures are supervised, meaning that the distance metric is trained using a parallel dataset. Using a dataset covering English, Sinhala, and Tamil, which belong to three different language families, we show that these task-specific supervised distance metrics outperform their unsupervised counterparts for document alignment.
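To illustrate the contrast between an unsupervised distance and a supervised, task-specific one, the following is a minimal sketch under stated assumptions, not the method used in the paper. It assumes synthetic 8-dimensional vectors in place of real multilingual sentence embeddings, and it learns the simplest possible metric: a diagonal Mahalanobis-style weighting fitted to parallel pairs. All function names (`fit_diagonal_metric`, `sq_dist`, `top1_accuracy`, `make_pairs`) and the data generator are hypothetical.

```python
import random


def fit_diagonal_metric(pairs, eps=1e-6):
    """Learn one weight per dimension from parallel (source, target) pairs.

    Assumption: dimensions in which parallel pairs disagree strongly are
    uninformative, so each weight is the inverse mean squared difference.
    """
    dim = len(pairs[0][0])
    weights = []
    for k in range(dim):
        mean_sq = sum((s[k] - t[k]) ** 2 for s, t in pairs) / len(pairs)
        weights.append(1.0 / (mean_sq + eps))
    return weights


def sq_dist(a, b, weights=None):
    """Squared distance; plain Euclidean when no weights are given."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def top1_accuracy(pairs, weights=None):
    """Fraction of sources whose nearest target is their true counterpart."""
    targets = [t for _, t in pairs]
    hits = 0
    for i, (s, _) in enumerate(pairs):
        best = min(range(len(targets)),
                   key=lambda j: sq_dist(s, targets[j], weights))
        hits += best == i
    return hits / len(pairs)


def make_pairs(rng, n, d_signal=4, d_noise=4):
    """Synthetic stand-in for embeddings of parallel sentences.

    The first d_signal dimensions are shared by both sides (meaning);
    the remaining d_noise dimensions are independent, high-variance noise.
    """
    pairs = []
    for _ in range(n):
        signal = [rng.gauss(0, 1) for _ in range(d_signal)]
        src = signal + [rng.gauss(0, 5) for _ in range(d_noise)]
        tgt = ([v + rng.gauss(0, 0.1) for v in signal]
               + [rng.gauss(0, 5) for _ in range(d_noise)])
        pairs.append((src, tgt))
    return pairs


rng = random.Random(0)
train_pairs = make_pairs(rng, 200)  # parallel data used for supervision
test_pairs = make_pairs(rng, 50)    # held-out pairs for evaluation

weights = fit_diagonal_metric(train_pairs)
acc_euclid = top1_accuracy(test_pairs)            # unsupervised baseline
acc_learned = top1_accuracy(test_pairs, weights)  # supervised metric

print(f"Euclidean top-1 accuracy: {acc_euclid:.2f}")
print(f"Learned-metric top-1 accuracy: {acc_learned:.2f}")
```

In this toy setting the learned weights suppress the noisy dimensions, so the true counterpart is retrieved far more often than with the unweighted distance. A full metric learning approach would typically learn a richer transformation than a diagonal weighting.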
CITATION STYLE
Rajitha, C., Piyarathne, L., Sachintha, D., & Ranathunga, S. (2021). Metric Learning in Multilingual Sentence Similarity Measurement for Document Alignment. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 1150–1157). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_129