Weighted set-theoretic alignment of comparable sentences

Abstract

This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.
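
As a rough illustration of what a weighted set-theoretic similarity over bags of words can look like, the sketch below computes a weighted Jaccard score between two token sets. It is not the authors' STACCw formulation: the abstract does not give the formula, so the inverse-frequency weighting in `token_weights`, the function names, and the toy corpus are all assumptions made for this example. It also assumes both sentences have already been mapped into a shared vocabulary, for instance through a bilingual lexicon.

```python
import math
from collections import Counter


def token_weights(corpus):
    """Assumed weighting scheme: tokens found in fewer sentences weigh more."""
    # Count in how many sentences each token appears (document frequency).
    doc_freq = Counter(tok for sent in corpus for tok in set(sent))
    n_sents = len(corpus)
    return {tok: math.log(1 + n_sents / freq) for tok, freq in doc_freq.items()}


def weighted_jaccard(sent_a, sent_b, weights, default=1.0):
    """Weighted intersection over weighted union of two token sets."""
    set_a, set_b = set(sent_a), set(sent_b)
    inter = sum(weights.get(tok, default) for tok in set_a & set_b)
    union = sum(weights.get(tok, default) for tok in set_a | set_b)
    return inter / union if union else 0.0


# Toy corpus, purely illustrative.
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "barks"],
    ["the", "cat", "eats"],
]
weights = token_weights(corpus)

# Two sentences that share only the frequent token "the".
score = weighted_jaccard(["the", "dog"], ["the", "cat"], weights)
print(round(score, 3))
```

With uniform weights the two example sentences would score 1/3. Under the assumed weighting the shared token "the" is frequent and so contributes little, which lowers the score. Down-weighting uninformative overlap in this way is the general intuition behind adding weights to a set-based measure.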

Citation (APA)

Azpeitia, A., Etchegoyhen, T., & Garcia, E. M. (2017). Weighted set-theoretic alignment of comparable sentences. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 41–45). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-2508
