An Expectation Maximization Algorithm for Textual Unit Alignment

  • Radu I
  • Ceauşu A
  • Irimia E

The paper presents an Expectation Maximization (EM) algorithm for the automatic generation of parallel and quasi-parallel data from comparable corpora of any degree of comparability, from parallel to weakly comparable. Specifically, we address the problem of extracting related textual units (documents, paragraphs or sentences), relying on the hypothesis that, in a given corpus, certain pairs of translation equivalents are better indicators of a correct textual unit correspondence than others. We evaluate our method on mixed types of bilingual comparable corpora in six language pairs, obtaining state-of-the-art accuracy figures.
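
The abstract does not give the model's equations, so the following is only a toy illustration of the stated hypothesis, not the authors' algorithm. It alternates, in the spirit of EM, between a soft assignment of target units to each source unit and a re-estimation of how indicative each translation-equivalent pair is. The function names (`shared_pairs`, `align_units`), the `lexicon` format, the update rules and the sample data are all assumptions made for this sketch.

```python
def shared_pairs(src, tgt, lexicon):
    """Translation-equivalent pairs (s, t) from the lexicon that occur in a unit pair."""
    tgt_set = set(tgt)
    return {(s, t) for s in set(src) for t in lexicon.get(s, ()) if t in tgt_set}


def align_units(src_units, tgt_units, lexicon, iterations=20, eps=1e-6):
    """Hypothetical EM-style loop: soft unit alignment plus pair reweighting."""
    # Lexicon pairs found in every candidate (source unit, target unit) pair.
    pairs = [[shared_pairs(s, t, lexicon) for t in tgt_units] for s in src_units]
    all_pairs = {p for row in pairs for cell in row for p in cell}
    weights = {p: 1.0 for p in all_pairs}  # start with every pair equally indicative
    post = []

    for _ in range(iterations):
        # Soft assignment: distribute each source unit over the target units,
        # proportionally to the summed weights of the pairs they share.
        post = []
        for row in pairs:
            scores = [eps + sum(weights[p] for p in cell) for cell in row]
            total = sum(scores)
            post.append([s / total for s in scores])

        # Re-estimation: a pair's weight is the share of its occurrences
        # that fall in unit pairs currently believed to correspond.
        mass = {p: 0.0 for p in all_pairs}
        seen = {p: 0 for p in all_pairs}
        for i, row in enumerate(pairs):
            for j, cell in enumerate(row):
                for p in cell:
                    mass[p] += post[i][j]
                    seen[p] += 1
        weights = {p: mass[p] / seen[p] for p in all_pairs}

    return weights, post


# Toy data (assumed): two English units, two French units in swapped order.
src_units = [["the", "cat", "sleeps"], ["the", "dog", "barks"]]
tgt_units = [["le", "chien", "aboie"], ["le", "chat", "dort"]]
lexicon = {"the": ["le"], "cat": ["chat"], "dog": ["chien"],
           "sleeps": ["dort"], "barks": ["aboie"]}

weights, post = align_units(src_units, tgt_units, lexicon)
best = [max(range(len(row)), key=row.__getitem__) for row in post]
```

In this toy run the pair ("the", "le") occurs in every candidate unit pair and therefore ends up with a lower weight than content-word pairs such as ("cat", "chat"), which mirrors the idea that some translation equivalents are better indicators of a correct correspondence than others.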

Author-supplied keywords

  • EM algorithm
  • alignment


  • Ion Radu

  • Alexandru Ceauşu

  • Elena Irimia
