It is well known that multi-word expressions are problematic in natural language processing. Previous literature has suggested that information about their degree of compositionality can help in various applications, but this has not been demonstrated empirically. In this paper, we propose a framework in which information about multi-word expressions can be used in the word-alignment task. We show that even simple features such as point-wise mutual information are useful for the word-alignment task on English-Hindi parallel corpora. The alignment error rate we achieve (AER = 0.5040) is significantly better (about a 10% decrease in AER) than that of the state-of-the-art models (Och and Ney, 2003) (best AER = 0.5518) on the English-Hindi dataset.
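For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of how point-wise mutual information and the alignment error rate are commonly computed. It is not the authors' implementation: the function names `pmi` and `alignment_error_rate` are hypothetical, estimating probabilities from raw co-occurrence counts is an assumption, and the AER formula follows the standard definition over sure and possible gold links from Och and Ney (2003).

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Point-wise mutual information of a word pair, in bits.

    Assumption: probabilities are maximum-likelihood estimates
    taken from raw counts over `total` observations.
    """
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))


def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    `predicted`, `sure`, and `possible` are sets of
    (source_index, target_index) links. Sure links are
    treated as a subset of the possible links.
    """
    possible = possible | sure
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


# Hypothetical toy data, for illustration only.
score = pmi(pair_count=10, x_count=20, y_count=50, total=1000)
aer = alignment_error_rate(
    predicted={(0, 0), (1, 1), (2, 3)},
    sure={(0, 0), (1, 1)},
    possible={(2, 2)},
)
```

A high PMI indicates that two words co-occur more often than their individual frequencies would predict, which is why it can serve as a simple cue for multi-word expressions. A lower AER means the predicted links agree more closely with the gold alignment.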
Venkatapathy, S., & Joshi, A. K. (2006). Using Information about Multi-word Expressions for the Word-Alignment Task. In COLING ACL 2006 - Multiword Expressions: Identifying and Exploiting Underlying Properties, Proceedings of the Workshop (pp. 20–27). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1613692.1613697