Boosting statistical word alignment using labeled and unlabeled data

Hua Wu; Haifeng Wang; Zhanyi Liu

Conference Proceedings

COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Main Conference Poster Sessions (2006) 913-920

DOI: 10.3115/1273073.1273190

Abstract

This paper proposes a semi-supervised boosting approach that improves statistical word alignment using a limited amount of labeled data and a large amount of unlabeled data. The approach turns the supervised boosting algorithm into a semi-supervised learning algorithm by incorporating the unlabeled data. In this algorithm, we build each word aligner using both the labeled and the unlabeled data. We then build a pseudo reference set for the unlabeled data, and calculate the error rate of each word aligner using only the labeled data. Based on this semi-supervised boosting algorithm, we investigate two boosting methods for word alignment. In addition, we improve the word alignment results by combining the results of the two semi-supervised boosting methods. Experimental results on word alignment indicate that semi-supervised boosting achieves relative error reductions of 28.29% and 19.52% compared with supervised boosting and unsupervised boosting, respectively.
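The abstract reports its gains as relative error reductions. As a minimal sketch of how such a figure is usually computed, assuming the standard definition (baseline error minus new error, divided by baseline error), the Python snippet below works through one case. The function name and the error rates are hypothetical illustrations and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical alignment error rates, chosen only for illustration.
baseline = 0.20   # error rate of a baseline aligner (assumed)
improved = 0.15   # error rate of an improved aligner (assumed)

rer = relative_error_reduction(baseline, improved)
print(f"Relative error reduction: {rer:.2%}")
```

Under this definition, a relative reduction of 28.29% means that the new system removes roughly 28% of the baseline's errors. It does not mean that the absolute error rate drops by 28 percentage points.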

Citation (APA)

Wu, H., Wang, H., & Liu, Z. (2006). Boosting statistical word alignment using labeled and unlabeled data. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Main Conference Poster Sessions (pp. 913–920). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1273073.1273190
