Abstract
This paper addresses the alignment problem in the context of exploiting large bilingual and multilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked, coupled with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. These representations are then fed into a dynamic programming framework that computes the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database, where the success rate exceeded 99%. The next steps of the work concern testing the scheme's efficiency at lower levels (clause and phrase), given the necessary bilingual information about potential delimiters.
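The dynamic programming step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each unit has already been reduced to a single integer (its content-tag count), and the `PATTERNS` table, its penalty values, the distance measure, and the function name `align` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]

# Allowed alignment patterns (source units, target units) with illustrative
# penalties. These patterns and values are assumptions, not taken from the paper.
PATTERNS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def align(src: List[int], tgt: List[int]) -> List[Tuple[Span, Span]]:
    """Align two sequences of per-unit content-tag counts.

    Returns a list of ((src_start, src_end), (tgt_start, tgt_end)) pairs,
    where each span is half-open and indexes into src or tgt.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable prefix pair by each pattern.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in PATTERNS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Hypothetical distance: mismatch between the summed
                # content-tag counts of the two candidate groups.
                step = abs(sum(src[i:ni]) - sum(tgt[j:nj])) + penalty
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)

    # Backward pass: recover the lowest-cost path from the end to the start.
    spans = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        spans.append(((pi, i), (pj, j)))
        i, j = pi, pj
    spans.reverse()
    return spans


if __name__ == "__main__":
    # Four source sentences against three target sentences (made-up counts).
    print(align([5, 3, 4, 6], [5, 7, 6]))
```

With these made-up counts, the second and third source units are grouped against the second target unit, since their combined count matches it. A real system would replace the integer counts and the absolute-difference distance with whatever tag representation and scoring the application calls for.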
Cite
Papageorgiou, H., Cranias, L., & Piperidis, S. (1994). Automatic alignment in parallel corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1994-June, pp. 334–336). Association for Computational Linguistics (ACL). https://doi.org/10.3115/981732.981784