Bitext alignment is useful for many natural language processing tasks, such as machine translation, bilingual lexicography, and word sense disambiguation. This paper presents a method for aligning Chinese-Japanese bilingual text at the clause level. After describing some characteristics of Chinese-Japanese bilingual texts, we first investigate statistical properties of a Chinese-Japanese bilingual corpus, including a correlation test of text lengths between the two languages and a distribution test of the length-ratio data. We then focus on n-m (n > 1 or m > 1) alignment modes, which are prone to mismatches, and propose a similarity measure for these modes based on Hanzi character information. Using dynamic programming, we combine the statistical information with the Hanzi character information to find the alignment with the lowest overall cost. Experiments show that our algorithm achieves good alignment accuracy. © Springer-Verlag Berlin Heidelberg 2005.
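The abstract does not spell out the cost function, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It swaps in a Gale-Church-style length cost (a normal model of the length difference, as in Gale and Church's sentence aligner) and uses a Dice coefficient over shared CJK ideographs as a stand-in for the paper's Hanzi similarity measure. The constants `C`, `S2`, `PRIORS`, `HANZI_WEIGHT` and every function name are hypothetical placeholders; in the paper, the length parameters would come from the corpus statistics described above. Characters are matched by exact code point, so simplified/Japanese variant pairs such as 语/語 or 气/気 are not counted as shared, which a real system would handle with a mapping table.

```python
import math

# Hypothetical placeholders, not values from the paper.
C = 1.0             # assumed mean ratio of Japanese length to Chinese length
S2 = 6.8            # assumed variance parameter of the length model
PRIORS = {(1, 1): 0.89, (1, 0): 0.01, (0, 1): 0.01,
          (2, 1): 0.045, (1, 2): 0.045, (2, 2): 0.01}  # assumed mode priors
HANZI_WEIGHT = 5.0  # assumed weight of the Hanzi similarity bonus


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(l_zh, l_ja):
    """Gale-Church-style cost: -log of the two-sided tail probability."""
    mean = max((l_zh + l_ja / C) / 2.0, 1.0)
    delta = (l_ja - l_zh * C) / math.sqrt(mean * S2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


def hanzi_set(text):
    """Characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


def hanzi_similarity(zh, ja):
    """Dice coefficient over shared ideographs (assumed measure)."""
    a, b = hanzi_set(zh), hanzi_set(ja)
    if not a or not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def bead_cost(zh_clauses, ja_clauses):
    """Cost of aligning n Chinese clauses with m Japanese clauses."""
    n, m = len(zh_clauses), len(ja_clauses)
    zh, ja = "".join(zh_clauses), "".join(ja_clauses)
    cost = length_cost(len(zh), len(ja)) - math.log(PRIORS[(n, m)])
    if n > 1 or m > 1:
        # Only the mismatch-prone n-m modes receive the Hanzi evidence.
        cost -= HANZI_WEIGHT * hanzi_similarity(zh, ja)
    return cost


def align(zh, ja):
    """Return (zh_indices, ja_indices) beads with minimal total cost."""
    rows, cols = len(zh), len(ja)
    inf = float("inf")
    best = [[inf] * (cols + 1) for _ in range(rows + 1)]
    back = [[None] * (cols + 1) for _ in range(rows + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before expansion.
    for i in range(rows + 1):
        for j in range(cols + 1):
            if best[i][j] == inf:
                continue
            for n, m in PRIORS:
                ni, nj = i + n, j + m
                if ni > rows or nj > cols:
                    continue
                cand = best[i][j] + bead_cost(zh[i:ni], ja[j:nj])
                if cand < best[ni][nj]:
                    best[ni][nj] = cand
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the beads.
    beads, i, j = [], rows, cols
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

For example, `align(["我喜欢学习日语", "今天天气很好"], ["私は日本語を学ぶのが好きです", "今日は天気がいいです"])` returns a list of index pairs. Restricting the Hanzi bonus to n-m beads mirrors the abstract's focus on those modes: for 1-1 beads the length model alone is usually reliable, while merged beads are where length evidence is weakest.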
Wang, X., & Ren, F. (2005). Chinese-Japanese clause alignment. In Lecture Notes in Computer Science (Vol. 3406, pp. 400–412). Springer-Verlag. https://doi.org/10.1007/978-3-540-30586-6_43