Improved algorithm for automatic word alignment for Hindi-Punjabi parallel corpus

Abstract

This paper describes a system that aligns texts at the word level in a Hindi-Punjabi parallel corpus. Word alignment means identifying correspondences between words in source-language and target-language sentences, and automatic word alignment of a Hindi-Punjabi corpus is very useful for automatically building a Hindi-Punjabi dictionary. The previous aligner was based on a length-based estimation approach, and it produced alignment errors for multi-word units and sometimes for one-to-one alignments. The improved version described here uses Boundary Detection, Dictionary-Lookup (DL), Nearest-align-Neighbor (NAN) and a scoring-based Minimum distance function to improve accuracy. The accuracy of the previous version was claimed to be approximately 89.5%, but rigorous testing found it to be 65%. With these techniques implemented, the improved system achieves 99.09% accuracy for one-to-one word alignment and 80% accuracy for multi-word alignment. © 2012 Springer-Verlag.
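The abstract names the techniques but does not spell them out. As a rough illustration only, the sketch below combines a dictionary lookup with a position-based minimum-distance fallback for one-to-one word alignment. It is not the authors' algorithm: the linear `expected_position` mapping, the function names, the romanized example sentences and the toy dictionary are all assumptions, and Boundary Detection, Nearest-align-Neighbor and multi-word alignment are not modelled.

```python
from typing import Dict, List, Optional, Set


def expected_position(i: int, src_len: int, tgt_len: int) -> float:
    """Map source index i to a target index, assuming roughly parallel word order (hypothetical)."""
    if src_len <= 1:
        return 0.0
    return i * (tgt_len - 1) / (src_len - 1)


def align_one_to_one(src: List[str], tgt: List[str],
                     dictionary: Dict[str, Set[str]]) -> Dict[int, Optional[int]]:
    """Map each source index to a target index, or None if no target word is left."""
    alignment: Dict[int, Optional[int]] = {}
    used: Set[int] = set()

    # Pass 1: dictionary lookup. Among free target words listed as translations
    # of the source word, pick the one closest to the expected position.
    for i, word in enumerate(src):
        exp = expected_position(i, len(src), len(tgt))
        translations = dictionary.get(word, set())
        candidates = [j for j, t in enumerate(tgt) if j not in used and t in translations]
        if candidates:
            best = min(candidates, key=lambda k: abs(k - exp))
            alignment[i] = best
            used.add(best)

    # Pass 2: minimum-distance fallback for source words the dictionary missed.
    for i in range(len(src)):
        if i in alignment:
            continue
        exp = expected_position(i, len(src), len(tgt))
        free = [j for j in range(len(tgt)) if j not in used]
        if free:
            best = min(free, key=lambda k: abs(k - exp))
            alignment[i] = best
            used.add(best)
        else:
            alignment[i] = None
    return alignment


# Toy example with romanized text: "Ram goes home" in Hindi and Punjabi.
hindi = ["raam", "ghar", "jaataa", "hai"]
punjabi = ["raam", "ghar", "jaandaa", "hai"]
dictionary = {"raam": {"raam"}, "ghar": {"ghar"}, "hai": {"hai"}}

result = align_one_to_one(hindi, punjabi, dictionary)
print(sorted(result.items()))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Here "jaataa" is missing from the toy dictionary, so it is aligned by the distance fallback to the only remaining target word, "jaandaa". Running the dictionary pass first lets reliable lexical matches claim target positions before the weaker positional heuristic fills the gaps.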

Jindal, K., & Goyal, V. (2012). Improved algorithm for automatic word alignment for Hindi-Punjabi parallel corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6411 LNCS, pp. 255–263). https://doi.org/10.1007/978-3-642-27872-3_39