The objective of natural language understanding is to exploit rich resources such as text corpora for the semantic categorization of texts. In natural language understanding, corpus-based statistical approaches are used for language modeling and translation modeling. In this paper, we apply sentence pre-processing using factored translation models to the Europarl dataset, and the results show that pre-processing reduces the number of out-of-vocabulary words. This paper also defines a methodology for pre-processing the parallel Europarl dataset using a factored model, so that the pre-processed data can be used in subsequent machine translation.
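The out-of-vocabulary measurement mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sentences, and the simple normalization step (lowercasing and splitting off punctuation) are all assumptions that stand in for the factored pre-processing, which the abstract does not detail.

```python
import re
from typing import Iterable, List, Set


def tokenize_raw(sentence: str) -> List[str]:
    """Whitespace tokenization only, with no normalization."""
    return sentence.split()


def tokenize_preprocessed(sentence: str) -> List[str]:
    """Hypothetical pre-processing: lowercase and separate punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def build_vocab(sentences: Iterable[List[str]]) -> Set[str]:
    """Collect every token type seen in the training sentences."""
    return {tok for sent in sentences for tok in sent}


def oov_count(vocab: Set[str], sentences: Iterable[List[str]]) -> int:
    """Count test tokens that do not appear in the training vocabulary."""
    return sum(1 for sent in sentences for tok in sent if tok not in vocab)


# Toy data, invented for illustration (not taken from Europarl).
train = ["The house is small.", "Parliament adopted the resolution."]
test = ["the House adopted the resolution"]

raw_vocab = build_vocab(tokenize_raw(s) for s in train)
oov_raw = oov_count(raw_vocab, (tokenize_raw(s) for s in test))

pre_vocab = build_vocab(tokenize_preprocessed(s) for s in train)
oov_pre = oov_count(pre_vocab, (tokenize_preprocessed(s) for s in test))

print(f"OOV tokens without pre-processing: {oov_raw}")
print(f"OOV tokens with pre-processing: {oov_pre}")
```

On this toy input, casing and attached punctuation make two test tokens unseen in the raw setting, while the normalized setting has none. A factored model would additionally attach factors such as lemma or part of speech to each token, which this sketch does not attempt.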
CITATION STYLE
Jolly, S. K., & Agrawal, R. (2019). A broad coverage of corpus for understanding translation divergences. International Journal of Innovative Technology and Exploring Engineering, 8(8 Special Issue2), 613–618.