When writing formal articles, many English writers use long sentences with few punctuation marks. Because long sentences are difficult for machine translation systems, many researchers split them at punctuation marks before translation. However, sentences with few punctuation marks remain hard to handle this way. In this paper, we use a log-linear model to insert commas at appropriate positions in long sentences, splitting them into shorter sub-sentences that are easier to translate. Experimental results show that our method segments long sentences reasonably and improves the quality of machine translation.
CITATION STYLE
Yang, S., Feng, C., & Huang, H. (2015). A hybrid sentence splitting method by comma insertion for machine translation with CRF. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9427, pp. 141–152). Springer Verlag. https://doi.org/10.1007/978-3-319-25816-4_12