Abstract
Efficient computation of n-gram posterior probabilities from lattices has applications in lattice-based minimum Bayes-risk decoding in statistical machine translation and in the estimation of expected document frequencies from spoken corpora. In this paper, we present an algorithm for computing the posterior probabilities of all n-grams in a lattice and constructing a minimal deterministic weighted finite-state automaton that associates each n-gram with its posterior for efficient storage and retrieval. Our algorithm builds upon the best known algorithm in the literature for computing n-gram posteriors from lattices and leverages the following observations to significantly improve the time and space requirements: i) the n-grams for which the posteriors will be computed typically comprise all n-grams in the lattice up to a certain length, ii) the posterior of an n-gram is equivalent to its expected count when the n-gram does not repeat on any path, iii) there are efficient algorithms for computing n-gram expected counts from lattices. We present experimental results comparing our algorithm with the best known algorithm in the literature as well as with a baseline algorithm based on weighted finite-state automata operations.
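Observation ii) can be checked by brute force on a toy example. The sketch below is not the algorithm presented in the paper: it simply enumerates every path of a small, hypothetical acyclic lattice and accumulates, for each n-gram, its posterior (the total probability of paths containing it at least once) and its expected count (the probability-weighted number of occurrences). The lattice, its arc probabilities, and the names `ARCS`, `enumerate_paths`, and `ngram_statistics` are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy lattice: (source state, target state, word, arc probability).
# Assumed to be acyclic, with no arcs leaving the final state.
ARCS = [
    (0, 1, "a", 0.6),
    (0, 1, "b", 0.4),
    (1, 2, "a", 0.5),
    (1, 2, "c", 0.5),
]
START, FINAL = 0, 2


def enumerate_paths(arcs, start, final):
    """Yield (word tuple, probability) for every path from start to final."""
    outgoing = defaultdict(list)
    for src, dst, word, prob in arcs:
        outgoing[src].append((dst, word, prob))
    stack = [(start, (), 1.0)]
    while stack:
        state, words, prob = stack.pop()
        if state == final:
            yield words, prob
            continue
        for dst, word, arc_prob in outgoing[state]:
            stack.append((dst, words + (word,), prob * arc_prob))


def ngram_statistics(arcs, start, final, max_order):
    """Return (posterior, expected_count) dicts keyed by n-gram tuples."""
    paths = list(enumerate_paths(arcs, start, final))
    total = sum(prob for _, prob in paths)
    posterior = defaultdict(float)
    expected_count = defaultdict(float)
    for words, prob in paths:
        weight = prob / total  # normalized path probability
        counts = defaultdict(int)
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                counts[words[i:i + n]] += 1
        for ngram, count in counts.items():
            posterior[ngram] += weight               # counted once per path
            expected_count[ngram] += weight * count  # counted per occurrence
    return dict(posterior), dict(expected_count)


posterior, expected_count = ngram_statistics(ARCS, START, FINAL, max_order=2)
for ngram in sorted(posterior):
    print(ngram, round(posterior[ngram], 3), round(expected_count[ngram], 3))
```

In this toy lattice the unigram `a` occurs twice on the path `a a`, so its expected count exceeds its posterior, while the two quantities coincide for every n-gram that occurs at most once on each path. Exhaustive path enumeration is exponential in general, which is why efficient expected-count algorithms matter for real lattices.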
Can, D., & Narayanan, S. S. (2015). A dynamic programming algorithm for computing n-gram posteriors from lattices. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 2388–2397). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1286