Punctuation prediction and disfluency prediction can improve downstream natural language processing tasks such as machine translation and information extraction. Combining the two tasks can potentially improve the efficiency of the overall pipeline system and reduce error propagation. In this work1, we compare various methods for combining punctuation prediction (PU) and disfluency prediction (DF) on the Switchboard corpus. We compare an isolated prediction approach with a cascade approach, a rescoring approach, and three joint model approaches. For the cascade approach, we show that the soft cascade method is better than the hard cascade method. We also use the cascade models to generate an n-best list, use the bi-directional cascade models to perform rescoring, and compare that with the results of the cascade models. For the joint model approach, we compare mixedlabel Linear-chain Conditional Random Field (LCRF), cross-product LCRF and 2- layer Factorial Conditional Random Field (FCRF) with soft-cascade LCRF. Our results show that the various methods linking the two tasks are not significantly different from one another, although they perform better than the isolated prediction method by 0.5-1.5% in the F1 score. Moreover, the clique order of features also shows a marked difference.
CITATION STYLE
Wang, X., Sim, K. C., & Ng, H. T. (2014). Combining punctuation and disfluency prediction: An empirical study. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 121–130). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1013
Mendeley helps you to discover research relevant for your work.