When interpreting questions in a virtual patient dialogue system, one must inevitably tackle the challenge of a long tail of relatively infrequently asked questions. To make progress on this challenge, we investigate the use of paraphrasing for data augmentation and neural memory-based classification, finding that the two methods work best in combination. In particular, we find that the neural memory-based approach not only outperforms a standard CNN classifier on low-frequency questions, but also takes better advantage of the augmented data created by paraphrasing, together yielding a nearly 10% absolute improvement in accuracy on the least frequently asked questions.
Jin, L., King, D., Hussein, A., White, M., & Danforth, D. (2018). Using paraphrasing and memory-augmented models to combat data sparsity in question interpretation with a virtual patient dialogue system. In Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2018 at the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018 (pp. 13–23). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-0502