Abstract
We propose a learning approach for mapping context-dependent sequential instructions to actions. We address the problem of discourse and state dependencies with an attention-based model that considers both the history of the interaction and the state of the world. To train from start and goal states without access to demonstrations, we propose SESTRA, a learning algorithm that takes advantage of single-step reward observations and immediate expected reward maximization. We evaluate on the SCONE domains, and show absolute accuracy improvements of 9.8%-25.3% across the domains over approaches that use high-level logical representations.
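As an illustration of the phrase "immediate expected reward maximization", the following is a minimal sketch and not the authors' implementation. It assumes that, for a single state, a reward can be observed for every available action, and that the policy is a softmax over per-action scores. All names here (`softmax`, `expected_immediate_reward`, `ascent_step`, the toy reward values) are hypothetical and chosen only for this example.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_immediate_reward(logits, rewards):
    """Expected one-step reward: sum over actions of pi(a) * r(a)."""
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, rewards))


def gradient(logits, rewards):
    """Gradient of the expected one-step reward w.r.t. the logits.

    For a softmax policy, d/d logit_k = pi(k) * (r(k) - E[r]).
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


def ascent_step(logits, rewards, lr=0.5):
    """One gradient-ascent step on the expected immediate reward."""
    grad = gradient(logits, rewards)
    return [x + lr * g for x, g in zip(logits, grad)]


# Toy example (assumed values): three actions with observed one-step rewards.
rewards = [1.0, -1.0, 0.0]
logits = [0.0, 0.0, 0.0]
initial_value = expected_immediate_reward(logits, rewards)
for _ in range(50):
    logits = ascent_step(logits, rewards)
final_value = expected_immediate_reward(logits, rewards)
```

In this toy setting the policy shifts probability toward the action with the highest observed one-step reward. The model described in the abstract additionally conditions on the instruction, the interaction history, and the world state, none of which is modeled here.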
Cite
Suhr, A., & Artzi, Y. (2018). Situated mapping of sequential instructions to actions with single-step reward observation. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 1, pp. 2072–2082). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p18-1193