Abstract
We present a model for locating regions in space based on natural language descriptions. Starting with a 3D scene and a sentence, our model is able to associate words in the sentence with regions in the scene, interpret relations such as "on top of" or "next to", and finally locate the region described in the sentence. All components form a single neural network that is trained end-to-end without prior knowledge of object segmentation. To evaluate our model, we construct and release a new dataset consisting of Minecraft scenes with crowdsourced natural language descriptions. We achieve a 32% relative error reduction compared to a strong neural baseline.
Cite
Kitaev, N., & Klein, D. (2017). Where is Misty? Interpreting spatial descriptors by modeling regions in space. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 157–166). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d17-1015