Mapping natural language instructions to mobile UI action sequences


Abstract

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PIXELHELP, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in HowTo instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PIXELHELP.
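The abstract describes a two-stage architecture: a Transformer that extracts action phrase tuples from the instruction, and a grounding Transformer that represents each UI object from its content and screen position and matches it to the extracted object description. Below is a minimal, illustrative sketch of that grounding step only, not the authors' implementation; all module names, dimensions, and the toy inputs are assumptions, and it simplifies each object's content to a single token.

```python
import torch
import torch.nn as nn


class GroundingSketch(nn.Module):
    """Hypothetical sketch: contextualize UI objects (content + screen position)
    and score them against an object-description phrase."""

    def __init__(self, vocab_size: int, d_model: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.content_emb = nn.Embedding(vocab_size, d_model)   # object text token -> vector
        self.position_proj = nn.Linear(4, d_model)             # (x1, y1, x2, y2) box -> vector
        encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.screen_encoder = nn.TransformerEncoder(encoder_layer, n_layers)
        self.phrase_emb = nn.Embedding(vocab_size, d_model)    # stand-in for the phrase extractor output

    def forward(self, object_tokens, object_boxes, phrase_tokens):
        # Combine content and screen position, then contextualize all objects jointly.
        obj = self.content_emb(object_tokens) + self.position_proj(object_boxes)
        obj = self.screen_encoder(obj)                          # (batch, num_objects, d_model)
        # Mean-pooled phrase representation; the paper instead uses an extraction Transformer.
        phrase = self.phrase_emb(phrase_tokens).mean(dim=1, keepdim=True)  # (batch, 1, d_model)
        # Dot-product score of each on-screen object against the description phrase.
        scores = (obj * phrase).sum(dim=-1)                     # (batch, num_objects)
        return scores.softmax(dim=-1)


# Toy usage: 5 on-screen objects, a 3-token object-description phrase.
model = GroundingSketch(vocab_size=1000)
object_tokens = torch.randint(0, 1000, (1, 5))
object_boxes = torch.rand(1, 5, 4)
phrase_tokens = torch.randint(0, 1000, (1, 3))
print(model(object_tokens, object_boxes, phrase_tokens))  # probability over which object to act on
```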

Cite

APA

Li, Y., He, J., Zhou, X., Zhang, Y., & Baldridge, J. (2020). Mapping natural language instructions to mobile UI action sequences. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 8198–8210). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.729
