BYTESIZED32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games

Abstract

In this work we investigate the capacity of language models to generate explicit, interpretable, and interactive world models of scientific and common-sense reasoning tasks. We operationalize this as a task of generating text games, expressed as hundreds of lines of PYTHON code. To facilitate this task, we introduce BYTESIZED32, a corpus of 32 reasoning-focused text games totalling 20k lines of PYTHON code. We empirically demonstrate that GPT-4 can use these games as templates for single-shot in-context learning, successfully producing runnable games on unseen topics in 28% of cases. When allowed to self-reflect on program errors, game runnability substantially increases to 57%. While evaluating simulation fidelity is labor intensive, we introduce a suite of automated metrics to assess game fidelity, technical validity, adherence to task specifications, and winnability, showing a high degree of agreement with expert human ratings. We pose this as a challenge task to spur further development at the juncture of world modeling and code generation.
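The self-reflection step mentioned in the abstract (running generated game code and feeding any Python errors back to the model for repair) could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate_fix` is a hypothetical stand-in for the LLM repair call, and the loop structure and round limit are assumptions.

```python
import subprocess
import sys
import tempfile


def run_game(code: str, timeout: int = 10):
    """Execute candidate game code in a subprocess.

    Returns (ok, error_text), where ok is True if the program
    exited cleanly and error_text holds any stderr output.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, proc.stderr


def reflect_loop(generate_fix, code: str, max_rounds: int = 3):
    """Run the game; on failure, hand the code and its error trace
    back to the model for repair, up to max_rounds times.

    generate_fix(code, error) -> new_code is an assumed interface
    for the LLM call, used here only for illustration.
    """
    for _ in range(max_rounds):
        ok, err = run_game(code)
        if ok:
            return code, True
        code = generate_fix(code, err)
    ok, _ = run_game(code)
    return code, ok
```

A dummy `generate_fix` that always returns a working program shows the control flow: a game that initially crashes with a `NameError` is replaced after one repair round and the loop reports success.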

Citation (APA)

Wang, R., Todd, G., Yuan, X., Xiao, Z., Côté, M. A., & Jansen, P. (2023). BYTESIZED32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 13455–13471). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.830
