Abstract
In this work we investigate the capacity of language models to generate explicit, interpretable, and interactive world models of scientific and common-sense reasoning tasks. We operationalize this as a task of generating text games, expressed as hundreds of lines of PYTHON code. To facilitate this task, we introduce BYTESIZED32, a corpus of 32 reasoning-focused text games totalling 20k lines of PYTHON code. We empirically demonstrate that GPT-4 can use these games as templates for single-shot in-context learning, successfully producing runnable games on unseen topics in 28% of cases. When allowed to self-reflect on program errors, game runnability substantially increases to 57%. While evaluating simulation fidelity is labor-intensive, we introduce a suite of automated metrics to assess game fidelity, technical validity, adherence to task specifications, and winnability, showing a high degree of agreement with expert human ratings. We pose this as a challenge task to spur further development at the juncture of world modeling and code generation.
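For concreteness, the generate-and-reflect loop the abstract describes might be implemented along the following lines. This is a minimal sketch rather than the authors' released harness: call_llm, the prompt wording, and the runnability check are all illustrative assumptions.

# A minimal sketch of the single-shot generation plus self-reflection loop.
# Not the authors' released code: call_llm, the prompts, and the runnability
# check below are illustrative assumptions.
import subprocess
import sys
import tempfile

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4 API call; wire up a real client here."""
    raise NotImplementedError

def try_run(code: str, timeout: float = 30.0) -> str | None:
    """Execute a candidate game in a subprocess; return its traceback on
    failure, or None if it looks runnable. stdin is closed here, so a game
    that dies only because it asked for player input still counts as runnable."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True,
            stdin=subprocess.DEVNULL, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutExpired: the game never terminated"
    if result.returncode == 0:
        return None
    stderr = result.stderr.strip()
    last_line = stderr.splitlines()[-1] if stderr else ""
    if last_line.startswith("EOFError"):
        return None  # crashed only because no player input was available
    return result.stderr

def generate_with_reflection(task_spec: str, template_game: str,
                             max_retries: int = 3) -> str:
    """Single-shot generation from one template game, then error-driven retries."""
    prompt = (f"Here is an example text game:\n{template_game}\n\n"
              f"Write a new PYTHON text game for this task: {task_spec}")
    code = call_llm(prompt)
    for _ in range(max_retries):
        error = try_run(code)
        if error is None:
            return code  # runnable: stop reflecting
        # Self-reflection step: show the model its own error and ask for a fix.
        code = call_llm(f"{prompt}\n\nYour program was:\n{code}\n\n"
                        f"It failed with:\n{error}\n\nPlease fix it.")
    return code

In this sketch, runnability mirrors the paper's coarsest metric (does the generated program execute without crashing?); the finer-grained metrics the abstract mentions, such as winnability and adherence to the task specification, would require playing the game rather than merely launching it.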
Citation
Wang, R., Todd, G., Yuan, X., Xiao, Z., Côté, M. A., & Jansen, P. (2023). BYTESIZED32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 13455–13471). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.830