Evaluating deep generative models on cognitive tasks: a case study

Zhisheng Tang; Mayank Kejriwal

Journal Article (Open Access)

Discover Artificial Intelligence (2023) 3(1)

DOI: 10.1007/s44163-023-00067-3

Abstract

We present a detailed case study evaluating selective cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect, even though the model seems to have a clear understanding of the objects mentioned in the prompt. Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts. ChatGPT’s outputs on such problems generally tended to be unpredictable: even as it made irrational decisions (or employed an incorrect reasoning process) for some simpler decision-making problems, it was able to draw correct conclusions for more complex bet structures. We briefly comment on the nuances and challenges involved in scaling up such a ‘cognitive’ evaluation or conducting it with a closed set of answer keys (‘ground truth’), given that these models are inherently generative and open-ended in responding to prompts.

Cite

Tang, Z., & Kejriwal, M. (2023). Evaluating deep generative models on cognitive tasks: a case study. Discover Artificial Intelligence, 3(1). https://doi.org/10.1007/s44163-023-00067-3
