In law, lore, and everyday life, loopholes are commonplace. When people exploit a loophole, they understand the intended meaning or goal of another person but choose to go with a different interpretation. Past and current AI research has shown that artificial intelligence engages in what seems superficially like the exploitation of loopholes, but this is likely anthropomorphization. It remains unclear to what extent current models, especially Large Language Models (LLMs), capture the pragmatic understanding required for engaging in loopholes. We examined the performance of LLMs on two metrics developed for studying loophole behavior in humans: evaluation (ratings of trouble, upset, and humor) and generation (coming up with new loopholes in a given context). We conducted a fine-grained comparison of state-of-the-art LLMs to humans and found that while many of the models rate loophole behaviors as resulting in less trouble and upset than outright non-compliance (in line with humans), they struggle to recognize the humor in the creative exploitation of loopholes in the way that humans do. Furthermore, only two of the models, GPT-3 and GPT-3.5, are capable of reliably generating loopholes of their own, with GPT-3.5 performing closest to the human baseline.
CITATION
Murthy, S. K., Parece, K., Bridgers, S., Qian, P., & Ullman, T. (2023). Comparing the Evaluation and Production of Loophole Behavior in Humans and Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4010–4025). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.264