Do Large Language Models Show Human-like Biases? Exploring Confidence—Competence Gap in AI

Citations: 1
Readers: 38 (Mendeley users who have this article in their library)

Abstract

This study investigates self-assessment tendencies in Large Language Models (LLMs), examining whether their patterns resemble human cognitive biases such as the Dunning–Kruger effect. LLMs, including GPT, BARD, Claude, and LLaMA, are evaluated using confidence scores on reasoning tasks. The models provide self-assessed confidence levels before and after responding to different questions. The results show cases where high confidence does not correlate with correctness, suggesting overconfidence. Conversely, low confidence despite accurate responses indicates potential underestimation. The confidence scores vary across problem categories and difficulties, with confidence tending to drop for complex queries. GPT-4 displays consistent confidence, while LLaMA and Claude show greater variation. Some of these patterns resemble the Dunning–Kruger effect, in which incompetence leads to inflated self-evaluations. While not conclusively evident, these observations parallel that phenomenon and provide a foundation for further exploring the alignment of competence and confidence in LLMs. As LLMs continue to expand their societal roles, further research into their self-assessment mechanisms is warranted to fully understand their capabilities and limitations.
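The elicitation protocol described in the abstract (confidence before answering, then an answer, then confidence again) is straightforward to reproduce. The sketch below is a minimal illustration under stated assumptions, not the authors' code: `query_model` is a hypothetical placeholder for whichever chat API (GPT, BARD, Claude, or LLaMA) is being probed, the prompt wording is invented for illustration, and the grading step is deliberately crude.

```python
# Minimal sketch of a pre/post confidence elicitation loop, assuming a
# hypothetical `query_model` wrapper around whichever chat API is under test.
from statistics import correlation  # Pearson's r, Python 3.10+


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to GPT, BARD, Claude, or LLaMA."""
    raise NotImplementedError("wire up the model API under evaluation here")


def elicit(question: str, answer_key: str) -> dict:
    """Ask for confidence, then the answer, then confidence again."""
    pre = query_model(
        "On a scale of 0-100, how confident are you that you can answer "
        f"the following correctly? Reply with a number only.\n{question}"
    )
    answer = query_model(question)
    post = query_model(
        f"You answered: {answer}\nOn a scale of 0-100, how confident are "
        "you that this answer is correct? Reply with a number only."
    )
    return {
        "pre_conf": float(pre),    # assumes the model returns a bare number
        "post_conf": float(post),
        "correct": answer_key.lower() in answer.lower(),  # crude grading
    }


def confidence_accuracy_r(records: list[dict]) -> float:
    """Pearson correlation between post-hoc confidence and correctness.

    A value near zero (or negative) means confidence does not track
    competence -- the miscalibration pattern the paper compares to the
    Dunning-Kruger effect.
    """
    return correlation(
        [r["post_conf"] for r in records],
        [1.0 if r["correct"] else 0.0 for r in records],
    )
```

In practice the numeric replies would need defensive parsing and the grader would need to be task-specific; the sketch only captures the pre/post elicitation order and the confidence-correctness correlation whose absence (or inversion) is the Dunning–Kruger-like signature the paper looks for.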

References

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134.

Dunning, D. (2011). The Dunning–Kruger effect: On being ignorant of one's own ignorance. Advances in Experimental Social Psychology, 44, 247–296.

Acerbi, A., & Stubbersfield, J. M. (2023). Large language models show human-like content biases in transmission chain experiments. Proceedings of the National Academy of Sciences, 120(44).

Cited by

Digital Diagnostics: The Potential of Large Language Models in Recognizing Symptoms of Common Illnesses

Citation (APA)

Singh, A. K., Lamichhane, B., Devkota, S., Dhakal, U., & Dhakal, C. (2024). Do Large Language Models Show Human-like Biases? Exploring Confidence—Competence Gap in AI. Information (Switzerland), 15(2). https://doi.org/10.3390/info15020092

Readers' Seniority

Professor / Associate Prof.: 4 (29%)
PhD / Post grad / Masters / Doc: 4 (29%)
Researcher: 4 (29%)
Lecturer / Post doc: 2 (14%)

Readers' Discipline

Computer Science: 4 (36%)
Business, Management and Accounting: 4 (36%)
Social Sciences: 2 (18%)
Economics, Econometrics and Finance: 1 (9%)

Article Metrics

News Mentions: 1
