Towards a Benchmark for Scientific Understanding in Humans and Machines


Abstract

Scientific understanding is a fundamental goal of science. However, there is currently no good way to measure the scientific understanding of agents, whether they are humans or artificial intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of scientific understanding. In this paper, we propose a framework for creating a benchmark for scientific understanding, using tools from the philosophy of science. We adopt a behavioral conception of understanding, according to which genuine understanding should be recognized as an ability to perform certain tasks. We extend this notion of scientific understanding with a set of questions that gauge different levels of it: the ability to retrieve information, the ability to arrange information into an explanation, and the ability to infer how things would be different under different circumstances. We suggest building a Scientific Understanding Benchmark (SUB), formed by a set of such tests, which would allow the evaluation and comparison of scientific understanding across agents. Benchmarking plays a crucial role in establishing trust, ensuring quality control, and providing a basis for performance evaluation. By aligning machine and human scientific understanding, we can improve the utility of both, ultimately advancing scientific understanding and helping us discover new insights within machines.
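The benchmark the abstract describes has a natural three-level structure, which lends itself to a simple data layout. Below is a minimal, purely illustrative sketch in Python of how SUB test items and per-level scoring might be represented; the paper does not specify an implementation, so the names (Level, SUBItem, score_by_level), the example questions, and the naive exact-match scoring rule are all assumptions, not the authors' design.

from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

class Level(Enum):
    # The three levels of scientific understanding the framework gauges.
    RETRIEVAL = "information retrieval"
    EXPLANATION = "explanation"
    COUNTERFACTUAL = "counterfactual inference"

@dataclass
class SUBItem:
    # One benchmark question, tagged with the level it is meant to probe.
    level: Level
    question: str
    reference_answer: str

def score_by_level(items: List[SUBItem], answers: List[str]) -> Dict[Level, float]:
    # Fraction of correct answers per level; exact string match is a stand-in
    # for whatever grading scheme a real benchmark would use.
    totals = {level: 0 for level in Level}
    correct = {level: 0 for level in Level}
    for item, answer in zip(items, answers):
        totals[item.level] += 1
        if answer.strip().lower() == item.reference_answer.strip().lower():
            correct[item.level] += 1
    return {level: correct[level] / totals[level] for level in Level if totals[level] > 0}

if __name__ == "__main__":
    items = [
        SUBItem(Level.RETRIEVAL, "What force keeps the planets in orbit?", "gravity"),
        SUBItem(Level.EXPLANATION, "Why do a hammer and a feather fall at the same rate in a vacuum?",
                "gravitational acceleration is independent of mass"),
        SUBItem(Level.COUNTERFACTUAL, "If Earth's mass doubled, would surface gravity increase?", "yes"),
    ]
    answers = ["gravity", "there is no air resistance", "yes"]
    print(score_by_level(items, answers))  # e.g. {RETRIEVAL: 1.0, EXPLANATION: 0.0, COUNTERFACTUAL: 1.0}

In practice, explanation and counterfactual answers would need graded human or model-based judgment rather than exact matching; the sketch only shows how results could be broken down by level so that agents can be compared on each kind of understanding separately.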


Citation (APA)

Barman, K. G., Caron, S., Claassen, T., & de Regt, H. (2024). Towards a Benchmark for Scientific Understanding in Humans and Machines. Minds and Machines, 34(1). https://doi.org/10.1007/s11023-024-09657-1
