A Systems Approach to Validating Large Language Model Information Extraction: The Learnability Framework Applied to Historical Legal Texts


Abstract

This paper introduces a learnability framework for validating large language model (LLM) information extraction without ground-truth annotations. Applied to 20,809 Ottoman legal texts, the framework achieves a Learnability Score of 0.891 through multi-classifier consensus, with external validation confirming substantial agreement across five diverse LLMs (κ = 0.785) and human experts (κ = 0.786). The approach treats internal consistency as a measurable systemic property, where heterogeneous machine learning models independently rediscover LLM-assigned patterns. Confusion analysis reveals that errors concentrate at jurisprudentially meaningful boundaries (e.g., commercial-inheritance: 20.4% of disagreements), demonstrating semantic coherence rather than arbitrary noise. The framework offers practical validation for historical and specialized corpora where traditional annotation is infeasible, processing documents at USD 0.01 each with parallelizable throughput. Validated annotations enable knowledge graph construction with 20,809 document nodes, 7 category nodes, and confusion-weighted semantic proximity edges. This systems-based methodology advances reproducible computational research in domains lacking established benchmarks.
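To make the consensus idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not define how the Learnability Score is computed, so `consensus_agreement` is a hypothetical stand-in: it measures how often the majority vote of several independently trained classifiers reproduces the LLM-assigned label. `cohen_kappa` illustrates a standard chance-corrected agreement statistic for two annotators; whether the paper uses this particular statistic is an assumption. All function names and the toy labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (two annotators)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def consensus_agreement(llm_labels, classifier_predictions):
    """Hypothetical consensus measure (an assumption, not the paper's formula).

    Returns the fraction of documents whose LLM label matches the majority
    vote of the classifiers. classifier_predictions holds one prediction
    list per classifier, aligned with llm_labels. Ties are resolved in
    favour of the label seen first among the votes.
    """
    if not llm_labels:
        raise ValueError("llm_labels must be non-empty")
    matches = 0
    for i, llm_label in enumerate(llm_labels):
        votes = Counter(preds[i] for preds in classifier_predictions)
        majority_label, _ = votes.most_common(1)[0]
        matches += majority_label == llm_label
    return matches / len(llm_labels)


if __name__ == "__main__":
    # Toy data only; category names are illustrative.
    llm = ["commercial", "inheritance", "commercial", "inheritance"]
    predictions = [
        ["commercial", "inheritance", "inheritance", "commercial"],
        ["commercial", "inheritance", "commercial", "commercial"],
        ["commercial", "commercial", "commercial", "inheritance"],
    ]
    print(consensus_agreement(llm, predictions))
    print(cohen_kappa(llm, predictions[1]))
```

In practice, the classifier predictions would come from held-out folds, so that each model is scored on documents it was not trained on; otherwise agreement with the LLM labels would be inflated by memorization.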


Çetinkaya, A. (2025). A Systems Approach to Validating Large Language Model Information Extraction: The Learnability Framework Applied to Historical Legal Texts. Information (Switzerland), 16(11). https://doi.org/10.3390/info16110960
