A Systems Approach to Validating Large Language Model Information Extraction: The Learnability Framework Applied to Historical Legal Texts

Ali Çetinkaya

Journal Article (Open Access)

Information (Switzerland) (2025) 16(11)

DOI: 10.3390/info16110960

Abstract

This paper introduces a learnability framework for validating large language model (LLM) information extraction without ground-truth annotations. Applied to 20,809 Ottoman legal texts, the framework achieves a Learnability Score of 0.891 through multi-classifier consensus, with external validation confirming substantial agreement across five diverse LLMs (κ = 0.785) and with human experts (κ = 0.786). The approach treats internal consistency as a measurable systemic property, where heterogeneous machine learning models independently rediscover LLM-assigned patterns. Confusion analysis reveals that errors concentrate at jurisprudentially meaningful boundaries (e.g., commercial-inheritance: 20.4% of disagreements), demonstrating semantic coherence rather than arbitrary noise. The framework offers practical validation for historical and specialized corpora where traditional annotation is infeasible, processing documents at USD 0.01 each with parallelizable throughput. Validated annotations enable knowledge graph construction with 20,809 document nodes, 7 category nodes, and confusion-weighted semantic proximity edges. This systems-based methodology advances reproducible computational research in domains lacking established benchmarks.

Citation (APA)

Çetinkaya, A. (2025). A Systems Approach to Validating Large Language Model Information Extraction: The Learnability Framework Applied to Historical Legal Texts. Information (Switzerland), 16(11). https://doi.org/10.3390/info16110960
