Abstract
Introduction: Reliable and lightweight conversions of Microsoft Word documents to HTML have long eluded library publishers. We demonstrate how off-the-shelf large language models (LLMs) like ChatGPT offer a lean pathway forward for generating JATS XML, which current platforms are equipped to render into user-friendly HTML publications.
Methods: With careful prompting, ChatGPT can turn a plain-text typescript into valid JATS. Leveraging a one- and few-shot approach for the part of an XML file ensures that boilerplate data included in example(s) prompts the LLM to populate the correct data in its output. In and parts, zero-shot prompts with only the name and version of our JATS specification produce valid XML in ChatGPT 4.0.
Results: One- and few-shot prompting proved effective in directing ChatGPT 3.5 to consistently encode discrete, sequential sections of article typescripts. In retesting with ChatGPT 4.0, zero-shot approaches demonstrated that and parts need only the JATS specification name and version to convert typescript into valid XML. The parts still benefit from a one- and few-shot approach.
Discussion: The primary bottleneck is token or source size limitations. Content must be broken up into separate sections for input, and the output manually “stitched” together to form a complete XML file.
Conclusion: LLMs may offer a solution for publishers without the resources to encode JATS files by other means. As LLMs increase in scale, we expect workflows for encoding research articles in JATS to become even more accurate, with fewer restrictions on capacity.
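The Discussion notes that token limits force the typescript to be split into separate sections and the outputs to be manually “stitched” into one XML file. The following is a minimal Python sketch of that workflow, not the authors' code. It rests on several assumptions. The function names (`split_typescript`, `build_prompt`, `stitch_article`), the prompt wording, and the character-based `MAX_CHARS` limit (a stand-in for a real token count) are all hypothetical. The model call itself is omitted. The sketch assumes the standard JATS layout of `<front>`, `<body>`, and `<back>` under an `<article>` root. The stitching step checks only well-formedness, not validity against the JATS DTD or schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical chunk size. Real limits are set by the model and counted in
# tokens; characters are used here only as a simple stand-in.
MAX_CHARS = 8000

# Standard JATS namespaces, declared on the root so fragments that use
# xlink:href or mml: elements still parse after stitching.
JATS_ROOT = (
    '<article xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xmlns:mml="http://www.w3.org/1998/Math/MathML">'
)


def split_typescript(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split plain text into chunks of at most max_chars, breaking only at
    blank lines. A single paragraph longer than max_chars becomes its own
    (oversized) chunk rather than being cut mid-paragraph."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Count the blank-line separator that joining will re-insert.
        extra = len(para) + (2 if current else 0)
        if current and size + extra > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            extra = len(para)
        current.append(para)
        size += extra
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def build_prompt(
    chunk: str, spec: str, examples: list[tuple[str, str]] | None = None
) -> str:
    """Build a zero-shot prompt (no examples) or a one-/few-shot prompt
    (one or more input/output example pairs). Wording is hypothetical."""
    parts = [f"Convert the following typescript into valid XML that conforms to {spec}."]
    for source, xml in examples or []:
        parts += ["Example input:", source, "Example output:", xml]
    parts += ["Input:", chunk, "Output:"]
    return "\n\n".join(parts)


def stitch_article(front: str, body_sections: list[str], back: str) -> ET.Element:
    """Join separately generated fragments under one <article> root.

    front and back are complete <front>/<back> elements; body_sections are
    the <sec> elements generated chunk by chunk. ET.fromstring raises
    ET.ParseError on malformed XML, but it does not validate against the
    JATS DTD or schema."""
    xml = f"{JATS_ROOT}{front}<body>{''.join(body_sections)}</body>{back}</article>"
    return ET.fromstring(xml)
```

In use, each chunk from `split_typescript` would go to the model through `build_prompt`. You would pass example pairs for the part that carries boilerplate metadata and none for the rest, which mirrors the one-/few-shot versus zero-shot split reported above. The returned fragments would then be handed to `stitch_article`. A full validity check would need a DTD- or schema-aware tool such as lxml.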
Vaughn, M., & Higgins, R. (2025). Leveraging LLMs in Library Publishing: JATS XML Encoding with ChatGPT. Journal of Librarianship and Scholarly Communication, 13(1). https://doi.org/10.31274/jlsc.18048