Abstract
Automatic text classification is increasingly used to explore and analyze the rapidly growing collections of digitized cultural heritage archives. Numapresse, a digital humanities project devoted to the historical study of French-language newspapers from 1800 onward, has trained models to recognize major newspaper genres, from political news to sports sections and serial novels. The key output of this program has been the automated classification of all the major French dailies of the Interwar period, from 1920 to 1939, made possible by the French National Library's comprehensive digitization of this period. The first part of this paper presents a modeling strategy grounded in the perspectives of cultural history and literary analysis, exemplified by the building of historically based models (spanning 20 years) and by the reconceptualization of classification probabilities as potential tools for studying intertextual discourses and genre hybridization. The second part showcases an exploration of the output data generated by the model through a method of zoom reading. The varying extent of newspaper genres produces regular patterns at different time scales: weekly cycles based on thematic supplements, yearly cycles conditioned by large-scale cultural and social practices, and decade-long trends displaying a long-term history of genre. Our conclusion stresses the promise of model transferability for building a new ecosystem of model reuse among research communities and libraries working with large collections of cultural heritage archives.
Langlais, P. C. (2022). Classified News: Revisiting the history of newspaper genre with supervised models. In Digitised Newspapers - A New Eldorado for Historians?: Reflections on Tools, Methods and Epistemology (pp. 195–226). De Gruyter. https://doi.org/10.1515/9783110729214-010