Constructing a recipe web from historical newspapers

Marieke van Erp; Melvin Wevers; Hugo Huurdeman

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11136 LNCS, pp. 217–232 (2018)

DOI: 10.1007/978-3-030-00671-6_13

Abstract

Historical newspapers provide a lens on the customs and habits of the past. For example, recipes published in newspapers highlight what and how we ate, and how we thought about food. The challenge is that newspaper data is often unstructured and highly varied. Digitised historical newspapers pose a further challenge: OCR quality fluctuates. As a result, recipes are difficult to locate and extract. We present an approach based on distant supervision and automatically extracted lexicons to identify recipes in digitised historical newspapers, generate recipe tags, and extract ingredient information. We provide OCR quality indicators and report their impact on the extraction process. We also enrich the recipes with links to information on the ingredients. Our research shows how natural language processing, machine learning, and semantic web technologies can be combined to construct a rich dataset from heterogeneous newspapers for the historical analysis of food culture.

Citation (APA)

van Erp, M., Wevers, M., & Huurdeman, H. (2018). Constructing a recipe web from historical newspapers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11136 LNCS, pp. 217–232). Springer Verlag. https://doi.org/10.1007/978-3-030-00671-6_13
