Slicing and dicing a newspaper corpus for historical ecology research

Marieke van Erp; Jesse de Does; Katrien Depuydt; Rob Lenders; Thomas van Goethem

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11313 470-484

DOI: 10.1007/978-3-030-03667-6_30

Abstract

Historical newspapers are a novel source of information for historical ecologists to study the interactions between humans and animals through time and space. Newspaper archives are particularly interesting to analyse because of their breadth and depth. However, the size and occasional noisiness of such archives also bring difficulties, as manual analysis is impossible. In this paper, we present experiments and results on automatic query expansion and categorisation for the perception of animal species between 1800 and 1940. For query expansion and to support the manual annotation process, we used lexicons. For the categorisation, we trained a Support Vector Machine model. Our results indicate that we can distinguish newspaper articles that are about animal species from those that are not with an F1 of 0.92, and subcategorise the different types of newspaper articles on animals with an F1 of up to 0.84.

Cite

van Erp, M., de Does, J., Depuydt, K., Lenders, R., & van Goethem, T. (2018). Slicing and dicing a newspaper corpus for historical ecology research. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11313, pp. 470–484). Springer Verlag. https://doi.org/10.1007/978-3-030-03667-6_30
