Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model

Abstract

In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works with a mean absolute error of less than 7 years. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions, as given by the Integrated Gradients model explanation method.
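The abstract names Integrated Gradients as the explanation method but does not describe how it is computed. The following is a minimal sketch of the general technique on a toy model, not the authors' implementation: the function names, the weight vector `w`, the sigmoid "year predictor", and the all-zeros baseline are all assumptions made for illustration. Integrated Gradients attributes a prediction to each input feature by averaging the model's gradient along a straight path from a baseline to the input and scaling by the input difference.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output w.r.t. its input.
    """
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from the baseline to the input.
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.stack([grad_fn(p) for p in path]).mean(axis=0)
    return (x - baseline) * avg_grad

# Hypothetical toy model standing in for a year predictor:
# year = 1700 + 100 * sigmoid(w . x). This is NOT the paper's BERT model.
w = np.array([0.8, -0.5, 0.3])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_year(x):
    return 1700.0 + 100.0 * sigmoid(w @ x)

def predict_year_grad(x):
    s = sigmoid(w @ x)
    return 100.0 * s * (1.0 - s) * w

x = np.array([1.0, 2.0, -1.0])   # made-up feature vector
baseline = np.zeros_like(x)      # assumed "no information" input

attributions = integrated_gradients(predict_year_grad, x, baseline)

# Completeness property: attributions should sum (approximately) to the
# difference between the prediction at x and at the baseline.
completeness_gap = abs(
    attributions.sum() - (predict_year(x) - predict_year(baseline))
)
print(attributions, completeness_gap)
```

In a text model the same idea would typically be applied to token embeddings rather than raw feature values, so that each token receives a score indicating whether it pushed the predicted year earlier or later.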

Cite

Rastas, I., Ryan, Y., Tiihonen, I., Qaraei, M., Repo, L., Babbar, R., … Ginter, F. (2022). Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model. In LChange 2022 - 3rd International Workshop on Computational Approaches to Historical Language Change 2022, Proceedings of the Workshop (pp. 68–77). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.lchange-1.7
