The current hype about machine learning has spurred a new wave of hope and enthusiasm amongst records managers and archivists that they can rely on algorithms to reduce the amount of manual intervention in the management and appraisal of large volumes of unstructured content. Commercial software providers promote out-of-the-box tools for auto-classification, seamlessly integrated into attractive dashboards. But is the integration of machine learning within an information management context as happy a marriage as it seems? This chapter seeks to provide a pragmatic overview of both the possibilities and the limits of automation from an archival and records management perspective. Following an overview of the different types of contexts in which automation can be applied, the chapter focusses in particular on topic modelling (TM). This low-barrier method of automatically extracting keywords from large volumes of unstructured text is presented with the help of a case study in which TM is applied to digitised archival holdings of the European Commission (EC). The chapter concludes that, as in real life, making a successful marriage is hard work and requires an ongoing effort.
Coeckelbergs, M., & Van Hooland, S. (2021). Machine learning techniques for the management of digitised collections. In Information and Knowledge Organisation in Digital Humanities: Global Perspectives (pp. 244–259). Taylor and Francis. https://doi.org/10.4324/9781003131816-12