In this paper, we present a pipeline for image extraction from historical documents using foundation models, and evaluate text-image prompts and their effectiveness on humanities datasets of varying complexity. The motivation for this approach stems from historians' strong interest in the visual elements printed alongside historical texts on the one hand, and from the relative scarcity of well-annotated datasets in the humanities compared to other domains on the other. We propose a sequential approach that combines Grounding DINO with Meta's Segment Anything Model (SAM) to retrieve a significant portion of the visual data in historical documents, which can then be used for downstream development tasks and dataset creation; we also evaluate the effect of different linguistic prompts on the resulting detections.
El-Hajj, H., & Valleriani, M. (2024). Prompt Me a Dataset: An Investigation of Text-Image Prompting for Historical Image Dataset Creation Using Foundation Models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14366, pp. 247–257). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-51026-7_22