This paper compares three word image representations as a basis for label-free sample selection for word spotting in historical handwritten documents: a temporal pyramid representation based on pixel counts, a graph-based representation, and a pyramidal histogram of characters (PHOC) representation predicted by a PHOCNet trained on synthetic data. We show that the PHOC representation can reduce the number of required training samples by up to 69%, depending on the dataset, if it is learned iteratively in an active-learning-like fashion. While this works for larger datasets containing about 1,700 images, for smaller datasets with 100 images the temporal pyramid and graph representations perform better.
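To make the PHOC representation more concrete, the following is a minimal sketch of how a PHOC vector can be computed from a word's transcription. It is not the paper's implementation: in the paper the vector is predicted from the word image by a PHOCNet, whereas this sketch builds it from a text string. The function name `phoc`, the alphabet (lowercase letters and digits), and the pyramid levels (1, 2, 3) are assumptions chosen for illustration. The sketch uses the common rule that a character belongs to a region if at least half of the character's extent falls inside that region.

```python
import string

# Assumed alphabet for illustration: lowercase Latin letters and digits.
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(1, 2, 3), alphabet=ALPHABET):
    """Return a binary PHOC vector for `word` (illustrative sketch).

    For each pyramid level the word is split into `level` equal regions.
    A character is assigned to a region if at least 50% of the character's
    extent overlaps that region. Integer arithmetic in units of
    1 / (len(word) * level) avoids floating-point rounding at the threshold.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            region_start, region_end = region * n, (region + 1) * n
            bits = [0] * len(alphabet)
            for i, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # characters outside the assumed alphabet are ignored
                char_start, char_end = i * level, (i + 1) * level
                overlap = min(char_end, region_end) - max(char_start, region_start)
                # The character's extent is `level` units long, so this tests
                # whether overlap / extent >= 0.5.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vector.extend(bits)
    return vector
```

With the assumed settings, the vector has (1 + 2 + 3) × 36 = 216 binary entries. Word images whose vectors are close to each other likely show similar character content, which is what makes such a representation a candidate basis for picking representative samples without labels.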
CITATION STYLE
Westphal, F., Grahn, H., & Lavesson, N. (2020). Representative image selection for data efficient word spotting. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12116 LNCS, pp. 383–397). Springer. https://doi.org/10.1007/978-3-030-57058-3_27