Datasets and annotations for document analysis and recognition

Ernest Valveny

Book Chapter

Springer London, (2014), 983-1009

DOI: 10.1007/978-0-85729-859-1_32

Abstract

Defining standard frameworks for performance evaluation is key to advancing the state of the art in any field of document analysis, since it permits a fair and objective comparison of different proposed methods under a common scenario. For that reason, a large number of public datasets have emerged in recent years. However, several challenges must be addressed when creating such datasets in order to obtain a sufficiently large collection of representative data that researchers can easily exploit. In this chapter, we review the approaches followed by the document analysis community to address some of these challenges, such as the collection of representative data, its annotation with ground-truth information, and its representation in common, accepted formats. We also provide a comprehensive list of existing public datasets for each of the different areas of document analysis.

Cite (APA)

Valveny, E. (2014). Datasets and annotations for document analysis and recognition. In Handbook of Document Image Processing and Recognition (pp. 983–1009). Springer London. https://doi.org/10.1007/978-0-85729-859-1_32
