This chapter describes the notions and methods of page segmentation, the task of dividing page images into homogeneous components such as text blocks, figures, and tables. Together with the classification of segmented components described in Chap. 7 (Page Similarity and Classification), page segmentation makes up the overall process called layout analysis. The chapter starts by classifying page layout structures from several viewpoints, including the level of components and printing colors. It then classifies the methods for handling each class of layout according to three viewpoints: (1) the objects to be analyzed, foreground or background; (2) the primitives of analysis, such as pixels, connected components, and maximal empty rectangles; and (3) the strategy of analysis, top-down or bottom-up. The classified methods are described in detail and compared with one another to clarify their pros and cons.
Citation
Kise, K. (2014). Page segmentation techniques in document analysis. In Handbook of Document Image Processing and Recognition (pp. 135–175). Springer London. https://doi.org/10.1007/978-0-85729-859-1_5