Abstract
This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR; the initial result can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of features derived from interval encoding in a hidden Markov model based page layout classification system that is trainable and extensible.
Cite
Hu, J., Kashi, R., & Wilfong, G. (2008). Document classification using layout analysis (pp. 556–560). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/dexa.1999.795245