Document classification using layout analysis

  • Jianying Hu
  • Kashi R
  • Wilfong G

Abstract

This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR; the initial classification can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of the features derived from interval encoding in a hidden Markov model-based page layout classification system that is trainable and extendible.

Citation (APA)

Hu, J., Kashi, R., & Wilfong, G. (2008). Document classification using layout analysis (pp. 556–560). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/dexa.1999.795245
