Clustering document images using graph summaries

Abstract

Document image classification is an important step in document image analysis. Classification results make it possible to tackle other tasks such as indexing, understanding, and navigation in document collections. Using a document representation and an unsupervised classification method, we can group documents into clusters that are valid from the user's point of view. However, the semantic gap between a domain-independent document representation and the user's implicit representation can lead to unsatisfactory results. In this paper, we describe document images in terms of frequently occurring symbols. This description is created in an unsupervised manner and can be related to domain knowledge. By applying data mining techniques to a graph-based document representation, we find frequent and maximal subgraphs. For each document image, we construct a bag containing the frequent subgraphs found in it; this bag of "symbols" is the description of the document. We present results obtained on a corpus of graphical document images. © Springer-Verlag Berlin Heidelberg 2005.
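The bag-of-symbols description can be illustrated with a minimal Python sketch. It assumes the frequent-subgraph mining step has already been run and that each mined subgraph is encoded as a hashable canonical code. The strings such as "A-B" are hypothetical placeholders, and the sketch performs neither graph mining nor maximal-subgraph filtering. The helper names frequent_symbols, bag_of_symbols and cluster, the min_support and threshold parameters, and the use of cosine similarity with threshold-based single-link grouping are all illustrative assumptions, not the authors' clustering method.

```python
import math
from collections import Counter


def frequent_symbols(doc_subgraphs, min_support):
    """Return the sorted subgraph codes that occur in at least min_support documents."""
    support = Counter()
    for codes in doc_subgraphs.values():
        support.update(set(codes))  # count each code at most once per document
    return sorted(code for code, n in support.items() if n >= min_support)


def bag_of_symbols(codes, vocabulary):
    """Describe one document as occurrence counts over the frequent-symbol vocabulary."""
    counts = Counter(codes)
    return [counts[symbol] for symbol in vocabulary]


def cosine(u, v):
    """Cosine similarity; a document with no frequent symbols has similarity 0 to all others."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def cluster(bags, threshold):
    """Single-link grouping: link documents whose similarity reaches threshold (union-find)."""
    names = sorted(bags)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(bags[a], bags[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical canonical subgraph codes per document image; real codes
    # would come from a frequent-subgraph miner run on the document graphs.
    docs = {
        "d1": ["A-B", "A-B", "B-C"],
        "d2": ["A-B", "B-C"],
        "d3": ["C-D", "D-E"],
        "d4": ["C-D", "D-E", "D-E"],
        "d5": ["X-Y"],
    }
    vocab = frequent_symbols(docs, min_support=2)
    bags = {doc: bag_of_symbols(codes, vocab) for doc, codes in docs.items()}
    print(cluster(bags, threshold=0.8))  # → [['d1', 'd2'], ['d3', 'd4'], ['d5']]
```

With this toy data, "X-Y" falls below the support threshold, so d5 has an empty bag and ends up in a cluster of its own, while documents sharing the same frequent symbols are grouped together. Swapping in a different similarity measure or clustering algorithm only changes the cluster function; the bag construction stays the same.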

Citation (APA)

Barbu, E., Héroux, P., Adam, S., & Trupin, E. (2005). Clustering document images using graph summaries. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3587 LNAI, pp. 194–202). Springer Verlag. https://doi.org/10.1007/11510888_20
