XCDF: A canonical and structured document format

Jean Luc Bloechle; Maurizio Rigamonti; Karim Hadjar; Denis Lalanne; Rolf Ingold

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006), Vol. 3872 LNCS, pp. 141–152

DOI: 10.1007/11669487_13

Abstract

Accessing the structured content of PDF documents is a difficult task that requires pre-processing and reverse-engineering techniques. In this paper, we first present different methods for accomplishing this task, based either on document image analysis or on electronic content extraction. We then propose XCDF, a canonical format with well-defined properties, as a suitable solution for representing structured electronic documents and as an entry point for further research. The system and methods used to reverse engineer PDF documents into this canonical format are also presented. Finally, we present current applications of this work in various domains, ranging from data mining to multimedia navigation, all of which benefit from our canonical format to access the content and structure of PDF documents. © Springer-Verlag Berlin Heidelberg 2006.

Citation (APA)

Bloechle, J. L., Rigamonti, M., Hadjar, K., Lalanne, D., & Ingold, R. (2006). XCDF: A canonical and structured document format. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3872 LNCS, pp. 141–152). https://doi.org/10.1007/11669487_13
