SOAP processing: A non-extractive approach

Jimmy Zhang

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3250 152-167

DOI: 10.1007/978-3-540-30209-4_12

Abstract

As the first step of most XML processing algorithms, one usually extracts token content out of the source document into many discrete string objects. We propose a "non-extractive" tokenization approach that maintains the source document intact in memory. Using a binary encoding specification called Virtual Token Descriptor (VTD), the processing model represents tokens exclusively using starting offset and length. To create a hierarchical view of the data encapsulated in the SOAP message, the parser further indexes elements of the same depth using directory-like structures we call location caches. Through a demonstration of navigating the document hierarchy using VTD and location caches, we show that it is indeed possible to create a cursor-based API that retains most of DOM's random-access capabilities at a fraction of its memory usage. Furthermore, by analyzing key design constraints of custom hardware, we reason that the memory-conserving characteristics of the processing model simultaneously make possible "SOAP on a chip" and "binary-enhanced SOAP." The benchmark results show that the reference implementation of our processing model significantly outperforms Xerces DOM in terms of both memory and processing performance. © Springer-Verlag 2004.
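The idea of describing a token by offset and length, rather than copying it into a string, can be illustrated with a minimal sketch. The 64-bit field layout, the token type codes, and the helper names (pack_token, unpack_token, token_text, build_location_cache) are assumptions made for this example; the abstract does not give the actual VTD bit layout or the real structure of a location cache.

```python
# Hypothetical record layout, for illustration only (not the paper's VTD spec):
# bits 0-31 offset, bits 32-51 length, bits 52-59 depth, bits 60-63 token type.
ELEMENT, TEXT = 0, 1  # assumed token type codes


def pack_token(offset, length, depth, token_type):
    """Pack one token description into a single integer record."""
    assert 0 <= offset < (1 << 32)
    assert 0 <= length < (1 << 20)
    assert 0 <= depth < (1 << 8)
    assert 0 <= token_type < (1 << 4)
    return (token_type << 60) | (depth << 52) | (length << 32) | offset


def unpack_token(record):
    """Recover (offset, length, depth, token_type) from a packed record."""
    offset = record & 0xFFFFFFFF
    length = (record >> 32) & 0xFFFFF
    depth = (record >> 52) & 0xFF
    token_type = (record >> 60) & 0xF
    return offset, length, depth, token_type


def token_text(source, record):
    """Materialize a token's text on demand; the source buffer stays intact."""
    offset, length, _, _ = unpack_token(record)
    return source[offset:offset + length].decode("utf-8")


def build_location_cache(records):
    """Group the indices of element records by depth (a loose stand-in
    for the directory-like location caches the abstract mentions)."""
    cache = {}
    for index, record in enumerate(records):
        _, _, depth, token_type = unpack_token(record)
        if token_type == ELEMENT:
            cache.setdefault(depth, []).append(index)
    return cache


# The document is kept as one byte buffer; tokens are only described, not copied.
source = b"<Envelope><Body>hi</Body></Envelope>"
records = [
    pack_token(1, 8, 0, ELEMENT),   # "Envelope"
    pack_token(11, 4, 1, ELEMENT),  # "Body"
    pack_token(16, 2, 1, TEXT),     # "hi"
]
cache = build_location_cache(records)
```

In this sketch each token costs one integer regardless of its text length, and a string is created only when token_text is called, which is the property the non-extractive approach relies on for its memory savings.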

Cite

Zhang, J. (2004). SOAP processing: A non-extractive approach. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3250, 152–167. https://doi.org/10.1007/978-3-540-30209-4_12
