As the first step of most XML processing algorithms, one usually extracts token content from the source document into many discrete string objects. We propose a "non-extractive" tokenization approach that keeps the source document intact in memory. Using a binary encoding specification called Virtual Token Descriptor (VTD), the processing model represents tokens exclusively by their starting offset and length. To create a hierarchical view of the data encapsulated in the SOAP message, the parser further indexes elements at the same depth using directory-like structures we call location caches. Through a demonstration of navigating the document hierarchy using VTD and location caches, we show that it is indeed possible to create a cursor-based API that retains most of DOM's random-access capabilities at a fraction of its memory usage. Furthermore, by analyzing key design constraints of custom hardware, we reason that the memory-conserving characteristics of the processing model simultaneously make possible "SOAP on a chip" and "binary-enhanced SOAP." The benchmark results show that the reference implementation of our processing model significantly outperforms Xerces DOM in both memory usage and processing performance. © Springer-Verlag 2004.
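The offset-and-length idea can be illustrated with a minimal Python sketch. It is not the VTD specification or the reference implementation: the bit layout in `pack`, the scanner `index_start_tags` (which handles only plain start and end tags, with no attributes, comments, or self-closing elements), and the per-depth dictionary standing in for a location cache are all hypothetical simplifications chosen for illustration.

```python
# Hypothetical record layout (an assumption, NOT the actual VTD bit layout):
#   bits 0-31  : starting offset of the token in the document
#   bits 32-51 : token length in bytes
#   bits 52-59 : nesting depth of the element
OFFSET_BITS = 32
LENGTH_BITS = 20
DEPTH_BITS = 8


def pack(depth, offset, length):
    """Pack (depth, offset, length) into one integer record."""
    assert 0 <= offset < (1 << OFFSET_BITS)
    assert 0 <= length < (1 << LENGTH_BITS)
    assert 0 <= depth < (1 << DEPTH_BITS)
    return (depth << (OFFSET_BITS + LENGTH_BITS)) | (length << OFFSET_BITS) | offset


def unpack(record):
    """Recover (depth, offset, length) from a packed record."""
    offset = record & ((1 << OFFSET_BITS) - 1)
    length = (record >> OFFSET_BITS) & ((1 << LENGTH_BITS) - 1)
    depth = record >> (OFFSET_BITS + LENGTH_BITS)
    return depth, offset, length


def index_start_tags(doc):
    """Record each start-tag name as a packed descriptor; copy no strings."""
    records = []
    depth = 0
    pos = 0
    while True:
        lt = doc.find(b"<", pos)
        if lt < 0:
            break
        gt = doc.find(b">", lt)
        if gt < 0:
            break
        if doc[lt + 1:lt + 2] == b"/":
            depth -= 1  # end tag: step back up one level
        else:
            records.append(pack(depth, lt + 1, gt - lt - 1))
            depth += 1  # start tag: children sit one level deeper
        pos = gt + 1
    return records


def token_text(doc, record):
    """Materialize a token's bytes only when a caller asks for them."""
    _, offset, length = unpack(record)
    return memoryview(doc)[offset:offset + length].tobytes()


DOC = b"<Envelope><Body><price>42</price></Body></Envelope>"
RECORDS = index_start_tags(DOC)

# Simplified stand-in for a location cache: record indices grouped by depth.
BY_DEPTH = {}
for idx, rec in enumerate(RECORDS):
    BY_DEPTH.setdefault(unpack(rec)[0], []).append(idx)

for rec in RECORDS:
    print(unpack(rec), token_text(DOC, rec))
```

The source buffer `DOC` is never split into per-token string objects; each element name costs one integer, and its text is sliced out of the original bytes only on demand.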
Zhang, J. (2004). SOAP processing: A non-extractive approach. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3250, 152–167. https://doi.org/10.1007/978-3-540-30209-4_12