The RDF pipeline framework: Automating distributed, dependency-driven data pipelines

Abstract

Semantic web technology is well suited for large-scale information integration problems such as those in healthcare involving multiple diverse data sources and sinks, each with its own data format, vocabulary and information requirements. The resulting data production processes often require a number of steps that must be repeated when source data changes, wastefully so if only certain portions of the data have changed. This paper explains how distributed healthcare data production processes can be conveniently defined in RDF as executable dependency graphs, using the RDF Pipeline Framework. Nodes in the graph can perform arbitrary processing and are cached automatically, thus avoiding unnecessary data regeneration. The framework is loosely coupled, using native protocols for efficient node-to-node communication when possible, while falling back to RESTful HTTP when necessary. It is data and programming language agnostic, using framework-supplied wrappers to allow pipeline developers to use their favorite languages and tools for node-specific processing. © 2013 Springer-Verlag.
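
The sketch below illustrates, in plain Python, the core idea the abstract describes: a pipeline is a dependency graph of nodes, each node's output is cached, and a node recomputes only when one of its inputs has changed. This is a conceptual illustration only, not the RDF Pipeline Framework's actual API (in the framework, pipelines are declared in RDF and node-specific processing is attached through framework-supplied wrappers); all class, function, and node names here are hypothetical.

```python
import time

class Node:
    """A pipeline node with cached output, recomputed only when an input is newer."""
    def __init__(self, name, update, inputs=()):
        self.name = name            # node identifier (a URI in the real framework)
        self.update = update        # node-specific processing step
        self.inputs = list(inputs)  # upstream dependencies
        self.cache = None           # cached output of the last update
        self.stamp = 0.0            # time the cache was last refreshed

    def fresh(self):
        """Return this node's output, regenerating it only if a dependency changed."""
        upstream = [n.fresh() for n in self.inputs]
        if self.cache is None or any(n.stamp > self.stamp for n in self.inputs):
            self.cache = self.update(*upstream)
            self.stamp = time.time()
        return self.cache

# Example wiring: two source nodes feeding a merge node (illustrative data only).
patients = Node("patients", lambda: ["alice", "bob"])
labs     = Node("labs",     lambda: {"alice": 7.2})
merged   = Node("merged",
                lambda p, l: [(name, l.get(name)) for name in p],
                inputs=[patients, labs])

print(merged.fresh())   # first call computes every node
print(merged.fresh())   # second call is served entirely from the caches
```

In the framework itself this dependency and caching logic is handled by the hosting environment over HTTP or native protocols, so pipeline developers only supply the per-node processing; the sketch above collapses all of that into a single process for clarity.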

Citation (APA)

Booth, D. (2013). The RDF pipeline framework: Automating distributed, dependency-driven data pipelines. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7970 LNBI, pp. 54–68). https://doi.org/10.1007/978-3-642-39437-9_5
