A Web-Based Resource Model for eScience: Object Reuse & Exchange

Carl Lagoze, Information Science, Cornell University, lagoze@cs.cornell.edu
Simeon Warner, Information Science, Cornell University, simeon@cs.cornell.edu
Herbert Van de Sompel, Research Library, Los Alamos National Lab, herbertv@lanl.gov
Robert Sanderson, Department of Computer Science, University of Liverpool, azaroth@liv.ac.uk
Michael Nelson, Computer Science, Old Dominion University, mln@cs.odu.edu
Pete Johnston, Eduserv Foundation, pete.johnston@eduserv.org.uk

Despite the high "hype to reality" ratio, Web 2.0 [37] represents a fundamental change in the Web. Originally conceived as a technology for linking together documents [8], the Web has morphed into a rich network of distributed services, data, and semantic relationships, glued together via social participation [35]. Social applications such as blogs and wikis and social sites such as Facebook and Flickr demonstrate how the Web has evolved into a socio-technical network [9], a context where object-centered sociality [16] takes place and is subsequently recorded. The power of social participation in Web 2.0 has had substantial impact on cyberinfrastructure applications [15, 34] that until recently leveraged the Web as a mainly technical platform. For example, digital library practitioners, who initially used the Web mainly as a technology for the deployment of traditional concepts (documents, repositories, catalogs, metadata, etc.) [10], now recognize a new information paradigm in which social interaction and the information context that it imprints are as important as collections of artifacts [22, 31]. Similarly, scholarly communication, a phenomenon that has long been described in socio-technical terms [20], is changing in response to Web 2.0. Traditional read-only journal and conference papers and peer review are being supplemented and even replaced by online mechanisms for contribution, participation, and feedback [11, 18].
Finally, the grids of eScience are being reformulated to include a Web 2.0 inspired "Architecture of participation that encourages user contribution" [17].

We argue that the increasing synergy between Web 2.0 and cyberinfrastructure must be reflected in common data models, protocols, and standards. This will allow the artifacts of eScience (data, documents, tools) to be exposed to the broad Web audience via widely deployed mechanisms such as Atom feeds and mashups, which make them accessible outside their original context, for example for teaching and learning [21]. The reverse is also true. It must be easy to include resources from the general web into eScience. Failure to integrate eScience with the mainstream Web will serve to isolate it to the so-called "invisible web" [32, 39], where it will be undiscoverable by mainstream search engines. As a result, the goal of "removing access barriers" promoted by open access initiatives [1] will not be accomplished.

Identifying and Describing Compound Objects

Our work in Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) focuses on one particular aspect of this shared infrastructure. This is the specification of a data model and a suite of implementation standards to identify and describe compound objects: objects that aggregate multiple sources of content including text, images, data, visualization tools, and the like. We argue elsewhere [7, 41–43], as have others in the eScience community [33, 44], that such aggregations are an essential product of eScience. Furthermore, while the notion of an aggregation is not explicit in
the Web Architecture, it is prevalent across general Web space. For example, a "photo" in Flickr is an aggregation of multiple renditions in different sizes, and that photo is aggregated along with other "photos" into a "collection". Similarly, the blog entry that we think of as a singleton is in fact an aggregation composed of the original entry combined with multiple comments (and comments on comments). That blog entry is itself aggregated in a subject partition of a blog. Thus, a suite of standards regarding aggregations will benefit both the eScience and Web communities.

We note that despite their logical presence on the Web, these aggregations are not recognized in the Web architecture [19], which defines the following notions:

• Resource, an item of interest;
• URI, a global identifier for a Resource, commonly using the HTTP scheme;
• Representation, a datastream corresponding to the state of a Resource at the time its URI is dereferenced; and
• Link, a directed connection between two Resources, which, when extended with types, forms the foundation of the Semantic Web.

The OAI-ORE specifications endow aggregations with two attributes that we consider essential to their utility in the shared eScience/Web context.

• Identity: Identity is used in the scholarly context for expressing citation, lineage, and rights. As noted above, it is a core concept in the Web architecture for browser access, hyperlinking, and in the semantic web to express assertions, or semantic relationships, between resources. Despite the existence of many identity schemes in both the digital and physical information space (DOIs, Handles, ISBNs, etc.) [38], OAI-ORE specifies resolvable URIs to identify aggregations, thereby establishing an aggregation as a Web Resource that can be accessed and linked to like any URI-identified Resource.
• Boundary: The ability to deterministically enumerate the constituents of an aggregation is essential for eScience and related application areas.
Boundary, for example, informs preservation services "what to preserve" and rights management applications "who is responsible for what". While not defined in the Web Architecture, the importance of boundary has also been acknowledged in the Web community. It is, for example, part of the requirement set of the Protocol for Web Description Resources (POWDER) [5] work, which aims to provide mechanisms to publish properties shared by a set of Web resources.

OAI-ORE Standards and Specifications

Space restrictions prohibit a full description of the ORE specifications, which address these issues. The interested reader is referred to the full set of ORE documents at [25]. We briefly summarize the content of these documents.

Data Model - The ORE Data Model [23] makes it possible to associate an identity with aggregations of Web resources and to describe their structure and semantics. It does this by introducing the Resource Map (ReM), which is a resource identified by a URI (say ReM-1) that encapsulates a set of RDF statements [30]. The notion of associating a URI with a set of RDF statements is based on the concept of a named graph developed in the Semantic Web community [12]. The creation of a Resource Map instantiates an aggregation as a resource with a URI distinct from the Resource Map, enumerates the constituents of the aggregation, and defines the relationships among those constituents. Although a Resource Map may assert a variety of RDF statements, it must assert a set of statements that define the aggregation graph. These statements define a sub-graph that relates the Resource Map to the Aggregation via the ore:describes predicate, relates the Aggregation to its constituent Aggregated Resources via the ore:aggregates predicate, and associates the Resource Map with key metadata properties: dcterms:creator and dcterms:modified.
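The aggregation graph described above can be illustrated with a minimal sketch. It models RDF statements as plain (subject, predicate, object) tuples rather than using an RDF library. The example.org URIs, the helper names build_resource_map and constituents, and the creator and timestamp values are hypothetical; the ore: namespace URI is an assumption and should be checked against the ORE documents at [25].

```python
# Sketch only: RDF statements are modeled as (subject, predicate, object) tuples.
# All example.org URIs and both helper names are hypothetical.

ORE = "http://www.openarchives.org/ore/terms/"  # assumed namespace for ore: terms
DCTERMS = "http://purl.org/dc/terms/"           # Dublin Core terms namespace


def build_resource_map(rem_uri, aggregation_uri, aggregated_uris, creator, modified):
    """Return the set of triples forming a minimal aggregation graph."""
    triples = {
        # The Resource Map describes the Aggregation (which has its own URI).
        (rem_uri, ORE + "describes", aggregation_uri),
        # Key metadata properties are attached to the Resource Map itself.
        (rem_uri, DCTERMS + "creator", creator),
        (rem_uri, DCTERMS + "modified", modified),
    }
    # The Aggregation aggregates each constituent resource.
    for uri in aggregated_uris:
        triples.add((aggregation_uri, ORE + "aggregates", uri))
    return triples


def constituents(triples, rem_uri):
    """Enumerate the Aggregated Resources of the Aggregation described by rem_uri."""
    aggregations = {o for s, p, o in triples
                    if s == rem_uri and p == ORE + "describes"}
    return sorted(o for s, p, o in triples
                  if s in aggregations and p == ORE + "aggregates")


rem = build_resource_map(
    rem_uri="http://example.org/rem-1",
    aggregation_uri="http://example.org/aggregation-1",
    aggregated_uris=["http://example.org/paper.pdf", "http://example.org/dataset.csv"],
    creator="http://example.org/people/alice",
    modified="2008-01-01T00:00:00Z",
)
print(constituents(rem, "http://example.org/rem-1"))
```

The constituents helper is one way to read the Boundary attribute: given only the Resource Map, a client can enumerate exactly which resources belong to the aggregation.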
Serialization - Serialization provides a means of transmitting, introspecting upon, and storing ORE data model-based descriptions of aggregations. Because the ORE data model is based on RDF triples, the RDF/XML [6] expression of these triples is a natural and fully expressive serialization [29]. In addition, to make Resource Maps more widely accessible we define a somewhat less expressive serialization in the popular XML-based Atom syndication format [26, 28, 36].
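To give a feel for the RDF/XML option, the following sketch writes a small aggregation graph as RDF/XML using only the Python standard library. It is a hand-rolled illustration, not the normative ORE serialization: the function to_rdfxml, the example.org URIs, the ore: namespace URI, and the choice to treat dcterms:creator as a resource and dcterms:modified as a literal are all assumptions. The Atom serialization is not shown.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"  # assumed namespace for ore: terms
DCTERMS = "http://purl.org/dc/terms/"


def split_uri(uri):
    """Split a predicate URI into (namespace, local name) at the last '/' or '#'."""
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]


def to_rdfxml(statements):
    """Serialize (subject, predicate, object, object_is_uri) tuples as RDF/XML."""
    for prefix, ns in (("rdf", RDF), ("ore", ORE), ("dcterms", DCTERMS)):
        ET.register_namespace(prefix, ns)
    root = ET.Element(f"{{{RDF}}}RDF")
    descriptions = {}  # one rdf:Description element per distinct subject
    for subject, predicate, obj, obj_is_uri in statements:
        node = descriptions.get(subject)
        if node is None:
            node = ET.SubElement(root, f"{{{RDF}}}Description",
                                 {f"{{{RDF}}}about": subject})
            descriptions[subject] = node
        ns, local = split_uri(predicate)
        prop = ET.SubElement(node, f"{{{ns}}}{local}")
        if obj_is_uri:
            prop.set(f"{{{RDF}}}resource", obj)  # object is another resource
        else:
            prop.text = obj                      # object is a literal value
    return ET.tostring(root, encoding="unicode")


# Hypothetical Resource Map with two aggregated resources.
statements = [
    ("http://example.org/rem-1", ORE + "describes", "http://example.org/aggregation-1", True),
    ("http://example.org/rem-1", DCTERMS + "creator", "http://example.org/people/alice", True),
    ("http://example.org/rem-1", DCTERMS + "modified", "2008-01-01T00:00:00Z", False),
    ("http://example.org/aggregation-1", ORE + "aggregates", "http://example.org/paper.pdf", True),
    ("http://example.org/aggregation-1", ORE + "aggregates", "http://example.org/dataset.csv", True),
]
xml_text = to_rdfxml(statements)
print(xml_text)
```

Grouping statements by subject into rdf:Description elements keeps the output compact; a production system would more likely rely on an RDF toolkit than on hand-built XML.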