Abstract
After several decades of research but limited adoption in practice, graph querying and analytics are finally starting to gain a foothold in the data management landscape. This is driven to a large degree by the increasing desire to model and query the key entities and the interconnections between them explicitly, and by the observation that network science, graph algorithms, and graph mining can lead to crucial new insights that are not accessible without reasoning about those interconnections in a holistic and collective manner. In addition to traditional application domains like social media, the Web, biological networks, and RDF knowledge bases, graph data models are a natural fit for new and important application domains like personal data management, provenance, metadata management, machine learning, and others. Even data that is not naturally graph-structured is increasingly viewed through a graph lens; examples include shopping transaction data, healthcare data, source code repositories, and parcel shipment data. This increased attention has led to much recent work on specialized graph databases (e.g., Neo4j, Titan, OrientDB, Dgraph, Blazegraph, Amazon Neptune) and graph analysis frameworks (e.g., Giraph, GraphLab, Ligra, GraphX, and numerous others [? ]). Several established relational databases, including SAP HANA, Oracle, Aster Data, and SQL Server, have added support for limited forms of graph querying and analytics. However, there is no clear unifying or overarching theme that connects these systems and other research on graph querying and analytics, resulting in a highly fragmented landscape.
We contend that the key reasons for this are twofold. First, graph querying/analytics is typically a small part of the overall data management process, and many of the solutions (especially graph databases) require complete buy-in so that the data can be stored in an appropriate graph-aware format and appropriate indexes can be built.
Second, the querying or analysis workloads typically considered to be within the scope of graph data management are highly varied, and include point queries (e.g., pattern matching, reachability, shortest paths), network science (e.g., community detection, centrality), graph mining (e.g., influence propagation, similarity-based ranking), graph algorithms (e.g., bipartite matching, min-cut), temporal analytics (e.g., network evolution), and "what-if" analytics (e.g., vulnerability analysis), among others. As a result, many users prefer to craft custom solutions to specific problems in the context of their own environments, built on custom data structures that are neither reusable nor standardized.
In this talk, I will present our vision for an in situ graph querying and analytics framework, called GraphGen, that uses the familiar abstraction of "views" or "data virtualization" to: (a) construct graphs by combining data from a number of heterogeneous data sources, and (b) query and analyze them using powerful, flexible APIs. I will discuss how this framework can serve as a unifying abstraction that covers the spectrum of use cases and approaches above. Our focus in the work done so far has been on the common scenario where the data originally resides in an RDBMS [? ? ? ]; I will describe our prototype that enables users to declaratively specify graph extraction tasks over such data, visually explore the extracted graphs, and write and execute graph algorithms over them, either directly or using existing graph libraries like the NetworkX Python library. GraphGen has a fundamentally different goal from recent work on using RDBMSs to store graph data through "shredding". Instead, GraphGen is intended to analyze "hidden" graphs that are present in existing databases (relational or not).
GraphGen attempts to utilize the underlying systems to the fullest extent possible by pushing down computation, uses a novel condensed representation to handle graphs that may be too large to extract in their entirety, supports writing programs against a general subgraph-centric API, and features several optimizations for efficient extraction and querying of large graphs. I will conclude with a discussion of new optimization opportunities and open challenges.
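As a rough illustration of what extracting a "hidden" graph from relational data can look like, the following minimal sketch derives a co-authorship graph from a single relational table with a self-join that runs inside the database, so only the resulting edges are pulled out. It is not GraphGen's API or extraction language: the table `author_pub`, its columns, the sample rows, and the function names are all hypothetical, and Python's built-in `sqlite3` module stands in for the RDBMS.

```python
import sqlite3
from collections import defaultdict


def extract_coauthor_graph(conn):
    """Return {author: set of co-authors} from a hypothetical author_pub table."""
    # The self-join is evaluated by the database (computation pushed down);
    # "a.author < b.author" keeps one row per undirected edge and drops self-pairs.
    rows = conn.execute(
        """
        SELECT DISTINCT a.author, b.author
        FROM author_pub AS a
        JOIN author_pub AS b ON a.pub = b.pub
        WHERE a.author < b.author
        """
    )
    graph = defaultdict(set)
    for u, v in rows:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def build_demo_db():
    """Create an in-memory database with made-up sample rows (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author_pub (author TEXT, pub TEXT)")
    conn.executemany(
        "INSERT INTO author_pub VALUES (?, ?)",
        [("ann", "p1"), ("bob", "p1"), ("bob", "p2"), ("cy", "p2"), ("dee", "p3")],
    )
    return conn


graph = extract_coauthor_graph(build_demo_db())
```

The edges here are never stored anywhere; they exist only as a consequence of the join condition, which is the sense in which the graph is hidden in the relational data. The resulting adjacency map could then be handed to a graph library for analysis. Note that this sketch materializes every edge, so it does not illustrate the condensed representation mentioned above.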
Deshpande, A. (2018). In situ graph querying and analytics with GraphGen. In Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences and Systems (GRADES) and Network Data Analytics (NDA), GRADES-NDA 2018. Association for Computing Machinery, Inc. https://doi.org/10.1145/3210259.3210261