Abstract
The practical advantage of a data lake depends on a semantic understanding of its data. This knowledge is usually not externalized; it resides in the minds of the data analysts who have invested considerable cognitive effort in understanding the semantic relationships among the heterogeneous data sources. The SQL queries they have written contain this hidden knowledge and should therefore serve as the foundation for a self-learning system. This paper proposes a methodology for extracting knowledge fragments from SQL queries and representing them in an RDF-based knowledge graph. The feasibility of this approach is demonstrated by a prototype implementation and evaluated using example data. The results show that a query-driven knowledge graph is an appropriate tool for approximating the semantics of the data contained in a data lake and for incrementally providing interactive feedback that helps data analysts formulate queries.
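To make the idea of a "knowledge fragment" concrete, the following is a minimal sketch and not the Pharos implementation: it treats each equality join condition in a SQL query as a hint that two columns are semantically related and serializes that hint as an RDF triple in N-Triples syntax. The namespace `http://example.org/lake/`, the predicate name `joinableWith`, and the regex-based parsing are all assumptions made for illustration; a real system would presumably use a proper SQL parser and an RDF library.

```python
import re

# Hypothetical namespace and predicate, chosen only for this illustration.
NS = "http://example.org/lake/"

_IDENT = r"[A-Za-z_]\w*"
# Words that may follow a table name but are not aliases.
_KEYWORDS = "AS|ON|WHERE|JOIN|INNER|LEFT|RIGHT|FULL|CROSS|GROUP|ORDER|USING|LIMIT"

# Matches "FROM table [AS] alias" and "JOIN table [AS] alias".
ALIAS = re.compile(
    rf"\b(?:FROM|JOIN)\s+({_IDENT})(?:\s+(?:AS\s+)?(?!(?:{_KEYWORDS})\b)({_IDENT}))?",
    re.IGNORECASE,
)
# Matches equality conditions between two qualified columns, e.g. "o.cid = c.id".
JOIN_EQ = re.compile(rf"\b({_IDENT})\.({_IDENT})\s*=\s*({_IDENT})\.({_IDENT})")


def extract_join_triples(sql):
    """Return sorted (subject, predicate, object) triples for column equalities."""
    aliases = {}
    for table, alias in ALIAS.findall(sql):
        aliases[table] = table
        if alias:
            aliases[alias] = table
    triples = set()
    for lt, lc, rt, rc in JOIN_EQ.findall(sql):
        # Resolve aliases back to table names; unknown qualifiers are kept as-is.
        left = f"{aliases.get(lt, lt)}.{lc}"
        right = f"{aliases.get(rt, rt)}.{rc}"
        triples.add((NS + left, NS + "joinableWith", NS + right))
    return sorted(triples)


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    query = (
        "SELECT o.id FROM orders o "
        "JOIN customers AS c ON o.customer_id = c.id "
        "WHERE c.country = 'DE'"
    )
    print(to_ntriples(extract_join_triples(query)))
```

Collecting such triples over many analyst queries would gradually accumulate a graph of column relationships, which is the incremental, query-driven flavor the abstract describes. The sketch ignores subqueries, schema-qualified names, and non-equality predicates.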
Haller, D., & Lenz, R. (2020). Pharos: Query-Driven Schema Inference for the Semantic Web. In Communications in Computer and Information Science (Vol. 1168 CCIS, pp. 112–124). Springer. https://doi.org/10.1007/978-3-030-43887-6_10