Exploring Heterogeneous Data Lake based on Unified Canonical Graphs

Qin Yuan; Ye Yuan; Zhenyu Wen; He Wang; Chen Chen; Guoren Wang

Conference Proceedings, Open Access

SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022) 1834-1838

DOI: 10.1145/3477495.3531759

Abstract

A data lake is a repository for massive raw and heterogeneous data, which includes multiple data models with different data schemas and query interfaces. Keyword search can extract valuable information for users without knowledge of the underlying schemas and query languages. However, conventional keyword search methods are restricted to a single data model and cannot easily adapt to a data lake. In this paper, we study a novel keyword search over heterogeneous data lakes. To achieve high accuracy and efficiency, we introduce canonical graphs and then integrate semantically related vertices based on vertex representations. A matching-entity-based keyword search algorithm is presented to find answers across multiple data sources. Finally, an extensive experimental study shows the effectiveness and efficiency of our solution.

Citation (APA)

Yuan, Q., Yuan, Y., Wen, Z., Wang, H., Chen, C., & Wang, G. (2022). Exploring Heterogeneous Data Lake based on Unified Canonical Graphs. In SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1834–1838). Association for Computing Machinery, Inc. https://doi.org/10.1145/3477495.3531759
