Data catalogs are a promising solution for semantically classifying and organizing data sources and for enriching raw data with metadata. However, recent research has shown that data catalogs are difficult to implement because of the complexity of the data landscape or issues with data governance. Moreover, data catalogs struggle to help business analysts find the data they need for their use cases. Against this backdrop, we develop a self-service system that automatically extracts metadata from a data lake and enables business analysts to explore this metadata through an easy-to-use interface. Specifically, instead of implementing the data catalog top-down, our system derives metadata bottom-up from user queries. To this end, we conducted 15 interviews with business analysts to derive the system's requirements and evaluated its features with a focus group. Our findings indicate that participants especially value the ability to reuse other users' queries and appreciate support in query validation, since data preparation is a complex and time-consuming endeavour.
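To make the bottom-up idea concrete, the sketch below derives simple catalog entries from a log of SQL queries. For each referenced table it records how often the table is queried, by whom, and which queries touched it, so that analysts could reuse them. This is a minimal illustration under stated assumptions, not the authors' implementation. The query log, the regex-based table extraction, and the names `extract_tables` and `build_catalog` are all hypothetical. A production system would use a real SQL parser rather than a regular expression.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query log of (user, SQL text) pairs; illustrative data only.
QUERY_LOG = [
    ("alice", "SELECT customer_id, revenue FROM sales.orders WHERE year = 2022"),
    ("bob", "SELECT o.customer_id, c.region FROM sales.orders o "
            "JOIN crm.customers c ON o.customer_id = c.id"),
    ("alice", "select region from crm.customers"),
]

# Naive assumption: a table name is the identifier following FROM or JOIN.
# This ignores CTEs, subqueries, quoted identifiers, and constructs such as
# EXTRACT(YEAR FROM col), which it would misread as a table reference.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_tables(sql: str) -> list[str]:
    """Return the (lower-cased) table names referenced in one query."""
    return [name.lower() for name in TABLE_PATTERN.findall(sql)]


def build_catalog(log):
    """Aggregate per-table usage metadata from (user, sql) pairs."""
    usage = Counter()
    users = defaultdict(set)
    examples = defaultdict(list)
    for user, sql in log:
        # Count each table at most once per query.
        for table in set(extract_tables(sql)):
            usage[table] += 1
            users[table].add(user)
            examples[table].append(sql)  # kept so other analysts can reuse it
    return {
        table: {
            "query_count": usage[table],
            "users": sorted(users[table]),
            "example_queries": examples[table],
        }
        for table in usage
    }


if __name__ == "__main__":
    for table, meta in sorted(build_catalog(QUERY_LOG).items()):
        print(table, meta["query_count"], meta["users"])
```

Deriving metadata from queries in this way requires no upfront modelling effort: the catalog grows as analysts work, and frequently queried tables surface naturally.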
Gunklach, J., Michalczyk, S., Nadj, M., & Maedche, A. (2023). Metadata Extraction from User Queries for Self-Service Data Lake Exploration. Datenbank-Spektrum, 23(2), 97–105. https://doi.org/10.1007/s13222-023-00448-z