WebLens: Towards Interactive Large-scale Structured Data Profiling

Rituparna Khan; Michael Gubanov

Conference Proceedings, Open Access

International Conference on Information and Knowledge Management, Proceedings (2020) 3425-3428

DOI: 10.1145/3340531.3417443

Abstract

Data profiling is a "set of statistical data analysis activities and processes to determine properties of a given dataset". Historically, most data profiling tasks were aimed at the data itself. At scale, when a dataset has millions of tables, their metadata (i.e., titles, attribute names, and types) becomes abundant, much like data instances, and profiling it starts to play a vital role. Here we demonstrate our work on WebLens, an interactive, scalable metadata profiler for large-scale structured data. At its core is a new data structure, the Metadata-profile, coupled with Machine/Deep-Learning models trained to construct it. A Metadata-profile represents a metadata summary of a specific real-world object collected over millions of data sources. Such profiles significantly simplify access to large-scale structured datasets for both data scientists and end users. Finally, we performed a user study with 20 students and found that WebLens-trained models significantly outperform these 20 people on the task of constructing metadata-profiles for 10 objects from different domains. For demonstration and evaluation we used a large-scale dataset of ~15 million relational English tables from the Web.
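To make the idea of a metadata summary concrete, the following is a minimal sketch, not the authors' implementation. It assumes a profile can be approximated by counting attribute names across tables whose title mentions the object. The names `Table`, `MetadataProfile`, and `build_profile` are hypothetical, and the keyword match on the title is a stand-in for the trained Machine/Deep-Learning models the abstract describes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical, simplified view of one Web table: metadata only."""
    title: str
    attributes: list[str]


@dataclass
class MetadataProfile:
    """Hypothetical summary of the metadata seen for one real-world object."""
    object_name: str
    table_count: int = 0
    attribute_counts: Counter = field(default_factory=Counter)

    def top_attributes(self, k: int) -> list[tuple[str, int]]:
        # Sort by descending count, then alphabetically for a stable order.
        ranked = sorted(self.attribute_counts.items(),
                        key=lambda kv: (-kv[1], kv[0]))
        return ranked[:k]


def build_profile(object_name: str, tables: list[Table]) -> MetadataProfile:
    """Aggregate attribute names over tables whose title mentions the object.

    Assumption: substring matching on the title replaces the trained models.
    """
    profile = MetadataProfile(object_name)
    needle = object_name.lower()
    for table in tables:
        if needle not in table.title.lower():
            continue
        profile.table_count += 1
        # Normalize names so "Make" and "make" count as one attribute,
        # and count each attribute at most once per table.
        profile.attribute_counts.update(
            {a.strip().lower() for a in table.attributes}
        )
    return profile


# Toy input; a real corpus would hold millions of tables.
tables = [
    Table("Used car listings", ["Make", "Model", "Price"]),
    Table("Car specifications", ["make", "model", "engine"]),
    Table("City populations", ["City", "Population"]),
]
profile = build_profile("car", tables)
print(profile.table_count)
print(profile.top_attributes(2))
```

In this toy setting the profile for "car" draws on two of the three tables and ranks the attributes they share first. A learned model would replace both the title match and the raw frequency ranking.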

Cite

Khan, R., & Gubanov, M. (2020). WebLens: Towards Interactive Large-scale Structured Data Profiling. In International Conference on Information and Knowledge Management, Proceedings (pp. 3425–3428). Association for Computing Machinery. https://doi.org/10.1145/3340531.3417443
