WebLens: Towards Interactive Large-scale Structured Data Profiling

Abstract

Data profiling is a "set of statistical data analysis activities and processes to determine properties of a given dataset". Historically, most data profiling tasks have targeted the data instances themselves. At scale, when a dataset contains millions of tables, their metadata (i.e., titles, attribute names, and types) becomes as abundant as the data instances, and profiling it becomes essential. Here we demonstrate WebLens, an interactive, scalable metadata profiler for large-scale structured data. At its core is a new data structure, the Metadata-profile, coupled with machine learning and deep learning models trained to construct it. A Metadata-profile is a metadata summary of a specific real-world object, collected over millions of data sources. Such profiles significantly simplify access to large-scale structured datasets for both data scientists and end users. Finally, in a user study with 20 students, we found that the trained WebLens models significantly outperform these 20 participants at constructing Metadata-profiles for 10 objects from different domains. For demonstration and evaluation, we used a large-scale dataset of 15 million relational English-language tables from the Web.
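To make the idea of a per-object metadata summary more concrete, here is a minimal sketch in Python. It assumes, hypothetically, that the tables describing one object have already been collected, and it approximates a Metadata-profile as the attribute names that recur across those tables, weighted by the fraction of tables that contain them. WebLens itself constructs profiles with trained machine learning and deep learning models; this frequency count is a simplified stand-in, not the authors' method, and the names `Table`, `normalize`, `build_metadata_profile`, and `min_support` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Table:
    # Hypothetical minimal table representation: only metadata is kept.
    title: str
    attributes: list[str]


def normalize(name: str) -> str:
    """Lowercase, treat underscores as spaces, and collapse whitespace."""
    return " ".join(name.replace("_", " ").lower().split())


def build_metadata_profile(tables: list[Table], min_support: float = 0.5) -> dict[str, float]:
    """Map each normalized attribute name to the fraction of tables containing it,
    keeping only attributes whose fraction is at least `min_support`.
    (Illustrative frequency heuristic, not the WebLens model.)"""
    if not tables:
        return {}
    counts: Counter[str] = Counter()
    for t in tables:
        # A set ensures each attribute is counted at most once per table.
        counts.update({normalize(a) for a in t.attributes})
    n = len(tables)
    return {attr: c / n for attr, c in counts.items() if c / n >= min_support}


# Usage: three hypothetical tables about the object "Country".
tables = [
    Table("List of countries", ["Country", "Capital", "Population"]),
    Table("Countries by area", ["country", "Area_km2", "Population"]),
    Table("World capitals", ["Country", "Capital city"]),
]
profile = build_metadata_profile(tables)
print(profile)
```

With the default threshold, only "country" (present in all three tables) and "population" (present in two of three) survive; raising `min_support` trades coverage for precision, which is one of the judgments a learned model could make instead of a fixed cutoff.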

Citation (APA)

Khan, R., & Gubanov, M. (2020). WebLens: Towards Interactive Large-scale Structured Data Profiling. In International Conference on Information and Knowledge Management, Proceedings (pp. 3425–3428). Association for Computing Machinery. https://doi.org/10.1145/3340531.3417443