Abstract
Data profiling is a "set of statistical data analysis activities and processes to determine properties of a given dataset". Historically, most data profiling tasks have targeted the data itself. At scale, when a dataset contains millions of tables, their metadata (i.e., titles, attribute names, and types) becomes as abundant as the data instances, and profiling that metadata starts to play a vital role. Here we demonstrate our work on WebLens, an interactive, scalable metadata profiler for large-scale structured data. At its core is a new data structure, the metadata-profile, coupled with Machine/Deep-Learning models trained to construct it. A metadata-profile is a metadata summary of a specific real-world object, collected over millions of data sources. Such profiles significantly simplify access to large-scale structured datasets for both data scientists and end users. Finally, we performed a user study with 20 students and found that the WebLens trained models significantly outperform the 20 participants at constructing metadata-profiles for 10 objects from different domains. For demonstration and evaluation, we used a large-scale dataset of 15 million relational English tables from the Web.
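The abstract does not specify how a metadata-profile is represented or built. Purely as an illustration, the following minimal sketch assumes a profile is a ranked list of attribute names together with their support, i.e. the fraction of tables mentioning an object that contain the attribute. It computes this with plain frequency counting, a naive baseline swapped in for the trained Machine/Deep-Learning models WebLens actually uses. The `Table` type, the `metadata_profile` function, and the sample tables are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Metadata of one Web table (hypothetical schema: a title plus column names)."""
    title: str
    attributes: tuple[str, ...]


def metadata_profile(tables, obj, top_k=5):
    """Naive frequency-based profile for `obj` (NOT the WebLens ML/DL models).

    Selects tables whose title mentions `obj` (case-insensitive substring match),
    counts each normalized attribute name at most once per table, and returns the
    top_k attributes with their support, rounded to two decimals.
    """
    needle = obj.lower()
    matching = [t for t in tables if needle in t.title.lower()]
    if not matching:
        return []
    counts = Counter()
    for t in matching:
        # Use a set so a column repeated within one table is counted only once.
        counts.update({a.strip().lower() for a in t.attributes})
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, round(n / len(matching), 2)) for name, n in ranked[:top_k]]


# Hypothetical sample of table metadata, for illustration only.
tables = [
    Table("List of movies 2019", ("Title", "Director", "Year")),
    Table("Top grossing movies", ("title", "Gross", "Year")),
    Table("Movies by genre", ("Title", "Genre")),
    Table("US cities", ("City", "State", "Population")),
]

print(metadata_profile(tables, "movie", top_k=3))
# [('title', 1.0), ('year', 0.67), ('director', 0.33)]
```

Counting each attribute once per table keeps a single wide or repetitive table from dominating the profile; a learned model, as in WebLens, would additionally have to cope with synonyms, noisy titles, and attributes that never co-occur with the object's name.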
Citation
Khan, R., & Gubanov, M. (2020). WebLens: Towards Interactive Large-scale Structured Data Profiling. In International Conference on Information and Knowledge Management, Proceedings (pp. 3425–3428). Association for Computing Machinery. https://doi.org/10.1145/3340531.3417443