Summarizing large-scale database schema using community detection

  • Wang X
  • Zhou X
  • Wang S

Abstract

Schema summarization on large-scale databases is challenging. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables, which makes it difficult to separate them into clusters that represent different topics. Moreover, because a schema can be very large, the schema summary needs to be organized into multiple levels to improve its usability. In this paper, we introduce a new schema summarization approach that applies community detection techniques from social network analysis. Our approach consists of three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we identify representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach identifies subject groups precisely, and that the generated abstract schema layers are very helpful for users exploring the database.
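The abstract names neither the community detection algorithm nor the labelling criterion, so what follows is only a minimal Python sketch of the first and third steps under stated assumptions. It treats tables as nodes and foreign-key references as undirected edges, substitutes a naive version of the greedy agglomerative modularity heuristic of Clauset, Newman and Moore for the paper's own algorithm, and labels each group with the table that has the most links inside that group. The functions `build_schema_graph`, `greedy_modularity` and `representative_table`, and the toy `FKS` schema, are hypothetical illustrations and do not come from the paper or from Freebase. The second step (clustering subject groups into abstract domains) is not shown.

```python
from collections import defaultdict

# Hypothetical toy schema: (table, referenced table) pairs forming a film
# cluster and a music cluster that share a single link.
FKS = [
    ("actor", "film"), ("actor", "film_cast"), ("actor", "genre"),
    ("film", "film_cast"), ("film", "genre"), ("film_cast", "genre"),
    ("film", "director"),
    ("album", "artist"), ("album", "recording"), ("album", "track"),
    ("artist", "recording"), ("artist", "track"), ("recording", "track"),
    ("album", "label"),
    ("actor", "artist"),  # the only link between the two clusters
]


def build_schema_graph(foreign_keys):
    """Undirected schema graph: tables are nodes, FK references are edges."""
    graph = defaultdict(set)
    for child, parent in foreign_keys:
        if child != parent:  # ignore self-references
            graph[child].add(parent)
            graph[parent].add(child)
    return dict(graph)


def greedy_modularity(graph):
    """Naive greedy agglomerative modularity maximisation (CNM-style).

    Assumed stand-in for the paper's algorithm: repeatedly merge the pair of
    linked communities whose merge increases modularity Q the most, and stop
    when no merge increases Q.
    """
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    communities = [{t} for t in sorted(graph)]
    if m == 0:
        return communities
    degree = {t: len(nbrs) for t, nbrs in graph.items()}
    while True:
        best_gain, best_pair = 1e-12, None  # accept only positive gains
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                links = sum(1 for u in communities[i] for v in graph[u]
                            if v in communities[j])
                if links == 0:
                    continue  # merging unlinked communities never helps
                d_i = sum(degree[t] for t in communities[i])
                d_j = sum(degree[t] for t in communities[j])
                # Change in Q from merging i and j.
                gain = links / m - d_i * d_j / (2 * m * m)
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            return communities
        i, j = best_pair
        communities[i] |= communities[j]
        del communities[j]  # j > i, so index i is unaffected


def representative_table(graph, group):
    """Hypothetical labelling rule: the table with most links inside its group."""
    return max(sorted(group), key=lambda t: len(graph[t] & group))


if __name__ == "__main__":
    graph = build_schema_graph(FKS)
    for group in greedy_modularity(graph):
        print(representative_table(graph, group), sorted(group))
```

On this toy schema the script prints two groups, one labelled `film` containing the five film tables and one labelled `album` containing the five music tables. The single `actor`-`artist` link does not merge them because the penalty term `d_i * d_j / (2 * m * m)` in the gain grows with the total degree of the groups being merged; this degree penalty is one reason a modularity-based method is a plausible fit for schemas whose tables hang off a few high-degree hubs, though the toy example says nothing about behaviour at Freebase scale.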

Author-supplied keywords

  • Community detection
  • Large scale
  • Schema
  • Summarization

Authors

  • Xue Wang
  • Xuan Zhou
  • Shan Wang
