Functional Dependency Generation and Applications in pay-as-you-go data integration systems

  • Wang D
  • Dong L
  • Sarma A
  • et al.

Abstract

Recently, the opportunity of extracting structured data from the Web has been identified by a number of research projects. One such example is that millions of relational-style HTML tables can be extracted from the Web. Traditional data integration approaches do not scale over such corpora with hundreds of small tables in one domain. To solve this problem, previous work has proposed pay-as-you-go data integration systems to provide, with little up-front cost, base services over loosely integrated information. One key component of such systems, which has received little attention to date, is the need for a framework to gauge and improve the quality of the integration. We propose a framework based on functional dependencies (FDs). Unlike in traditional database design, where FDs are specified as statements of truth about all possible instances of the database, in a web environment FDs are not specified over the data tables. Instead, we generate FDs by counting-based algorithms over many data sources, and extend the FDs with probabilities to capture the inherent uncertainties in them. Given these probabilistic FDs, we show how to solve two problems to improve data and schema quality in a pay-as-you-go system: (1) pinpointing dirty data sources and (2) normalizing large mediated schemas. We describe these techniques and evaluate them over real-world data sets extracted from the Web.
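
To make the idea of a counting-based probabilistic FD concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the paper: it assumes the probability of an FD can be approximated as the fraction of sources in which the FD holds, and that a source violating a high-probability FD is a candidate dirty source. The function names, the toy data, and the 0.6 threshold are all hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value maps to a single rhs value in rows."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def relevant_sources(sources, lhs, rhs):
    """Keep only non-empty sources whose rows carry both attributes."""
    return {
        name: rows
        for name, rows in sources.items()
        if rows and lhs in rows[0] and rhs in rows[0]
    }


def fd_probability(sources, lhs, rhs):
    """Assumed estimate: fraction of relevant sources where lhs -> rhs holds."""
    relevant = relevant_sources(sources, lhs, rhs)
    if not relevant:
        return None  # no evidence either way
    holding = sum(fd_holds(rows, lhs, rhs) for rows in relevant.values())
    return holding / len(relevant)


def suspicious_sources(sources, lhs, rhs, threshold=0.6):
    """Flag sources that violate an FD whose estimated probability is high."""
    prob = fd_probability(sources, lhs, rhs)
    if prob is None or prob < threshold:
        return []
    relevant = relevant_sources(sources, lhs, rhs)
    return sorted(n for n, rows in relevant.items()
                  if not fd_holds(rows, lhs, rhs))


# Toy data (hypothetical): three small web tables about cities.
sources = {
    "table_a": [{"city": "Paris", "country": "France"},
                {"city": "Lyon", "country": "France"}],
    "table_b": [{"city": "Paris", "country": "France"},
                {"city": "Rome", "country": "Italy"}],
    "table_c": [{"city": "Paris", "country": "France"},
                {"city": "Paris", "country": "Italy"}],  # conflicting row
}

print(fd_probability(sources, "city", "country"))
print(suspicious_sources(sources, "city", "country"))
```

In this toy run the dependency city → country holds in two of the three tables, so the estimate is 2/3, and `table_c` is the only source flagged. A per-tuple count, or weighting sources by size, would be equally plausible variants of the same idea.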

Cite

Wang, D. Z., Dong, L., Sarma, A. D., Franklin, M. J., & Halevy, A. (2009). Functional Dependency Generation and Applications in pay-as-you-go data integration systems. Proceedings of the 12th International Workshop on the Web and Databases (WebDB), ACM, 1–6. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.9353&rep=rep1&type=pdf
