Online data fusion

Abstract

The Web contains a significant volume of structured data in various domains, but much of it is dirty or erroneous, and errors can propagate through copying. While data integration techniques allow querying structured data on the Web, they take the union of the answers retrieved from different sources and can thus return conflicting information. Data fusion techniques, on the other hand, aim to find the true values, but are designed for offline data aggregation and can take a long time. This paper proposes SOLARIS, the first online data fusion system. It starts by returning answers from the first probed source, and refreshes the answers as it probes more sources and applies fusion techniques to the retrieved data. For each returned answer, it shows the likelihood that the answer is correct, and stops retrieving data for it once it is sufficiently confident that data from the unprocessed sources are unlikely to change the answer. We address key problems in building such a system and show empirically that the system can start returning correct answers quickly and terminate fast without sacrificing the quality of the answers. © 2011 VLDB Endowment.
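
The probe, refresh, and stop loop described in the abstract can be illustrated with a minimal sketch. This is not the SOLARIS algorithm: it substitutes plain weighted voting for the paper's fusion technique, and the function name `online_fuse`, the per-source trust weights, the probing order (highest weight first), and the vote-share "confidence" are all assumptions made for illustration. It stops once the unprobed sources together could no longer overturn the leading answer.

```python
from collections import defaultdict


def online_fuse(sources):
    """Incrementally fuse answers for a single query (illustrative only).

    sources: list of (weight, value) pairs. Both the weights (assumed
    trust scores) and this interface are hypothetical, not from the paper.
    Yields (best_value, confidence, done) after each probed source.
    """
    # Assumption: probe the most trusted sources first.
    ordered = sorted(sources, key=lambda s: s[0], reverse=True)
    remaining = sum(weight for weight, _ in ordered)  # weight not yet probed
    votes = defaultdict(float)

    for weight, value in ordered:
        votes[value] += weight
        remaining -= weight

        ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
        best, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

        # Crude confidence: the leader's share of the weight seen so far.
        confidence = best_score / sum(votes.values())
        # Stop when even all unprobed weight going to one rival
        # (seen or unseen) could not overtake the current leader.
        done = best_score - runner_up > remaining

        yield best, confidence, done
        if done:
            break


if __name__ == "__main__":
    demo = [(3, "Paris"), (2, "Paris"), (1, "Lyon"), (1, "Lyon")]
    for answer, confidence, done in online_fuse(demo):
        print(answer, round(confidence, 2), "final" if done else "refreshing")
```

In this toy run the loop terminates after two of the four sources, because the remaining weight of 2 cannot close the leader's margin of 5. A real system would also have to account for copying between sources, which simple voting ignores.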

Cite

Liu, X., Dong, X. L., Ooi, B. C., & Srivastava, D. (2011). Online data fusion. In Proceedings of the VLDB Endowment (Vol. 4, pp. 932–943). VLDB Endowment. https://doi.org/10.14778/3402707.3402731