CrowdSJ: Skyline-Join Query Processing of Incomplete Datasets with Crowdsourcing

Abstract

Skyline queries are widely used in decision-making systems, wireless sensor networks (WSNs), and similar applications. A variation of the skyline query, the skyline-join query, returns results drawn from multiple datasets. However, incomplete datasets are common because of the widespread use of automated information extraction and aggregation. Existing methods for handling incomplete data, such as probabilistic approaches and data padding, can address the problem but do not effectively reflect the real situation and lack completeness. To reflect the real situation more accurately and in a more user-centric way, this paper studies skyline-join queries over incomplete datasets with crowdsourcing, named CrowdSJ. The problem is divided into two cases. In the first, the skyline-join query involves only the unknown crowdsourcing attribute and the join attribute; this case is named Partial Skyline-Join with Crowdsourcing (PSJCrowd). In the second, the query involves all attributes; this case is named All Skyline-Join with Crowdsourcing (ASJCrowd). For PSJCrowd, we first filter the known dataset, then present the level-preference-tree-index and propose the partial skyline-join with crowdsourcing algorithm. For ASJCrowd, we also first filter the known dataset. Second, we build a level-preference-tree-index based on the known attributes of the incomplete dataset. Third, we propose an algorithm for skyline-join with crowdsourcing on a single dataset, CrowdSJ-single, to filter the dataset containing unknown attributes. We then build a global level-preference-tree-index based on the known attributes of the incomplete dataset and the complete dataset, and propose an algorithm for skyline-join with crowdsourcing on multiple datasets, CrowdSJ-multiple, which filters the linked tuples based on the global level-preference-tree-index and the results of each round of crowdsourcing. Extensive experiments on synthetic and real datasets demonstrate that our algorithms are efficient and effective.
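
For readers unfamiliar with the query type, the following is a minimal sketch of what a skyline-join computes on complete data. It is a naive baseline and not the CrowdSJ, PSJCrowd, or ASJCrowd algorithms from the paper: it uses no level-preference-tree-index and no crowdsourcing. The function names, the tuple layout `(join_key, attributes)`, the "smaller is better" convention, and the sample data are all assumptions made for illustration.

```python
from itertools import product


def dominates(a, b):
    """Return True if a is no worse than b on every attribute and
    strictly better on at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_skyline_join(left, right):
    """Join two datasets on their key, then keep the joined tuples that
    no other joined tuple dominates.

    left, right: lists of (join_key, attribute_tuple) pairs (hypothetical layout).
    """
    # Equi-join: concatenate attributes of tuples that share a join key.
    joined = [
        (k1, a1 + a2)
        for (k1, a1), (k2, a2) in product(left, right)
        if k1 == k2
    ]
    attrs = [a for _, a in joined]
    # Skyline over the joined attribute vectors (quadratic, for clarity only).
    return [(k, a) for k, a in joined if not any(dominates(b, a) for b in attrs)]


# Hypothetical sample data: join key plus numeric attributes.
left = [("A", (1, 5)), ("A", (2, 2)), ("B", (3, 1))]
right = [("A", (4,)), ("B", (1,)), ("B", (2,))]
result = naive_skyline_join(left, right)
```

Here the joined tuple `("B", (3, 1, 2))` is dropped because `("B", (3, 1, 1))` dominates it, while the remaining three joined tuples are mutually incomparable. In the setting the abstract describes, some attribute values would be unknown, so dominance could not be decided directly for every pair; that is the gap the paper proposes to fill with crowdsourcing.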

Cite

Ding, L., Zhang, X., Zhang, H., Liu, L., & Song, B. (2021). CrowdSJ: Skyline-Join Query Processing of Incomplete Datasets with Crowdsourcing. IEEE Access, 9, 73216–73229. https://doi.org/10.1109/ACCESS.2021.3079324
