Efficient computation of statistical significance of query results in databases


Abstract

Queries such as database similarity searches return results that satisfy certain properties of distances or scores. For domain scientists, the absolute values of these scores are seldom sufficient; the statistical significance, or p-value, of a result is a more useful criterion. It can be computed using an appropriate model of random objects. Computing p-values becomes harder when queries have multiple components, because the returned score is then an aggregate of individual scores. The simple approach of calculating the p-value by enumerating all random possibilities fails for large database and query sizes. We propose an efficient method to calculate the approximate p-value of a multi-attribute result when the distribution of scores for the database objects is non-parametric. Experimental evaluation on large databases shows that our method is practical, runs five orders of magnitude faster than the basic approach, and has an error of less than 5% in p-value computation. © 2008 Springer-Verlag.
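
To make the cost of enumeration concrete, the following is a minimal Python sketch and not the method proposed in the paper, which the abstract does not describe. It assumes that the aggregate score is a sum, that the attributes are independent, that scores are integers, and that the p-value is the probability of a random sum being at least the observed score. The function names `pvalue_by_enumeration` and `pvalue_by_convolution` are hypothetical. The first function walks every combination of per-attribute scores; the second uses a standard alternative, convolving the empirical per-attribute histograms, to obtain the same value without listing combinations.

```python
from collections import Counter
from itertools import product


def pvalue_by_enumeration(score_samples, observed):
    """Brute-force P(sum of one score per attribute >= observed).

    score_samples[i] is the empirical (non-parametric) list of scores
    for attribute i; each entry is treated as equally likely.
    The loop visits len(s_1) * len(s_2) * ... combinations.
    """
    total = 0
    hits = 0
    for combo in product(*score_samples):
        total += 1
        if sum(combo) >= observed:
            hits += 1
    return hits / total


def pvalue_by_convolution(score_samples, observed):
    """Same quantity, obtained by convolving per-attribute histograms.

    Assumes integer scores so that equal sums collapse onto one key.
    """
    dist = {0: 1.0}  # probability distribution of the running sum
    for samples in score_samples:
        counts = Counter(samples)
        n = len(samples)
        nxt = {}
        for partial_sum, prob in dist.items():
            for value, count in counts.items():
                key = partial_sum + value
                nxt[key] = nxt.get(key, 0.0) + prob * count / n
        dist = nxt
    return sum(prob for total, prob in dist.items() if total >= observed)


if __name__ == "__main__":
    # Made-up scores for three attributes, for illustration only.
    samples = [[1, 2, 3], [0, 1, 2, 2], [1, 1, 4]]
    print(pvalue_by_enumeration(samples, 7))
    print(pvalue_by_convolution(samples, 7))
```

The enumeration grows multiplicatively with the number of attributes and the number of scores per attribute, which is the scaling problem the abstract refers to. The convolution variant only tracks distinct partial sums, but it depends on the assumptions listed above.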

Citation

Singh, V., Bhattacharya, A., & Singh, A. K. (2008). Efficient computation of statistical significance of query results in databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5069 LNCS, pp. 509–516). https://doi.org/10.1007/978-3-540-69497-7_32
