Efficient computation of statistical significance of query results in databases

Vishwakarma Singh; Arnab Bhattacharya; Ambuj K. Singh

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5069 LNCS, pp. 509-516 (2008)

DOI: 10.1007/978-3-540-69497-7_32

Abstract

Queries such as database similarity searches return results that satisfy certain properties of distances or scores. For domain scientists, the absolute values of these scores are seldom sufficient; the statistical significance, or p-value, of a result is a more useful criterion. It can be computed using an appropriate model of random objects. Computing p-values becomes harder when queries have multiple components, because the returned score is then an aggregate of individual scores. The simple approach of calculating the p-value by enumerating all random possibilities fails for large database and query sizes. We propose an efficient method to calculate the approximate p-value of a multi-attribute result when the distribution of scores for the database objects is non-parametric. Experimental evaluation on large databases shows that our method is practical, runs five orders of magnitude faster than the basic approach, and has an error of less than 5% in p-value computation. © 2008 Springer-Verlag.
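To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes that the aggregate score is a plain sum of independent integer component scores, that a higher score is better, and that the p-value is the probability of a random result scoring at least as high as the observed one. All function names are hypothetical. It compares brute-force enumeration, whose cost grows exponentially with the number of components, with a step-by-step convolution of the empirical (non-parametric) score distributions.

```python
from collections import Counter
from itertools import product


def empirical_pmf(scores):
    """Non-parametric PMF of integer scores: value -> relative frequency."""
    counts = Counter(scores)
    n = len(scores)
    return {s: c / n for s, c in counts.items()}


def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent integer-valued scores."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def p_value_enumeration(component_pmfs, observed):
    """Baseline: enumerate every combination of component scores.

    The number of combinations is the product of the support sizes,
    so this becomes infeasible as components are added.
    """
    p = 0.0
    for combo in product(*(pmf.items() for pmf in component_pmfs)):
        if sum(score for score, _ in combo) >= observed:
            prob = 1.0
            for _, q in combo:
                prob *= q
            p += prob
    return p


def p_value_convolution(component_pmfs, observed):
    """Fold the components in one at a time, merging equal partial sums."""
    total = {0: 1.0}  # distribution of an empty sum
    for pmf in component_pmfs:
        total = convolve(total, pmf)
    return sum(p for s, p in total.items() if s >= observed)


if __name__ == "__main__":
    # Made-up sample scores for three query components (illustration only).
    samples = [[1, 2, 2, 3], [0, 1, 1, 4], [2, 2, 3, 5]]
    pmfs = [empirical_pmf(s) for s in samples]
    print(p_value_enumeration(pmfs, 9), p_value_convolution(pmfs, 9))
```

Under these assumptions both functions return the same probability; the convolution version only avoids revisiting combinations that lead to the same partial sum. Real-valued scores would first need to be binned, which introduces an approximation error of the kind the abstract quantifies.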

Singh, V., Bhattacharya, A., & Singh, A. K. (2008). Efficient computation of statistical significance of query results in databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5069 LNCS, pp. 509–516). https://doi.org/10.1007/978-3-540-69497-7_32
