Various criteria have been defined to evaluate the significance of sets of words, but they are often difficult to compute. Assuming that the text is generated by a stationary Markov process, we provide explicit expressions for the waiting time in this setting. To assess the significance of a cluster of potential binding sites, we extend these expressions to the co-occurrence problem. We point out that the values of these criteria depend on a few fundamental parameters, and we provide efficient algorithms to compute them that rely on a combinatorial interpretation of the formulae. We show that our results are very tight in the so-called twilight zone and improve on previous rough approximations. These results are developed for an extended model of consensus. © Springer-Verlag Berlin Heidelberg 2005.
CITATION STYLE
Boeva, V., Clément, J., Régnier, M., & Vandenbogaert, M. (2005). Assessing the significance of sets of words. In Lecture Notes in Computer Science (Vol. 3537, pp. 358–370). Springer-Verlag. https://doi.org/10.1007/11496656_31