In bioinformatics, it is common to search biological sequences (DNA, RNA, proteins) for functional motifs such as cross-over hotspot instigators (chi), restriction sites, regulation motifs, binding sites, active sites in proteins, etc. (Beaudoing et al., 2000; Brazma et al., 1998; El Karoui et al., 1999; Frith et al., 2002; Hampson et al., 2002; Karlin et al., 1992; Marino-Ramirez & Landsman, 2004; van Helden et al., 1998). Because of evolutionary pressure, functional motifs are likely to be more conserved than non-functional ones. A natural strategy is therefore to search biological sequences for motifs that are statistically exceptional (e.g., over- or under-represented). Given a motif of interest M (anything from a simple string to a complex regular expression), a recurrent question is: “how surprising is it to observe n occurrences of M in my dataset?” In statistical terms, this amounts to computing the p-value of the observation n with respect to a relevant reference model. More precisely, if $X_{1:\ell} = X_1 \ldots X_\ell$ is a random sequence of length $\ell$ generated by our reference model, and if $N$ denotes the random number of occurrences of M in $X_{1:\ell}$, then for any $n \geq 0$ our objective is to compute the significance score of observation n:
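As a rough illustration of the p-value question posed above, the following minimal sketch estimates P(N ≥ n) by Monte Carlo simulation. It is not the method of the chapter: the i.i.d. uniform reference model over the alphabet `ACGT`, the counting of overlapping occurrences of a plain-string motif, and the function names `count_occurrences` and `empirical_p_value` are all assumptions made for the example.

```python
import random


def count_occurrences(motif, seq):
    """Count (possibly overlapping) occurrences of a plain-string motif."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Move one position forward so that overlapping hits are counted.
        start = seq.find(motif, start + 1)
    return count


def empirical_p_value(motif, n_obs, length, n_sim=2000, alphabet="ACGT", seed=0):
    """Monte Carlo estimate of P(N >= n_obs) for over-representation.

    Assumed reference model: letters drawn independently and uniformly
    from `alphabet` (an illustrative choice, not taken from the text).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        seq = "".join(rng.choices(alphabet, k=length))
        if count_occurrences(motif, seq) >= n_obs:
            hits += 1
    # Add-one correction: the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)
```

A call such as `empirical_p_value("GCTGGTGG", n_obs=3, length=10000)` would estimate how often three or more occurrences of that 8-letter word appear in a random sequence of length 10000 under the assumed model. A simulation of this kind cannot resolve p-values much smaller than 1/`n_sim`, so very exceptional motifs call for exact or asymptotic computations.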
CITATION STYLE
Nuel, G. (2011). Significance Score of Motifs in Biological Sequences. In Bioinformatics - Trends and Methodologies. InTech. https://doi.org/10.5772/18448