Abstract
A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test assesses the significance of a group of symbols found together in several genesets of a given database. Each pair of symbols is assigned a p-value that depends on the frequencies of the two symbols and on their number of joint occurrences. All pairs with p-values below a chosen threshold define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, each linked to the set of genesets in which it can be found. The method can be applied to any database, and is illustrated on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections correspond to already known interactions. On more specific selections of C2, many previously unknown symbol associations have been detected. These associations reveal new candidates for gene or protein interactions, which need further investigation for biological evidence.
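The pipeline described above (pairwise p-values, thresholded graph, cliques) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the author's implementation: the abstract does not specify the statistical test, so a one-sided hypergeometric tail is swapped in as a stand-in. The function names, the toy genesets, and the threshold of 0.05 are all hypothetical.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(n_total, n_a, n_b, n_joint):
    """P(X >= n_joint), where X counts genesets containing both symbols
    if they were placed independently. ASSUMPTION: a hypergeometric null,
    used here as a stand-in for the test proposed in the paper."""
    denom = comb(n_total, n_b)
    return sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_joint, min(n_a, n_b) + 1)
    ) / denom


def significant_pairs(genesets, threshold):
    """Return ({(a, b): p_value} for pairs below threshold, occurrences)."""
    n_total = len(genesets)
    occurrences = {}  # symbol -> set of geneset names containing it
    for name, symbols in genesets.items():
        for s in symbols:
            occurrences.setdefault(s, set()).add(name)
    edges = {}
    for a, b in combinations(sorted(occurrences), 2):
        joint = len(occurrences[a] & occurrences[b])
        if joint == 0:
            continue  # no joint occurrence, nothing to test
        p = hypergeom_tail(n_total, len(occurrences[a]), len(occurrences[b]), joint)
        if p < threshold:
            edges[(a, b)] = p
    return edges, occurrences


def maximal_cliques(edges):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy database: A, B and C always occur together.
genesets = {
    "g1": {"A", "B", "C", "D"}, "g2": {"A", "B", "C"},
    "g3": {"A", "B", "C", "E"}, "g4": {"D", "E"},
    "g5": {"D", "F"}, "g6": {"E", "F"}, "g7": {"D", "E", "F"},
    "g8": {"F"}, "g9": {"D"}, "g10": {"E"},
}
edges, occurrences = significant_pairs(genesets, threshold=0.05)
cliques = maximal_cliques(edges)
# Genesets supporting each clique: those containing every symbol of the clique.
support = [sorted(set.intersection(*(occurrences[s] for s in c))) for c in cliques]
```

In this toy example only the pairs among A, B and C fall below the threshold, so the graph contains a single triangle, and `support` lists the genesets in which that association appears. On a real database the number of pairs is large, so the threshold would need to account for multiple testing; how the paper handles this is not stated in the abstract.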
Cite
Ycart, B. (2014). Statistical Data Mining for Symbol Associations in Genomic Databases. International Journal of Genetics and Genomics, 2(6), 97. https://doi.org/10.11648/j.ijgg.20140206.11