Transcriptomic analysis is used to capture the molecular state of a cell or sample in many biological and medical applications. Beyond identifying changes in the activity of individual genes, an important objective of molecular analysis is to understand changes in the gene networks that regulate fundamental biological mechanisms. As a result, databases that describe biological pathways are increasingly relied on to help interpret results from large-scale genomics studies. Incorporating information from biological pathways and gene regulatory networks into genomic data analysis is a popular strategy, and many methods provide this functionality for gene expression data. When developing or comparing such methods, it is important to assess their performance accurately, and simulation-based validation studies are a popular way to do so. Such studies require simulated data that correctly account for pathway relationships and correlations. Here we present a versatile statistical framework for simulating correlated gene expression data from biological pathways by sampling from a multivariate normal distribution derived from a graph structure. This procedure has been released as the 'graphsim' R package (https://github.com/TomKellyGenetics/graphsim) and is compatible with any graph structure that can be described using the 'igraph' package.
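The general idea of sampling from a multivariate normal distribution derived from a graph structure can be illustrated with a minimal Python sketch. This is not the 'graphsim' API, and it is not necessarily the algorithm the package uses. The function names, the `base_cor` parameter, the geometric decay of correlation with shortest-path distance, and the eigenvalue-clipping repair are all assumptions made here for illustration.

```python
import numpy as np


def distance_matrix(adjacency):
    """All-pairs shortest-path lengths (in edges) via Floyd-Warshall."""
    n = adjacency.shape[0]
    dist = np.where(adjacency > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist


def make_positive_definite(sigma, eps=1e-8):
    """Clip eigenvalues at eps, then rescale back to a unit diagonal."""
    vals, vecs = np.linalg.eigh(sigma)
    fixed = (vecs * np.clip(vals, eps, None)) @ vecs.T
    d = np.sqrt(np.diag(fixed))
    return fixed / np.outer(d, d)


def graph_to_sigma(adjacency, base_cor=0.8):
    """Hypothetical rule: correlation decays as base_cor ** path distance.

    Disconnected pairs have infinite distance and so get correlation 0.
    """
    sigma = base_cor ** distance_matrix(adjacency)
    return make_positive_definite(sigma)


def simulate_expression(adjacency, n_samples, base_cor=0.8, seed=0):
    """Draw samples (rows) of gene expression (columns) with mean 0."""
    sigma = graph_to_sigma(adjacency, base_cor)
    rng = np.random.default_rng(seed)
    mean = np.zeros(adjacency.shape[0])
    return rng.multivariate_normal(mean, sigma, size=n_samples)


# Toy pathway: a chain of four genes, A - B - C - D (undirected).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

sigma = graph_to_sigma(adjacency, base_cor=0.8)
expr = simulate_expression(adjacency, n_samples=2000, base_cor=0.8, seed=42)
sample_cor = np.corrcoef(expr, rowvar=False)
```

In this sketch, genes that are adjacent in the graph receive a higher correlation than genes several edges apart, so `sample_cor` should reflect the chain structure. Edge direction and edge type are not modelled here.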
Citation
Kelly, S., & Black, M. (2020). graphsim: An R package for simulating gene expression data from graph structures of biological pathways. Journal of Open Source Software, 5(51), 2161. https://doi.org/10.21105/joss.02161