A guide to creating design matrices for gene expression experiments

Abstract

Differential expression analysis of genomic data, such as that from RNA-sequencing experiments, uses linear models to determine the size and direction of changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by well-described analysis pipelines. However, two crucial steps in the analysis process can be a stumbling block for many: setting up an appropriate model via design matrices, and setting up comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. Analysts usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation for each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of models and the differences between them to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
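
To make the two steps named in the abstract concrete, the following is a minimal sketch of a design matrix and a contrast for a single gene measured in two groups. It is not taken from the article and does not use limma; it is plain Python chosen only for illustration. The sample labels, expression values, and the helper `fit_means_model` are all hypothetical. In an R workflow, the analogous objects would typically be built with `model.matrix` and limma's `makeContrasts`.

```python
# Hypothetical toy data: one gene, four samples, two groups.
groups = ["control", "control", "treated", "treated"]
expression = [5.0, 7.0, 9.0, 11.0]  # made-up log-expression values

levels = sorted(set(groups))  # column order: ["control", "treated"]

# Means-model design matrix (no intercept): one indicator column per group.
design = [[1.0 if g == level else 0.0 for level in levels] for g in groups]


def fit_means_model(design, y):
    """Least-squares fit for an indicator-only design (hypothetical helper).

    With one indicator column per group, X^T X is diagonal, so each
    coefficient is simply the mean of the samples in that group.
    """
    n_coef = len(design[0])
    coefs = []
    for j in range(n_coef):
        n_in_group = sum(row[j] for row in design)             # (X^T X)[j][j]
        xty = sum(row[j] * yi for row, yi in zip(design, y))   # (X^T y)[j]
        coefs.append(xty / n_in_group)
    return coefs


coefficients = fit_means_model(design, expression)

# Contrast vector: treated minus control, in the column order of `levels`.
contrast = [-1.0, 1.0]
log_fold_change = sum(c * b for c, b in zip(contrast, coefficients))

print(dict(zip(levels, coefficients)))
print(log_fold_change)
```

In this sketch the design matrix encodes which group each sample belongs to, the fitted coefficients are the group means, and the contrast turns those coefficients into the comparison of interest (treated versus control).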

Cite

Law, C. W., Smyth, G. K., Ritchie, M. E., Zeglinski, K., Dong, X., & Alhamdoosh, M. (2020). A guide to creating design matrices for gene expression experiments. F1000Research, 9. https://doi.org/10.12688/f1000research.27893.1
