Abstract
Understanding subcellular protein localisation is an essential component in the analysis of context-specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high-resolution mapping of thousands of proteins to subcellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a nonparametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a subcellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e., proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case-studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
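To illustrate why covariance structure can reduce the cost of inversion, the sketch below solves a linear system with a symmetric Toeplitz covariance matrix using a classical Levinson-Durbin style recursion, which needs O(n²) operations rather than the O(n³) of a generic dense solve. It is not the extended algorithm described in the abstract and does not include the tensor decomposition. It assumes a stationary squared-exponential kernel evaluated on equally spaced inputs, which is what makes the matrix Toeplitz. The function name `solve_sym_toeplitz` and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


def solve_sym_toeplitz(t, rhs):
    """Solve T x = rhs, where T is symmetric Toeplitz with first column t.

    Classical Levinson-Durbin style recursion (O(n^2)); an illustrative
    sketch, not the extended algorithm referred to in the abstract.
    """
    t = np.asarray(t, dtype=float)
    rhs = np.asarray(rhs, dtype=float)
    n = t.shape[0]
    f = np.array([1.0 / t[0]])       # forward vector: T_k f = e_1
    x = np.array([rhs[0] / t[0]])    # solution of the leading 1x1 system
    for k in range(1, n):
        lag = t[1:k + 1][::-1]       # row k of T, columns 0..k-1
        eps = lag @ f                # error from padding f with a zero
        f_ext = np.append(f, 0.0)
        b_ext = np.append(0.0, f[::-1])  # symmetric case: backward = reversed forward
        f = (f_ext - eps * b_ext) / (1.0 - eps * eps)
        back = f[::-1]               # backward vector: T_{k+1} back = e_{k+1}
        x = np.append(x, 0.0) + (rhs[k] - lag @ x) * back
    return x


# Hypothetical setup: 10 equally spaced inputs, squared-exponential kernel
# plus a noise variance on the diagonal (all values are assumptions).
n_points = 10
amplitude, length_scale, noise_var = 1.0, 2.0, 0.1
lags = np.arange(n_points, dtype=float)
first_col = amplitude**2 * np.exp(-0.5 * (lags / length_scale) ** 2)
first_col[0] += noise_var

rng = np.random.default_rng(0)
y = rng.normal(size=n_points)

# Structured solve versus a generic dense solve on the same matrix.
alpha_fast = solve_sym_toeplitz(first_col, y)
idx = np.abs(np.subtract.outer(np.arange(n_points), np.arange(n_points)))
dense_cov = first_col[idx]
alpha_dense = np.linalg.solve(dense_cov, y)
```

Both solves should agree to numerical precision; the structured version only ever touches the first column of the covariance matrix, which is the property that structure-exploiting algorithms of this kind rely on.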
Cite
Crook, O. M., Lilley, K. S., Gatto, L., & Kirk, P. D. W. (2022). Semi-supervised nonparametric Bayesian modelling of spatial proteomics. Annals of Applied Statistics, 16(4), 2554–2576. https://doi.org/10.1214/22-AOAS1603