A central objective in biology is to identify and characterize the mechanistic underpinnings (e.g., gene and protein interactions) of a biological phenomenon (e.g., a phenotype). Today, it is technologically feasible and commonplace to measure a great number of biomolecular features in a biological system at once, and to systematically investigate relationships between these features and the phenotype or phenomenological feature of interest across multiple spatial and temporal scales. The canonical starting point for such an investigation is typically a real-valued data matrix of N genomic features × M sample features, where N and M are integers, and N is often orders of magnitude greater than M. In this chapter we describe and rationalize the broad concepts and general principles underlying the analytic steps that start from this data matrix and lead to the identification of coherent mathematical patterns in the data that represent potential and testable mechanistic associations. A key challenge in this analysis is how to deal with false positives, which arise largely from the high dimensionality of the data. False positives are either mathematical patterns that are not coherent (from a technical or statistical standpoint) or coherent patterns that do not correspond to a true mechanistic association (from a biological standpoint).
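As a rough illustration of why high dimensionality produces false positives, the sketch below builds an N × M matrix of pure noise with N much greater than M, correlates each feature with an equally random phenotype, and counts how many features pass a nominal 5% significance cutoff. Everything here is an assumption made for illustration and is not taken from the chapter: the sizes N = 10,000 and M = 20, the use of Pearson correlation, the function name `count_spurious_hits`, and the cutoff |r| > 0.444 (the tabulated two-sided 5% critical value for 18 degrees of freedom).

```python
import numpy as np


def count_spurious_hits(n_features=10_000, n_samples=20, r_crit=0.444, seed=0):
    """Correlate pure-noise features with a pure-noise phenotype.

    Returns the per-feature Pearson correlations and the number of features
    whose |r| exceeds r_crit. All defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # N x M matrix: rows are (hypothetical) genomic features, columns are samples.
    data = rng.standard_normal((n_features, n_samples))
    # One phenotype value per sample, generated independently of every feature.
    phenotype = rng.standard_normal(n_samples)

    # Center each feature row and the phenotype vector.
    data_c = data - data.mean(axis=1, keepdims=True)
    pheno_c = phenotype - phenotype.mean()

    # Pearson r for every feature against the phenotype in one matrix product.
    r = (data_c @ pheno_c) / (
        np.linalg.norm(data_c, axis=1) * np.linalg.norm(pheno_c)
    )

    hits = int(np.sum(np.abs(r) > r_crit))
    return r, hits


r, hits = count_spurious_hits()
expected = 0.05 * r.size  # hits expected by chance alone at a 5% cutoff
print(f"features passing the nominal cutoff: {hits} (about {expected:.0f} expected by chance)")
```

Because no feature is truly associated with the phenotype in this simulation, every feature that passes the cutoff is a false positive in the statistical sense described above. With thousands of features and only a few dozen samples, several hundred such hits are expected by chance, which is why some form of multiple-testing control is usually applied before interpreting patterns biologically.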
Kho, A. T., & Liu, H. (2014). Information processing at the genomics level. In Springer Handbook of Bio-/Neuroinformatics (pp. 43–55). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-30574-0_4