LETTERS

The DNA-encoded nucleosome organization of a eukaryotic genome

Noam Kaplan1*, Irene K. Moore3*, Yvonne Fondufe-Mittendorf3, Andrea J. Gossett4, Desiree Tillo5, Yair Field1, Emily M. LeProust6, Timothy R. Hughes5,7,8, Jason D. Lieb4, Jonathan Widom3 & Eran Segal1,2

Nucleosome organization is critical for gene regulation1. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers2, competition with site-specific DNA-binding proteins3, and the DNA sequence preferences of the nucleosomes themselves4–8. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo7,9–11, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.
We sought to establish the extent to which the DNA sequence determines nucleosome organization in living cells. Our strategy, previously used by others for two yeast promoters12, was to compare in vivo nucleosome organization with that obtained by an in vitro assembly procedure using only purified nucleosomes and purified DNA. To obtain a genome-wide map of nucleosome occupancy governed solely by nucleosome sequence preferences, we purified chicken erythrocyte histone octamers and assembled them on purified yeast genomic DNA by salt gradient dialysis13. We then isolated mononucleosomes by micrococcal nuclease digestion, and used parallel sequencing to determine nucleosome positions. We performed two independent experiments, resulting in ~10,000,000 DNA sequence reads that map uniquely to the yeast genome. For comparison to in vivo nucleosome positions, we isolated mononucleosomes from living cells5,7,9,10, and obtained ~25,000,000 sequence reads from 6 independent experiments. For each map, we determined the average nucleosome occupancy at every base pair, calculated as the log-ratio between the number of reads that cover that base pair and the genome-wide average coverage per base pair (see Methods).

The nucleosome organizations of the in vitro and in vivo maps are notably similar, although not identical (Fig. 1), with a correlation of 0.74 between the nucleosome occupancy per base pair (Fig. 2a). On the scale of individual nucleosomes, the in vitro data separate regions that are enriched in nucleosomes in vivo from regions depleted of nucleosomes with high accuracy (Supplementary Fig. 1). Similarly, we found a significant correspondence between the positions of stable nucleosomes in the two maps (Supplementary Fig. 2). This high degree of similarity between the maps indicates that nucleosome sequence preferences have a dominant role in determining in vivo nucleosome organization.

The correlation between the maps is not uniform across the genome.
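The per-base-pair occupancy measure described above (the log-ratio of read coverage at a base pair to the genome-wide average coverage) can be illustrated with a minimal sketch. The read format (0-based, half-open `(start, end)` intervals), the base-2 logarithm and the `pseudocount` parameter are assumptions made here for illustration; the text specifies only a log-ratio and refers to the Methods for details.

```python
import math


def nucleosome_occupancy(reads, genome_length, pseudocount=1.0):
    """Return the log-ratio occupancy for every base pair.

    reads: iterable of (start, end) intervals, 0-based and half-open
           (an assumed input format, not taken from the paper).
    pseudocount: hypothetical smoothing constant that keeps the
           logarithm defined where coverage is zero.
    """
    # Difference array: +1 where a read starts, -1 just past its end.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum turns the difference array into per-base coverage.
    coverage = []
    running = 0
    for position in range(genome_length):
        running += diff[position]
        coverage.append(running)

    mean_coverage = sum(coverage) / genome_length

    # Log-ratio of local coverage to the genome-wide mean (base 2 assumed).
    return [
        math.log2((count + pseudocount) / (mean_coverage + pseudocount))
        for count in coverage
    ]


if __name__ == "__main__":
    toy_reads = [(0, 4), (2, 6)]
    print(nucleosome_occupancy(toy_reads, genome_length=8))
```

With this convention, positive values mark base pairs covered more often than the genome-wide average and negative values mark depleted base pairs.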
We found a higher correlation between the maps at non-promoter intergenic regions located at ends of convergently transcribed genes (0.83) and a lower correlation at promoter (0.69) and coding (0.69) regions. In addition, the depletion level in vivo relative to that measured in vitro at coding regions increases with the expression level of the associated genes (Fig. 2b). These results indicate that transcription factors, chromatin regulators and active transcription influence the resulting nucleosome organization in vivo.

Because the nucleosome organization in vitro is determined only by the DNA sequence, we asked whether we could derive rules that are predictive of nucleosome positioning and occupancy. For each of the 1,024 sequences of length 5 base pairs, we computed the average nucleosome occupancy of that sequence across all of its instances in the genome. We found a near perfect agreement (correlation of 0.98) between the average occupancy of these 5-base-pair sequences in vivo and in vitro (Fig. 3a). Many 5-base-pair sequences showed strong preferences for nucleosome-enriched or nucleosome-depleted regions. For example, AAAAA has the lowest average nucleosome occupancy both in vivo and in vitro, consistent with the reduced nucleosome affinity that poly(dA-dT) sequences have in vitro14, and with the nucleosome depletion observed over poly(dA-dT) sequences in vivo9,15.

Consistent with previous reports4,5,11,16, we also found clear ~10-bp periodicities of dinucleotides along the nucleosome length, both in vitro and in vivo (Fig. 3b, c). Notably, the dynamic range of these periodicities is greater in vitro, suggesting that the fraction of nucleosomes positioned by these periodic motifs in vitro is greater than that in vivo. This difference may be due to the action of chromatin remodellers and transcription factors in vivo, which may cause nucleosomes to deviate from the locations dictated by the nucleosome sequence preferences.
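The 5-base-pair analysis above (averaging occupancy over all genomic instances of each of the 4^5 = 1,024 sequences, then correlating the in vitro and in vivo averages) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and defining the occupancy of one instance as the mean over its five base pairs is an assumption, since the text does not state how an instance is scored.

```python
import math
from collections import defaultdict


def average_kmer_occupancy(sequence, occupancy, k=5):
    """Average occupancy of every k-mer over all of its instances.

    sequence: DNA string; occupancy: per-base-pair values of equal length.
    The occupancy of one instance is taken as the mean over its k bases
    (an assumption made for this sketch).
    """
    if len(sequence) != len(occupancy):
        raise ValueError("sequence and occupancy must have the same length")

    totals = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k].upper()
        if set(kmer) - set("ACGT"):
            continue  # skip windows containing ambiguous bases such as N
        totals[kmer] += sum(occupancy[i:i + k]) / k
        counts[kmer] += 1

    return {kmer: totals[kmer] / counts[kmer] for kmer in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    toy_sequence = "AAAAACGCGC"
    toy_occupancy = [-2.0] * 5 + [1.0] * 5
    print(average_kmer_occupancy(toy_sequence, toy_occupancy))
```

To compare two maps, one would compute the averages separately for the in vitro and in vivo occupancy tracks and pass the values for the shared 5-mers to `pearson`.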
1 Department of Computer Science and Applied Mathematics, 2 Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel. 3 Department of Biochemistry, Molecular Biology, and Cell Biology, Northwestern University, 2153 Sheridan Road, Evanston, Illinois 60208, USA. 4 Department of Biology, Carolina Center for Genome Sciences, and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA. 5 Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada. 6 Agilent Technologies Inc., Genomics–LSSU, 5301 Stevens Creek Boulevard, MS 3L/MT Santa Clara, California 95051, USA. 7 Terrence Donnelly Centre for Cellular & Biomolecular Research, 8 Banting and Best Department of Medical Research, 160 College Street, Toronto, Ontario M5S 3E1, Canada. *These authors contributed equally to this work.
Nature | Vol 458 | 19 March 2009 | doi:10.1038/nature07667. ©2009 Macmillan Publishers Limited. All rights reserved.

The higher concentration of nucleosomes in vivo relative to the concentration used to create our in vitro map may also contribute to this difference, because higher nucleosome concentrations generally increase the contribution of non-specific binding, thus diminishing the contribution of the ~10-bp sequence periodicities. Nevertheless, the conservation of the ~10-bp dinucleotide periodicities and the near-identity of 5-base-pair nucleosome occupancies demonstrate that nucleosomes have clear sequence preferences that are highly similar in vitro and in vivo.

To test whether general sequence-based rules can be derived from our in vitro data and be used to predict nucleosome occupancy in vivo, we constructed a simple probabilistic model based on both the global preferences over sequences of length 5 and the position-dependent dinucleotide preferences5,17, which scores the nucleosome formation potential of every 147-bp sequence. Importantly, this model is learned only from in vitro nucleosome data, and therefore represents only nucleosome sequence preferences, whereas previous models5–8,18, which were learned from in vivo data, may also capture sequence preferences of other factors7, as well as indirect effects due to chromatin remodelling activities. We tested the model in a cross-validation scheme in which the nucleosome occupancy of each chromosome was predicted using a model that was constructed from the data from all other chromosomes. Our model has high correlations of 0.89 and 0.75 with the in vitro and in vivo maps, respectively (Fig. 3d, e), and separates nucleosome-enriched regions from nucleosome-depleted regions (Supplementary Fig. 3), indicating that the model successfully identified general predictive rules for the sequence preferences of nucleosomes.

If nucleosome sequence preferences are important in other eukaryotes, then our model should also be predictive of their in vivo nucleosome organization.
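The cross-validation scheme described above, in which each chromosome is predicted by a model built from all other chromosomes, can be sketched as below. Only the hold-out loop reflects the text. The `train_base_means` and `predict_base_means` functions are deliberately trivial, hypothetical stand-ins (mean occupancy per single base) and are not the probabilistic model of the paper, whose details are not given in this section.

```python
from collections import defaultdict


def leave_one_chromosome_out(chromosomes, train, predict):
    """Predict each chromosome with a model trained on all the others.

    chromosomes: dict mapping a chromosome name to a
                 (sequence, occupancy) pair (assumed data layout).
    train:   function(training_dict) -> model
    predict: function(model, sequence) -> list of predicted occupancies
    """
    predictions = {}
    for held_out, (sequence, _) in chromosomes.items():
        # Exclude the held-out chromosome from the training data.
        training = {
            name: data for name, data in chromosomes.items()
            if name != held_out
        }
        model = train(training)
        predictions[held_out] = predict(model, sequence)
    return predictions


def train_base_means(training):
    """Toy stand-in model: mean occupancy observed for each base."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sequence, occupancy in training.values():
        for base, value in zip(sequence, occupancy):
            totals[base] += value
            counts[base] += 1
    return {base: totals[base] / counts[base] for base in totals}


def predict_base_means(model, sequence):
    """Predict each position from its base; unseen bases default to 0.0."""
    return [model.get(base, 0.0) for base in sequence]


if __name__ == "__main__":
    toy = {
        "chr1": ("AACC", [-1.0, -1.0, 1.0, 1.0]),
        "chr2": ("ACAC", [-3.0, 3.0, -3.0, 3.0]),
        "chr3": ("CCAA", [2.0, 2.0, -2.0, -2.0]),
    }
    print(leave_one_chromosome_out(toy, train_base_means, predict_base_means))
```

Holding out whole chromosomes means that no position used to fit the model is also used to evaluate it, so the reported correlations measure generalization and not memorization of the training data.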
[Figure 1 image: genome browser view of chromosome 14, positions 187000–207000 (scale bars 2,000 bp and 500 bp). Tracks: genes (SLA2, ATG2, ZWF1, NAR1, LAP3, KEX2, YTP1), model, in vitro data, YPD (in vivo), ethanol (in vivo), galactose (in vivo). Axes: genomic position (x), nucleosome occupancy (y).]

Figure 1 | The intrinsic DNA-encoded nucleosome organization at a typical genomic region. Shown are the four different maps of nucleosome occupancy measured in this study for a typical 20,000-bp-long genomic region: the in vitro map, which reflects only the intrinsic nucleosome sequence preferences, and in vivo yeast maps for three different growth conditions (YPD, ethanol and galactose). Each track plots the measured nucleosome occupancy per base pair, computed by summing all of the nucleosome reads obtained in that experiment, and dividing that number by the average number of reads per base pair across the genome. The line of y = 1 thus represents the genome-wide average and is shown as a dashed orange line. The average nucleosome occupancy predictions from our model are shown in blue.

[Figure 2 image: panel a, normalized nucleosome occupancy in vitro (x axis) against normalized nucleosome occupancy in vivo (YPD) (y axis), coloured by number of base pairs, R = 0.74; panel b, transcription level (log2) and difference in normalized nucleosome occupancy on coding regions (in vitro − in vivo), R = 0.33.]

Figure 2 | In vitro and in vivo maps are highly similar. a, Shown is a density dot plot comparison of the normalized nucleosome occupancy per base pair in the in vitro (x axis) and in vivo (y axis) maps (see Methods). Values above zero indicate nucleosome enrichment relative to the genome-wide average. The colour of each point represents the number of base pairs that map to that point in the graph. The Pearson correlation between the maps is indicated. b, Nucleosome depletion in vivo relative to in vitro over coding regions increases with the expression level of associated genes. Shown is a dot plot comparison between the expression level of every yeast gene (measured in ref. 26) and the difference between the average normalized nucleosome occupancy of the coding region of that gene in the in vitro map compared with the in vivo map (that is, higher values indicate larger nucleosome depletion in vivo relative to in vitro). The Pearson correlation of the dot plot is indicated.

Indeed, we found a good (0.60) correlation between the nucleosome occupancy per base pair predicted by our