HAT: Hypergeometric Analysis of Tiling-arrays with application to promoter-GeneChip data

Abstract

Background: Tiling-arrays are applicable to many types of biological research questions. Owing to the technology's advantages (high sensitivity, high resolution, and lack of bias), it is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., contiguous probes with increased signal intensity (as a result of hybridization of labeled DNA) in a region. Currently, no standard criteria are available to define these regions-of-interest: there is no single probe-intensity cut-off level, different regions-of-interest can contain various numbers of probes, and they can vary in genomic width. Furthermore, the chromosomal distance between neighboring probes can vary across the genome and among different arrays.

Results: We have developed Hypergeometric Analysis of Tiling-arrays (HAT) and first evaluated its performance on tiling-array datasets from a chromatin immunoprecipitation on chip (ChIP-on-chip) study that identified the genome-wide DNA binding profiles of the transcription factor Cebpa (used for method comparison). Using this assay, we show that HAT refines the detection of regions-of-interest: regions detected by HAT are more highly enriched for expected motifs than regions detected by an alternative method (MAT). Subsequently, data from a retroviral insertional mutagenesis screen were used to examine the performance of HAT in a different application of tiling-arrays. In both studies, detected regions-of-interest were validated with (q)PCR.

Conclusions: We demonstrate that HAT has increased specificity for the analysis of tiling-array data compared with the alternative method, and that it accurately detects regions-of-interest in two different applications of tiling-arrays.
HAT has several advantages over previous methods: i) as there is no single cut-off level for probe intensity, HAT can detect regions-of-interest at various thresholds; ii) it can detect regions-of-interest of any size; iii) it is independent of probe resolution across the genome and across tiling-array platforms; and iv) it employs a single user-defined parameter, the significance level. Regions-of-interest are detected by computing the hypergeometric probability while controlling the family-wise error rate. Furthermore, the method does not require experimental replicates, indicates common regions-of-interest, can examine a sequence-of-interest in every detected region-of-interest, and can report flanking genes. © 2010 Taskesen et al; licensee BioMed Central Ltd.
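The core statistic named above, a hypergeometric probability with family-wise error control, can be illustrated with a minimal sketch. It assumes (this is not taken from the paper) that the array has M probes, K of them are called "positive" at a given threshold, a window spans N consecutive probes and contains x positives, and the window's p-value is P(X ≥ x) under a hypergeometric null. The abstract does not say which FWER procedure HAT uses; Bonferroni is used here only as the simplest correction that controls it. The function names are hypothetical.

```python
from math import comb


def hypergeom_sf(x: int, M: int, K: int, N: int) -> float:
    """P(X >= x) for X ~ Hypergeometric(M probes, K positive probes, N-probe window)."""
    upper = min(K, N)
    # Sum the probability of seeing exactly i positive probes in the window, for i = x..upper.
    hits = sum(comb(K, i) * comb(M - K, N - i) for i in range(x, upper + 1))
    return hits / comb(M, N)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1; controls the family-wise error rate."""
    return min(1.0, p * n_tests)


# Hypothetical example: 1,000 probes, the top 50 called positive,
# and a 10-probe window that contains 6 of them.
M, K, N, x = 1000, 50, 10, 6
p_raw = hypergeom_sf(x, M, K, N)
n_windows = M - N + 1  # number of sliding windows of N consecutive probes
p_adj = bonferroni(p_raw, n_windows)
alpha = 0.05  # the single user-defined parameter: the significance level
print(f"raw p = {p_raw:.2e}, adjusted p = {p_adj:.2e}, significant: {p_adj < alpha}")
```

Because every window position is a separate test, a window must have a very small raw p-value to stay significant after correction.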

Figures

  • Figure 1 Illustration of the method. The steps of the method, illustrated as blocks (A, B, C, D and E), are needed to process raw probe-intensity data, detect unique candidate regions, and map the detected regions-of-interest to the 5' transcriptional start site of nearby genes. HAT corresponds to blocks B, C, D and E. These steps apply to the detection of unique candidate regions-of-interest in single as well as multiple samples.
  • Figure 2 Venn diagram depicting the overlapping regions-of-interest between HAT, Starr and MAT. Regions-of-interest detected by HAT (blue: 856), Starr (red: 1664) and MAT (green: 4784) are indicated, together with the number of overlapping regions between the methods. The regions detected by all three methods (pink: 719) showed high enrichment for CEBP binding motifs. Overlapping regions between HAT and MAT (blue: 64) and between Starr and MAT (orange: 652) also showed high enrichment for CEBP binding motifs. Regions detected uniquely by Starr (red: 70) showed no significantly enriched motifs, and those detected uniquely by MAT (green: 3092) showed limited enrichment for CEBP motifs. Note that an overlapping region can contain multiple regions-of-interest detected by a single method.
  • Table 1: Motif enrichment analysis.
  • Figure 3 Graphical output of a detected region-of-interest from the Cebpa study. qPCR confirmed that the Cebpa protein targets and regulates the proximal promoter region of the il-6 receptor alpha gene, which lies downstream of the region-of-interest (negative DNA strand). In the top panel (A), the probes are represented as vertical blue lollipops; the left y-axis shows the probe intensities, and the right y-axis shows the contribution of each individual probe to the region (probe-significance). The x-axis indicates the genomic probe positions and marks the sequence-of-interest with a downward-facing green bar. The sequence 'CCAAT' was found on the negative DNA strand. Flanking genes of this detected region are indicated with their distances in base pairs to the 5' transcriptional start site. The bottom panel (B) shows the regions-of-interest detected for various numbers of top probes and window sizes, each combination in a different color. The merged region-of-interest has a fragment width of 853 bp and lies in the proximal promoter region of il6ra on the negative DNA strand.
  • Table 2: HAT: Motif enrichment analysis using α = 0.05.
  • Figure 4 Graphical output of a detected cmVIS in the MeDIP study. A region-of-interest detected in two samples is illustrated in Panels A and B. Panel A shows 840 sub-regions that are merged into a region with a total length of 1567 bp. The restriction sites, indicated as green bars, are located in and around the detected region and appear on both DNA strands because the recognition sequence 'GATC' is palindromic. The region-of-interest detected in the second tumor (Panel B) consists of 28 sub-regions, with a fragment width of 949 bp.
  • Figure 5 Graphical output of a detected and validated mVIS in the MeDIP study. Panel A illustrates the detected mVIS, which is subject to DNA methylation. Only a section of the detected region-of-interest has increased probe intensity, and the probe-significance highlights this sub-region. Directly beside the increased probe-significance, a restriction cleavage site is indicated by a green bar. Because the recognition sequence is palindromic, these sites are indicated at the same genomic position on both DNA strands. Panel B shows, in various colors, the statistically significant regions detected across the different thresholds and window sizes. Panel C shows a schematic representation of the amplified genomic region, with the viral and murine contributions.
  • Figure 6 Schematic depiction of the detection of regions-of-interest, based on probe intensities. Panel A shows eight probes with their genomic locations; four of these have positive probe intensities. Using multiple thresholds transforms continuous data into discrete data, as shown in Panels B and E. Windows of various sizes N are used to examine the probe intensities of neighboring probes in Panels C and F; these windows contain different numbers of positive probes. The hypergeometric probability is computed for every region-of-interest, and a region-of-interest is excluded when it is not statistically significant after correcting for a single positive probe in a region-of-interest and for multiple testing. The remaining regions are merged for each k(t) (illustrated in Panels D, G and H) and then across all k(t) into a single region-of-interest (Panel I). To determine how often probes were detected in statistically significant regions, the probe-significance is computed (Panels D and E); it is shown as a red line that marks the statistically significant probes in the detected region-of-interest.
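The note in Figure 2, that an overlapping region can contain multiple regions-of-interest from a single method, follows from how interval overlap works: one long region from one method can overlap several shorter regions from another, so overlap counts are not simple set intersections. A toy sketch, assuming regions are half-open (chromosome, start, end) tuples; the data are invented for illustration and the function names are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_partners(a_regions: List[Region], b_regions: List[Region]) -> List[List[Region]]:
    """For each region in a_regions, list the regions in b_regions that it overlaps."""
    return [[b for b in b_regions if overlaps(a, b)] for a in a_regions]


# Hypothetical toy data: one long region from method A spans two shorter regions from method B.
method_a = [("chr1", 1000, 2000), ("chr2", 500, 800)]
method_b = [("chr1", 1100, 1300), ("chr1", 1700, 1900), ("chr3", 0, 100)]

partners = overlap_partners(method_a, method_b)
shared_a = sum(1 for p in partners if p)  # A regions with at least one B partner
# Distinct B regions touched by any A region.
shared_b = sum(1 for b in method_b if any(overlaps(a, b) for a in method_a))
print(shared_a, shared_b)  # 1 2
```

The all-pairs comparison is quadratic and only meant for illustration; sorted sweeps or interval trees scale better for genome-wide region sets.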
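Figures 3 to 5 mark sequences-of-interest on each DNA strand. A sketch of a simple exact-match scan, assuming uppercase DNA and reporting 0-based forward-strand start positions for both strands: a non-palindromic motif such as 'CCAAT' is found on the negative strand as its reverse complement 'ATTGG', whereas a palindromic site such as 'GATC' is its own reverse complement and is therefore reported at the same position on both strands. The function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_sites(seq: str, motif: str) -> dict:
    """0-based forward-strand start positions of a motif on the + and - strands."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    plus = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]
    # A motif on the - strand appears as its reverse complement on the + strand.
    minus = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == rc]
    return {"+": plus, "-": minus}


def is_palindromic(motif: str) -> bool:
    """True if the site reads the same on both strands (e.g. 'GATC')."""
    motif = motif.upper()
    return motif == reverse_complement(motif)


seq = "TTGATCAAATTGGCCGATCA"  # hypothetical toy sequence
print(find_sites(seq, "GATC"))   # {'+': [2, 15], '-': [2, 15]}
print(find_sites(seq, "CCAAT"))  # {'+': [], '-': [8]}
print(is_palindromic("GATC"), is_palindromic("CCAAT"))  # True False
```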
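The procedure in Figure 6 can be sketched in a few functions. This is a hedged illustration under stated assumptions, not the HAT implementation: each threshold k(t) is taken to mean "the k highest-intensity probes are positive", windows are N consecutive probes (so size is counted in probes, not base pairs), windows with fewer than two positive probes are skipped, Bonferroni over all tests stands in for the FWER correction, and `detect`, `merge` and `hypergeom_sf` are hypothetical names.

```python
from math import comb
from typing import List, Tuple


def hypergeom_sf(x: int, M: int, K: int, N: int) -> float:
    """P(X >= x) for X ~ Hypergeometric(M probes, K positives, N-probe window)."""
    return sum(comb(K, i) * comb(M - K, N - i) for i in range(x, min(K, N) + 1)) / comb(M, N)


def merge(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent inclusive probe-index intervals."""
    out: List[Tuple[int, int]] = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def detect(intensities: List[float], top_ks: List[int], window_sizes: List[int],
           alpha: float = 0.05) -> List[Tuple[int, int]]:
    """Flag probes at several top-k thresholds, test sliding windows, merge the hits."""
    M = len(intensities)
    order = sorted(range(M), key=lambda i: intensities[i], reverse=True)
    # Bonferroni over every (threshold, window size, window position) combination.
    n_tests = sum(M - N + 1 for _ in top_ks for N in window_sizes if N <= M)
    per_k_regions: List[Tuple[int, int]] = []
    for k in top_ks:
        positive = [False] * M
        for i in order[:k]:
            positive[i] = True  # discretize: the k highest-intensity probes are positive
        hits = []
        for N in window_sizes:
            for start in range(M - N + 1):
                x = sum(positive[start:start + N])
                if x < 2:
                    continue  # skip windows supported by a single positive probe
                if hypergeom_sf(x, M, k, N) * n_tests < alpha:
                    hits.append((start, start + N - 1))
        per_k_regions.extend(merge(hits))  # merge within this k(t)
    return merge(per_k_regions)  # then merge across all k(t)


# Hypothetical intensities: a periodic background with an enriched run at probes 40-45.
background = [0.1 * (i % 7) for i in range(100)]
signal = list(background)
for i in range(40, 46):
    signal[i] = 5.0 + i * 0.01
print(detect(signal, top_ks=[6, 10], window_sizes=[4, 8]))
```

Because windows are tested at several sizes, the merged region can extend a few probes beyond the positive run itself.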
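Figure 6 describes the probe-significance as how often a probe was detected in statistically significant regions. One plausible reading, assumed here, is the fraction of significant windows (inclusive probe-index ranges, pooled over thresholds and window sizes) that cover each probe; the paper's exact normalization is not given in this text, and the function name and windows are hypothetical.

```python
from typing import List, Tuple


def probe_significance(n_probes: int, windows: List[Tuple[int, int]]) -> List[float]:
    """Fraction of significant windows (inclusive index ranges) that cover each probe."""
    counts = [0] * n_probes
    for first, last in windows:
        for i in range(first, last + 1):
            counts[i] += 1
    total = len(windows)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical significant windows from several window sizes and thresholds.
windows = [(2, 4), (3, 5), (3, 4), (1, 4)]
print(probe_significance(8, windows))
# [0.0, 0.25, 0.5, 1.0, 1.0, 0.25, 0.0, 0.0]
```

Probes covered by every significant window score 1.0, which picks out the core sub-region in the same spirit as the red line in the figure.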

Cite

Taskesen, E., Beekman, R., de Ridder, J., Wouters, B. J., Peeters, J. K., Touw, I. P., … Delwel, R. (2010). HAT: Hypergeometric Analysis of Tiling-arrays with application to promoter-GeneChip data. BMC Bioinformatics, 11. https://doi.org/10.1186/1471-2105-11-275
