Binning high-resolution data

  • Krzywinski M

Abstract

NATURE METHODS | VOL.13 NO.6 | JUNE 2016 | 463 | THIS MONTH

Limitations in print resolution and visual acuity constrain data density and detail. Feature sizes in genomic data sets span many orders of magnitude, and it is a challenge to draw elements in a figure small enough to preserve detail but large enough to be visible. In a previous column¹, strategies were identified to present genomic data in context²,³. This month we look at methods to bin high-density information and provide guidelines for the minimum size of elements in a figure.

Visual acuity imposes stricter limits than output resolution. A common unit of length in print is the point (pt; 1 pt = 1/72 inch). The resolving power of the eye is about 1/4 pt at a distance of 30 cm, and many journals impose a 1/4-pt or 1/2-pt minimum line width for figures. Although it is possible to discern 1/4-pt lines that are 1/4 pt apart (Fig. 1a), such fine detail can overwhelm the eye. We suggest lines at least 1/2 pt wide that are no closer together than 3/4 pt (Fig. 1b). Elements must be at least 1 pt in size for their color to be resolved and for differences in adjacent heights to be assessed comfortably (Figs. 1 and 2).

When 1/2-pt line widths are used for axes and grids, a 1-pt line thickness is suggested for data traces, and symbols in line plots should be no smaller than 3 pt (Fig. 1). In any context, symbols on a 1/2-pt data trace should be no smaller than 1.5 pt. In high-density scatter plots, where large points can occlude each other, or when outliers are shown in a distinct visual channel, data points can be as small as 1 pt.

These requirements determine how much binning dense data tracks need. Figure 2 demonstrates the visibility of binned data for bins of 1/4 to 2 pt. Finding local maxima is relatively easy even with 1/4-pt bins, but judging the average, assessing variability and discerning minima are difficult with bins smaller than 1 pt. Histograms are preferred over heat maps except where space is an issue: heat maps can be more compact and are effective for sparse data (track d, Fig. 2).

We suggest not binning data into more than ~250 intervals for one-column figures (3.5 inches wide) or ~500 intervals for two-column figures (7.2 inches wide). This corresponds roughly to 1 pt per bin in print, 4 pixels on a high-resolution screen or 2 pixels on a typical LCD projector. This minimum bin size reduces detail and smooths out variation; for example, a full-page figure of human chromosome 1 requires bins of 500 kb (~50 times the average gene size). One can mitigate this by encoding central tendency (median, average), extrema (minimum, maximum) and spread (s.d., interquartile range) (Fig. 3), or by highlighting global extrema or outliers (track d in Fig. 2 and tracks e–g in Fig. 3).

Figure 1 | Visual-acuity limits impose a minimum size on elements. (a) Lines thinner than 1/2 pt cannot be comfortably resolved if less than 1/2 pt apart. (b) Differences in tone, length and color are difficult to judge for elements smaller than 1 pt. (c) Data points should be at least three times the width of their line. White circles are shown with a 1/4-pt outline.
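
The size guidelines above can be gathered into a small checker. This is a minimal sketch, not a tool from the column: `check_element_sizes` and its parameters are hypothetical, all sizes are in points, and line spacing is assumed to mean the gap between adjacent lines. The thresholds are the column's suggestions: lines at least 1/2 pt wide and at least 3/4 pt apart, color-coded elements at least 1 pt, and symbols at least three times the width of their line.

```python
from typing import List

# Suggested minimums from the column (all values in pt; 1 pt = 1/72 inch).
MIN_LINE_WIDTH = 0.5        # thinnest comfortably resolved line
MIN_LINE_SPACING = 0.75     # assumption: gap between adjacent lines
MIN_COLOR_ELEMENT = 1.0     # smallest element whose color can be judged
SYMBOL_TO_LINE_RATIO = 3.0  # symbols should be at least 3x their line width


def check_element_sizes(line_width: float, line_spacing: float,
                        symbol_size: float, color_element_size: float) -> List[str]:
    """Hypothetical helper: return one warning per size below its suggested minimum."""
    warnings = []
    if line_width < MIN_LINE_WIDTH:
        warnings.append(f"line width {line_width} pt is below {MIN_LINE_WIDTH} pt")
    if line_spacing < MIN_LINE_SPACING:
        warnings.append(f"line spacing {line_spacing} pt is below {MIN_LINE_SPACING} pt")
    if symbol_size < SYMBOL_TO_LINE_RATIO * line_width:
        warnings.append(f"symbol size {symbol_size} pt is below "
                        f"{SYMBOL_TO_LINE_RATIO} x line width ({line_width} pt)")
    if color_element_size < MIN_COLOR_ELEMENT:
        warnings.append(f"color element {color_element_size} pt is below {MIN_COLOR_ELEMENT} pt")
    return warnings


# A 1-pt data trace with 3-pt symbols passes; 1/4-pt detail triggers every check.
print(check_element_sizes(1.0, 1.0, 3.0, 1.0))        # []
print(len(check_element_sizes(0.25, 0.25, 0.5, 0.5)))  # 4
```

The checker does not model the exception for dense scatter plots, where the column allows points as small as 1 pt.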
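
The ~250 and ~500 limits follow from the figure width in points: at 72 pt per inch, a 3.5-inch column is 252 pt wide and a 7.2-inch figure is about 518 pt wide, so 1-pt bins give roughly 250 and 500 intervals. The sketch below does this arithmetic and derives the genomic span of each bin. The helper names are hypothetical, and the chromosome length in the example (248,956,422 bp for human chromosome 1 in the GRCh38 assembly) is an assumption not stated in the column.

```python
import math

PT_PER_INCH = 72  # 1 pt = 1/72 inch


def max_bins(figure_width_in: float, min_bin_pt: float = 1.0) -> int:
    """Hypothetical helper: largest bin count that keeps every bin >= min_bin_pt wide."""
    return math.floor(figure_width_in * PT_PER_INCH / min_bin_pt)


def bin_size_bp(sequence_length_bp: int, n_bins: int) -> int:
    """Hypothetical helper: span of each bin, rounded up so n_bins bins cover the sequence."""
    return math.ceil(sequence_length_bp / n_bins)


one_column = max_bins(3.5)   # 252 pt wide
two_column = max_bins(7.2)   # 518.4 pt wide, floored
chr1_bin = bin_size_bp(248_956_422, two_column)  # assumed GRCh38 chr1 length
print(one_column, two_column, chr1_bin)  # prints: 252 518 480611
```

A bin of about 481 kb is consistent with the column's figure of roughly 500 kb for a full-width view of chromosome 1.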

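Encoding several summaries per bin, rather than a single value, can be sketched as follows. This is a minimal illustration, assuming the data arrive as parallel lists of integer positions and numeric values along a sequence of known length. `summarize_bins` and `BinSummary` are hypothetical names, and the interquartile range uses Python's `statistics.quantiles` with `method="inclusive"`, which is only one of several quantile conventions.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class BinSummary:
    n: int          # number of data points in the bin
    minimum: float
    maximum: float
    mean: float
    median: float
    sd: float       # sample standard deviation (0 when n < 2)
    iqr: float      # interquartile range Q3 - Q1 (0 when n < 2)


def summarize_bins(positions: Sequence[int], values: Sequence[float],
                   seq_length: int, n_bins: int) -> List[Optional[BinSummary]]:
    """Assign points to n_bins equal-width bins over [0, seq_length) and summarize each.

    Empty bins are returned as None so a plot can draw them as gaps.
    """
    buckets: List[List[float]] = [[] for _ in range(n_bins)]
    for pos, val in zip(positions, values):
        if 0 <= pos < seq_length:  # ignore points outside the sequence
            # Integer arithmetic avoids floating-point errors at bin boundaries.
            buckets[pos * n_bins // seq_length].append(val)

    summaries: List[Optional[BinSummary]] = []
    for vals in buckets:
        if not vals:
            summaries.append(None)
            continue
        if len(vals) >= 2:
            q1, _, q3 = statistics.quantiles(vals, n=4, method="inclusive")
            sd = statistics.stdev(vals)
        else:
            q1 = q3 = vals[0]  # a single point has no spread
            sd = 0.0
        summaries.append(BinSummary(
            n=len(vals),
            minimum=min(vals),
            maximum=max(vals),
            mean=statistics.mean(vals),
            median=statistics.median(vals),
            sd=sd,
            iqr=q3 - q1,
        ))
    return summaries


# Ten points over a 15-bp sequence in three 5-bp bins; the last bin is empty.
bins = summarize_bins(list(range(10)), list(range(10)), seq_length=15, n_bins=3)
print(bins[0].median, bins[0].iqr, bins[2])
```

Each field can then drive its own visual channel, for example a mark for the median and a range bar from minimum to maximum, while each bin is kept at least 1 pt wide.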
Krzywinski, M. (2016). Binning high-resolution data. Nature Methods, 13(6), 463. https://doi.org/10.1038/nmeth.3873
