The digitize Package: Extracting Numerical Data from Scatterplots

by Timothée Poisot
The R Journal (2011)
  • ISSN: 2073-4859

Abstract

I present the small R package digitize, designed to extract data from scatterplots with a simple method and suited to small datasets. I present an application of this method to the extraction of data from a graph whose source is not available.

Available from journal.r-project.org

The digitize package, presented here, allows a user to load an image file of a scatterplot into the R graphics window (with the help of the read.jpeg function of the ReadImages package) and to use the locator function to calibrate the axes and extract the data. Calibration is done by setting four reference points on the axes of the original graph, two for the x values and two for the y values. Four points are used so that each axis is calibrated independently: the y coordinates of the clicks are ignored when calibrating the x axis, and vice versa.
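
The independence of the two axes can be illustrated with a small sketch. The function calibrate_axis below is hypothetical (it is not taken from the digitize source) and assumes linear, untransformed axes. The device coordinates are invented for the example; the reference values 1, 8, 0.5 and 1 are the ones used later in the paper. Each axis is mapped using only the two reference clicks that belong to it.

```r
# Hypothetical helper, not part of digitize: linear map from device
# coordinates to data values along ONE axis (assumes a linear axis scale).
# p     : device coordinate(s) of the clicked point(s) on this axis
# p_ref : device coordinates of the two reference clicks on this axis
# v_ref : known data values at those two reference clicks
calibrate_axis <- function(p, p_ref, v_ref) {
  slope <- (v_ref[2] - v_ref[1]) / (p_ref[2] - p_ref[1])
  v_ref[1] + slope * (p - p_ref[1])
}

# Invented device coordinates for the four calibration clicks.
cal <- list(x = c(100, 450, 60, 62), y = c(40, 42, 120, 320))

# x axis: only the x coordinates of clicks 1 and 2 are used.
x_data <- calibrate_axis(c(100, 275, 450), cal$x[1:2], c(1, 8))
# y axis: only the y coordinates of clicks 3 and 4 are used.
y_data <- calibrate_axis(c(120, 220, 320), cal$y[3:4], c(0.5, 1))
```

The y coordinates of the first two clicks (40 and 42) and the x coordinates of the last two (60 and 62) never enter the computation, so slightly sloppy clicks across the other axis do no harm.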

This is useful when working on data that are not available in digital form, e.g. when integrating old papers in meta-analyses. Several commercial or free software packages allow a user to extract data from a plot in image format, among them PlotDigitizer (http://plotdigitizer.sourceforge.net/) and the commercial package GraphClick (http://www.arizona-software.ch/graphclick/). While these programs are powerful and quite ergonomic, for lightweight use one may want to load the graph directly into R and, as a result, get the data directly in R format. This paper presents a rapid digitization of a scatterplot and the subsequent statistical analysis of the data. As an example, we will use the data presented by Jacques Monod in a seminal microbiology paper (Monod, 1949).

The original paper presents the growth rate (in divisions per hour) of the bacterium Escherichia coli in media of increasing glucose concentration. Such a hyperbolic relationship is best represented by the equation

$$R = R_K \frac{C}{C_1 + C},$$

where $R$ is the growth rate at a given concentration of nutrients $C$, $R_K$ is the maximal growth rate, and $C_1$ is the concentration of nutrients at which $R = 0.5 R_K$. In R, this function is written as
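    # params: list with rK (maximal growth rate) and M1 (concentration at
    # which the growth rate is half of rK); M: nutrient concentration(s)
    MonodGrowth <- function(params, M) {
      with(params, rK * (M / (M1 + M)))
    }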
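
As a quick check of the function, the sketch below evaluates it for illustrative parameter values (chosen for the example, in arbitrary units; they are not presented as Monod's estimates). At a concentration equal to M1 the growth rate should be half of rK, and at high concentrations it should approach rK without reaching it.

```r
# Same definition as in the text, repeated so the example is self-contained.
MonodGrowth <- function(params, M) {
  with(params, rK * (M / (M1 + M)))
}

# Illustrative parameters in arbitrary units (assumed values).
params <- list(rK = 1.35, M1 = 2)

half <- MonodGrowth(params, 2)     # concentration equal to M1
high <- MonodGrowth(params, 2000)  # concentration much larger than M1
```

Here half equals 0.675, which is 0.5 times rK, and high stays just below 1.35.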

In order to characterize the growth parameters of a bacterial population, one can measure its growth rate at different nutrient concentrations. Monod (1949) proposed that, in the measured population, $R_K = 1.35$ and $C_1 = 22 \times 10^{-6}$. By using the digitize package to extract the data from this paper, we will be able to obtain our own estimates of these parameters.

Values of $R_K$ and $C_1$ were estimated using a simple genetic algorithm, which minimizes the error function (sum of squared errors) defined by
    # M: nutrient concentrations; y: observed growth rates at those concentrations
    MonodError <- function(params, M, y) {
      with(params,
           sum((MonodGrowth(params, M) - y)^2))
    }
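
The genetic algorithm is not reproduced in this section. As a stand-in, the sketch below minimizes the same error function with optim from base R (Nelder-Mead by default), which is a different optimizer from the one used in the paper. The data are synthetic and noise-free, generated from assumed parameter values, and fit_monod is a hypothetical wrapper introduced only for this illustration.

```r
# Definitions repeated from the text so the example is self-contained.
MonodGrowth <- function(params, M) {
  with(params, rK * (M / (M1 + M)))
}
MonodError <- function(params, M, y) {
  with(params,
       sum((MonodGrowth(params, M) - y)^2))
}

# Hypothetical wrapper: optim() works on a numeric vector, so the vector is
# converted back to the list expected by MonodError at each evaluation.
fit_monod <- function(M, y, start = c(rK = 1, M1 = 1)) {
  objective <- function(p) MonodError(list(rK = p[[1]], M1 = p[[2]]), M, y)
  fit <- optim(start, objective)
  list(rK = fit$par[[1]], M1 = fit$par[[2]], sse = fit$value)
}

# Synthetic, noise-free data from assumed parameters (not Monod's data).
M <- c(0.5, 1, 2, 4, 8)
y <- MonodGrowth(list(rK = 1.35, M1 = 2), M)
est <- fit_monod(M, y)
```

With noise-free data the estimates in est should land close to the generating values. With digitized data, the residual sum of squares in est$sse gives a rough idea of how well the hyperbola fits.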

Figure 1: Original figure, as obtained after pointing and clicking the data. Calibration was made using the points $x_1 = 1$, $x_2 = 8$, $y_1 = 0.5$ and $y_2 = 1$. All the points that have been clicked are marked by a red point. The digitization of each series is stopped by right-clicking or pressing the Esc key.

The first step when using the digitize package is to specify four points on the graph that will be used to calibrate the axes. They must be given in the following order: leftmost x, rightmost x, lower y, upper y. For the first two, the y value is not important (and vice versa for the last two). For this example, we set the first two points at $x_1 = 1$ and $x_2 = 8$, and the last two points at $y_1 = 0.5$ and $y_2 = 1$, simply by clicking in the graphical window at these positions (preferably on the axes). Note that it is not necessary to calibrate using the extremities of the axes.
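
Why the extremities are not needed follows from the linearity of the mapping. As a sketch, assuming linear axes and writing $p_1$ and $p_2$ for the horizontal device coordinates of the two x calibration clicks (notation introduced here, not in the original), a point clicked at horizontal position $p$ is assigned the value

```latex
x = x_1 + (x_2 - x_1)\,\frac{p - p_1}{p_2 - p_1}
  = 1 + 7\,\frac{p - p_1}{p_2 - p_1}
```

The expression holds for any $p$, including positions outside the interval between $p_1$ and $p_2$, so points beyond the calibration marks are extrapolated along the same line. The y axis is treated in the same way with $y_1 = 0.5$ and $y_2 = 1$.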

Loading the figure and displaying it in the current device for calibration is done by
    cal <- ReadAndCal('monod.jpg')
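
ReadAndCal is interactive, so it cannot be shown running in a static document. The sketch below is a hypothetical stand-in (read_and_cal_sketch is not the package's implementation) that conveys the general idea with base graphics only: draw an image, collect four clicks with locator, mark them, and return their coordinates. To stay self-contained it takes a grey-level matrix instead of a JPEG file, and the pick argument, an assumption of this sketch, lets a non-interactive function replace locator.

```r
# Hypothetical stand-in, not the digitize implementation.
# img  : numeric matrix of grey levels in [0, 1], standing in for a loaded image
# pick : function returning list(x, y) for n points; defaults to locator()
read_and_cal_sketch <- function(img, pick = graphics::locator) {
  op <- par(mar = c(0, 0, 0, 0))
  on.exit(par(op))
  # Empty plot whose user coordinates span the image size in pixels.
  plot(c(0, ncol(img)), c(0, nrow(img)), type = "n",
       axes = FALSE, xlab = "", ylab = "", asp = 1)
  rasterImage(as.raster(img), 0, 0, ncol(img), nrow(img))
  pts <- pick(4)  # order: leftmost x, rightmost x, lower y, upper y
  points(pts$x, pts$y, pch = 4, col = "blue")  # mark clicks with blue crosses
  list(x = pts$x, y = pts$y)
}

# Non-interactive demonstration with invented click positions.
img <- matrix(runif(200 * 300), nrow = 200)
fake_clicks <- function(n) list(x = c(30, 280, 20, 20), y = c(15, 15, 40, 180))
cal <- read_and_cal_sketch(img, pick = fake_clicks)
```

Called interactively as read_and_cal_sketch(img), the function waits for four mouse clicks in the graphics window before returning.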

Once the graph appears in the window, the user must input the four calibration points by clicking on them; they are then marked as blue crosses. The calibration values are stored in the cal object, which is a list with x and y values. The next step is to read the data,
The R Journal Vol. 3/1, June 2011 ISSN 2073-4859
