Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources
Structure (2001)
1. Introduction
The slab allocator [Bonwick94] has taken on a life of
its own since its introduction in these pages seven
years ago. Initially deployed in Solaris 2.4, it has
since been adopted in whole or in part by several other
operating systems including Linux, FreeBSD,
NetBSD, OpenBSD, EROS, and Nemesis. It has also
been adapted to applications such as BIRD and Perl.
Slab allocation is now described in several OS
textbooks [Bovet00, Mauro00, Vahalia96] and is part
of the curriculum at major universities worldwide.
Meanwhile, the Solaris slab allocator has continued to
evolve. It now provides per-CPU memory allocation
and more general resource allocation, and is available
as a user-level library. We describe these developments
in seven sections as follows:
§2. Slab Allocator Review. We begin with a brief
review of the original slab allocator.
§3. Magazines: Per-CPU Memory Allocation. As
servers with many CPUs became more common and
memory latencies continued to grow relative to
processor speed, the slab allocator’s original locking
strategy became a performance bottleneck. We
addressed this by introducing a per-CPU caching
scheme called the magazine layer.
§4. Vmem: Fast, General Resource Allocation. The
slab allocator caches relatively small objects and relies
on a more general-purpose backing store to provide
slabs and satisfy large allocations. We describe a new
resource allocator, vmem, that can manage arbitrary
sets of integers: anything from virtual memory
addresses to minor device numbers to process IDs.
Vmem acts as a universal backing store for the slab
allocator, and provides powerful new interfaces to
address more complex resource allocation problems.
Vmem appears to be the first resource allocator that
can satisfy allocations and frees of any size in
guaranteed constant time.
§5. Vmem-Related Slab Allocator Improvements.
We describe two key improvements to the slab
allocator itself: it now provides object caching for any
vmem arena, and can issue reclaim callbacks to notify
clients when the arena’s resources are running low.
§6. libumem: A User-Level Slab Allocator. We
describe what was necessary to transplant the slab
allocator from kernel to user context, and show that
the resulting libumem outperforms even the current
best-of-breed multithreaded user-level allocators.
§7. Conclusions. We conclude with some observations
about how these technologies have influenced
Solaris development in general.
Magazines and Vmem:
Extending the Slab Allocator to Many CPUs and Arbitrary Resources
Jeff Bonwick, Sun Microsystems
Jonathan Adams, California Institute of Technology
Abstract
The slab allocator [Bonwick94] provides efficient object caching but has two significant
limitations: its global locking doesn’t scale to many CPUs, and the allocator can’t manage
resources other than kernel memory. To provide scalability we introduce a per-processor
caching scheme called the magazine layer that provides linear scaling to any number of
CPUs. To support more general resource allocation we introduce a new virtual memory
allocator, vmem, which acts as a universal backing store for the slab allocator. Vmem is a
complete general-purpose resource allocator in its own right, providing several important
new services; it also appears to be the first resource allocator that can satisfy arbitrary-size
allocations in constant time. Magazines and vmem have yielded performance gains
exceeding 50% on system-level benchmarks like LADDIS and SPECweb99.
We ported these technologies from kernel to user context and found that the resulting
libumem outperforms the current best-of-breed user-level memory allocators. libumem also
provides a richer programming model and can be used to manage other user-level resources.
2. Slab Allocator Review
2.1. Object Caches
Programs often cache their frequently used objects to
improve performance. If a program frequently
allocates and frees foo structures, it is likely to
employ highly optimized foo_alloc() and
foo_free() routines to “avoid the overhead of
malloc.” The usual strategy is to cache foo objects on
a simple freelist so that most allocations and frees take
just a handful of instructions. Further optimization is
possible if foo objects naturally return to a partially
initialized state before they’re freed, in which case
foo_alloc() can assume that an object on the
freelist is already partially initialized.
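The freelist strategy described above can be sketched in a few lines of user-level C. This is a minimal, single-threaded illustration and not code from the paper: the foo_t layout, the next_free link, and the rule that refcnt is zero again before an object is freed are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type; the text names it only as "foo". */
typedef struct foo {
    struct foo *next_free;  /* link, used only while on the freelist */
    int refcnt;             /* assumed to return to 0 before foo_free() */
    int payload;
} foo_t;

static foo_t *foo_freelist = NULL;  /* single-threaded sketch: no locking */

foo_t *foo_alloc(void)
{
    foo_t *fp = foo_freelist;

    if (fp != NULL) {
        /* Fast path: pop from the freelist. refcnt is already 0
         * because foo_free() requires it, so nothing is reinitialized. */
        foo_freelist = fp->next_free;
        fp->next_free = NULL;
        return fp;
    }

    /* Slow path: fall back to malloc and initialize from scratch. */
    fp = malloc(sizeof (*fp));
    if (fp != NULL) {
        fp->next_free = NULL;
        fp->refcnt = 0;
        fp->payload = 0;
    }
    return fp;
}

void foo_free(foo_t *fp)
{
    assert(fp->refcnt == 0);        /* caller returns it partially initialized */
    fp->next_free = foo_freelist;   /* push onto the freelist */
    foo_freelist = fp;
}
```

In this sketch an alloc that follows a free returns the same object with refcnt still zero, which is the saving the paragraph describes: the fast path touches only the list head.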
We refer to the techniques described above as object
caching. Traditional malloc implementations cannot
provide object caching because the malloc/free
interface is typeless, so the slab allocator introduced
an explicit object cache programming model with
interfaces to create and destroy object caches, and
allocate and free objects from them (see Figure 2.1).
The allocator and its clients cooperate to maintain an
object’s partially initialized, or constructed, state. The
allocator guarantees that an object will be in this state
when allocated; the client guarantees that it will be in
this state when freed. Thus, we can allocate and free
an object many times without destroying and
reinitializing its locks, condition variables, reference
counts, and other invariant state each time.
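To make the constructed-state contract concrete, here is a toy user-level object cache. It is a sketch under stated assumptions rather than the Solaris implementation: every toy_* and conn_* name is hypothetical, free objects are tracked on a simple pointer stack, and there is no locking, alignment control, or slab layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy object cache. All toy_* and conn_* names are hypothetical. */
typedef struct toy_cache {
    size_t  size;                       /* object size in bytes */
    int   (*constructor)(void *obj, void *private_data);
    void  (*destructor)(void *obj, void *private_data);
    void   *private_data;               /* passed to both callbacks */
    void  **free_objs;                  /* stack of constructed, free objects */
    size_t  nfree, cap;
} toy_cache_t;

toy_cache_t *toy_cache_create(size_t size,
    int (*constructor)(void *, void *),
    void (*destructor)(void *, void *), void *private_data)
{
    toy_cache_t *cp = calloc(1, sizeof (*cp));
    if (cp == NULL)
        return NULL;
    cp->size = size;
    cp->constructor = constructor;
    cp->destructor = destructor;
    cp->private_data = private_data;
    return cp;
}

void *toy_cache_alloc(toy_cache_t *cp)
{
    if (cp->nfree > 0)
        return cp->free_objs[--cp->nfree];  /* already constructed */
    void *obj = malloc(cp->size);
    if (obj != NULL && cp->constructor != NULL &&
        cp->constructor(obj, cp->private_data) != 0) {
        free(obj);                          /* constructor failed */
        return NULL;
    }
    return obj;
}

void toy_cache_free(toy_cache_t *cp, void *obj)
{
    if (cp->nfree == cp->cap) {
        size_t ncap = cp->cap ? cp->cap * 2 : 8;
        void **grown = realloc(cp->free_objs, ncap * sizeof (void *));
        if (grown == NULL) {                /* cannot cache it: tear it down */
            if (cp->destructor != NULL)
                cp->destructor(obj, cp->private_data);
            free(obj);
            return;
        }
        cp->free_objs = grown;
        cp->cap = ncap;
    }
    cp->free_objs[cp->nfree++] = obj;       /* constructed state is kept */
}

void toy_cache_destroy(toy_cache_t *cp)
{
    while (cp->nfree > 0) {                 /* destruct only at teardown */
        void *obj = cp->free_objs[--cp->nfree];
        if (cp->destructor != NULL)
            cp->destructor(obj, cp->private_data);
        free(obj);
    }
    free(cp->free_objs);
    free(cp);
}

/* Hypothetical client object with state worth preserving across frees. */
typedef struct conn {
    int lock_ready;   /* stands in for an initialized lock */
    int refcnt;
} conn_t;

static int conn_ctor_calls, conn_dtor_calls;

static int conn_constructor(void *obj, void *private_data)
{
    conn_t *c = obj;
    (void) private_data;
    c->lock_ready = 1;
    c->refcnt = 0;
    conn_ctor_calls++;
    return 0;
}

static void conn_destructor(void *obj, void *private_data)
{
    conn_t *c = obj;
    (void) private_data;
    c->lock_ready = 0;
    conn_dtor_calls++;
}
```

Allocating, freeing, and reallocating a conn_t through this cache runs conn_constructor once, and conn_destructor runs only when the cache is destroyed. The client's half of the bargain is to bring refcnt back to zero before calling toy_cache_free().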
2.2. Slabs
A slab is one or more pages of virtually contiguous
memory, carved up into equal-size chunks, with a
reference count indicating how many of those chunks
are currently allocated. To create new objects the
allocator creates a slab, applies the constructor to each
chunk, and adds the resulting objects to the cache. If
system memory runs low the allocator can reclaim any
slabs whose reference count is zero by applying the
destructor to each object and returning memory to the
VM system. Once a cache is populated, allocations
and frees are very fast: they just move an object to or
from a freelist and update its slab reference count.
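The slab lifecycle above can be illustrated with a single toy slab in user-level C. This is a simplified sketch, not the kernel's layout: the toy_slab_* names, the 4 KB slab size, the use of malloc in place of the VM system, and the separate array used as a freelist are all assumptions for the example. Callers are assumed to pass a chunk size that is a multiple of the object's alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SLAB_BYTES 4096     /* assumed slab size: one 4 KB page */

typedef struct toy_slab {
    char   *base;        /* start of the slab's memory */
    size_t  chunk_size;  /* every chunk has the same size */
    size_t  nchunks;
    size_t  refcnt;      /* chunks currently allocated */
    void  **freelist;    /* stack of free, constructed chunks */
    size_t  nfree;
} toy_slab_t;

/* Create a slab, construct every chunk, and put each on the freelist. */
toy_slab_t *toy_slab_create(size_t chunk_size, void (*constructor)(void *))
{
    if (chunk_size == 0 || chunk_size > TOY_SLAB_BYTES)
        return NULL;
    toy_slab_t *sp = malloc(sizeof (*sp));
    if (sp == NULL)
        return NULL;
    sp->chunk_size = chunk_size;
    sp->nchunks = TOY_SLAB_BYTES / chunk_size;
    sp->refcnt = 0;
    sp->nfree = 0;
    sp->base = malloc(TOY_SLAB_BYTES);
    sp->freelist = malloc(sp->nchunks * sizeof (void *));
    if (sp->base == NULL || sp->freelist == NULL) {
        free(sp->base);
        free(sp->freelist);
        free(sp);
        return NULL;
    }
    for (size_t i = 0; i < sp->nchunks; i++) {
        void *obj = sp->base + i * chunk_size;
        if (constructor != NULL)
            constructor(obj);
        sp->freelist[sp->nfree++] = obj;
    }
    return sp;
}

/* Fast path: pop a chunk and bump the slab reference count. */
void *toy_slab_alloc(toy_slab_t *sp)
{
    if (sp->nfree == 0)
        return NULL;            /* slab is full; a real cache would add a slab */
    sp->refcnt++;
    return sp->freelist[--sp->nfree];
}

/* Fast path: push the chunk back and drop the reference count. */
void toy_slab_free(toy_slab_t *sp, void *obj)
{
    assert(sp->refcnt > 0);
    sp->freelist[sp->nfree++] = obj;
    sp->refcnt--;
}

/* Reclaim: legal only when no chunk is allocated (refcnt == 0). */
void toy_slab_destroy(toy_slab_t *sp, void (*destructor)(void *))
{
    assert(sp->refcnt == 0);
    for (size_t i = 0; i < sp->nchunks; i++) {
        if (destructor != NULL)
            destructor(sp->base + i * sp->chunk_size);
    }
    free(sp->freelist);
    free(sp->base);
    free(sp);
}
```

With a chunk size of 64 the slab holds 4096 / 64 = 64 objects. The reference count is what makes reclaim cheap to decide: a slab whose count is zero can be torn down without inspecting individual chunks.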
Figure 2.1: Slab Allocator Interface Summary
kmem_cache_t *kmem_cache_create(
char *name, /* descriptive name for this cache */
size_t size, /* size of the objects it manages */
size_t align, /* minimum object alignment */
int (*constructor)(void *obj, void *private, int kmflag),
void (*destructor)(void *obj, void *private),
void (*reclaim)(void *private), /* memory reclaim callback */
void *private, /* argument to the above callbacks */
vmem_t *vmp, /* vmem source for slab creation */
int cflags); /* cache creation flags */
Creates a cache of objects, each of size size, aligned on an align boundary. name identifies the cache
for statistics and debugging. constructor and destructor convert plain memory into objects and
back again; constructor may fail if it needs to allocate memory but can’t. reclaim is a callback
issued by the allocator when system-wide resources are running low (see §5.2). private is a
parameter passed to the constructor, destructor and reclaim callbacks to support parameterized
caches (e.g. a separate packet cache for each instance of a SCSI HBA driver). vmp is the vmem source
that provides memory to create slabs (see §4 and §5.1). cflags indicates special cache properties.
kmem_cache_create() returns an opaque pointer to the object cache (a.k.a. kmem cache).
void kmem_cache_destroy(kmem_cache_t *cp);
Destroys the cache and releases all associated resources. All allocated objects must have been freed.
void *kmem_cache_alloc(kmem_cache_t *cp, int kmflag);
Gets an object from the cache. The object will be in its constructed state. kmflag is either KM_SLEEP
or KM_NOSLEEP, indicating whether it’s acceptable to wait for memory if none is currently available.
void kmem_cache_free(kmem_cache_t *cp, void *obj);
Returns an object to the cache. The object must be in its constructed state.


