This talk presents an extensive experimental study that shows that a good general-purpose allocator is better than almost all commonly-used custom allocators, with one exception: regions (a.k.a., pools, arenas, zones). However, it shows that regions consume much more memory than necessary. The talk then introduces reaps (regions + heaps), which combine the flexibility and space efficiency of heaps with the performance of regions.
2. Custom Memory Allocation
Programmers replace new/delete, bypassing system allocator
  Reduce runtime – often
  Expand functionality – sometimes
  Reduce space – rarely
Very common practice
  Apache, gcc, lcc, STL, database servers…
Language-level support in C++
Widely recommended
  “Use custom allocators”
UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 2
3. Drawbacks of Custom Allocators
Avoiding system allocator:
More code to maintain & debug
Can’t use memory debuggers
Not modular or robust:
Mix memory from custom
and general-purpose allocators → crash!
Increased burden on programmers
Are custom allocators really a win?
4. Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
5. (I) Per-Class Allocators
Recycle freed objects from a free list
  a = new Class1;
  b = new Class1;
  c = new Class1;
  delete a;
  delete b;
  delete c;
  a = new Class1;
  b = new Class1;
  c = new Class1;
[Diagram: Class1 free list holding a, b, c]
+ Fast
+ Linked list operations
+ Simple
+ Identical semantics
+ C++ language support
- Possibly space-inefficient
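The free-list recycling described on this slide can be sketched in C++ with class-specific operator new and operator delete. This is a minimal illustration rather than code from any of the benchmarked programs: the payload member and the LIFO reuse order are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class with a per-class allocator: freed objects are pushed
// onto a free list and handed back out by later allocations.
class Class1 {
public:
    double payload[2] = {0.0, 0.0};  // assumed payload, only here to give the object a size

    static void* operator new(std::size_t size) {
        assert(size == sizeof(Class1));
        if (free_list_ != nullptr) {          // fast path: pop a recycled object
            FreeNode* node = free_list_;
            free_list_ = node->next;
            return node;
        }
        return ::operator new(size);          // slow path: general-purpose allocator
    }

    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        // Push the dead object onto the free list; its storage is never
        // returned to the system, which is the space cost noted on the slide.
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = free_list_;
        free_list_ = node;
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list_;
};

Class1::FreeNode* Class1::free_list_ = nullptr;
static_assert(sizeof(Class1) >= sizeof(void*), "object must be able to hold a free-list link");
```

Client code keeps writing `new Class1` and `delete p`, which is the "identical semantics" point: only the class definition changes.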
6. (II) Custom Patterns
Tailor-made to fit allocation patterns
Example: 197.parser (natural language parser)
[Diagram: blocks a, b, c, d laid out in char[MEMORY_LIMIT], with the end_of_array pointer moving as blocks are allocated and freed]
  a = xalloc(8);
  b = xalloc(16);
  c = xalloc(8);
  xfree(b);
  xfree(c);
  d = xalloc(8);
+ Fast
+ Pointer-bumping allocation
- Brittle
- Fixed memory size
- Requires stack-like lifetimes
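One way to picture a stack-like allocator of this kind is the sketch below. It is a simplified model and not 197.parser's actual code: the header layout, the value of MEMORY_LIMIT, and the rule that a freed block is only reclaimed once everything above it is also free are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

namespace xalloc_sketch {

constexpr std::size_t MEMORY_LIMIT = 1024;  // assumed size; fixed at compile time
constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

struct Header {
    std::size_t prev;  // offset of the block below this one, or kNone
    std::size_t size;  // rounded payload size
    bool freed;
};
constexpr std::size_t kHeader = round_up(sizeof(Header));

alignas(std::max_align_t) char memory[MEMORY_LIMIT];
std::size_t end_of_array = 0;  // offset of the first unused byte
std::size_t top = kNone;       // offset of the topmost block's header

void* xalloc(std::size_t size) {
    if (size > MEMORY_LIMIT) return nullptr;
    size = round_up(size);
    if (kHeader + size > MEMORY_LIMIT - end_of_array) return nullptr;  // fixed memory size
    new (memory + end_of_array) Header{top, size, false};
    top = end_of_array;
    end_of_array += kHeader + size;                                    // pointer bump
    return memory + top + kHeader;
}

void xfree(void* p) {
    if (p == nullptr) return;
    reinterpret_cast<Header*>(static_cast<char*>(p) - kHeader)->freed = true;
    // Pop every freed block sitting at the top of the stack. A freed block
    // buried under a live one stays allocated: stack-like lifetimes are required.
    while (top != kNone) {
        const Header* t = reinterpret_cast<const Header*>(memory + top);
        if (!t->freed) break;
        end_of_array = top;
        top = t->prev;
    }
}

}  // namespace xalloc_sketch
```

With the call sequence from the slide, `xfree(b)` reclaims nothing because c is still live above it; `xfree(c)` then pops both c and b, so in this sketch d lands where b used to be.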
7. (III) Regions
Separate areas, deletion only en masse
  regioncreate(r)
  regionmalloc(r, sz)
  regiondelete(r)
+ Fast
+ Pointer-bumping allocation
+ Deletion of chunks
+ Convenient
+ One call frees all memory
- Risky
- Dangling references
- Too much space
Increasingly popular custom allocator
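A region with the three calls named on this slide might look roughly like the following. The slide gives only the call names, so the `Region&` parameter type, the 4096-byte chunk size, and the chunk layout are assumptions for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace region_sketch {

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kChunkPayload = 4096;  // assumed chunk size

constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

struct Chunk {
    Chunk* next;           // previously filled chunk
    std::size_t used;      // bytes handed out from this chunk
    std::size_t capacity;  // payload bytes available in this chunk
};
constexpr std::size_t kChunkHeader = round_up(sizeof(Chunk));

struct Region { Chunk* head; };

void regioncreate(Region& r) { r.head = nullptr; }

void* regionmalloc(Region& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;
    sz = round_up(sz == 0 ? 1 : sz);
    if (r.head == nullptr || r.head->capacity - r.head->used < sz) {
        // Current chunk is full: grab a new one from the system allocator.
        const std::size_t capacity = sz > kChunkPayload ? sz : kChunkPayload;
        void* raw = std::malloc(kChunkHeader + capacity);
        if (raw == nullptr) return nullptr;
        r.head = new (raw) Chunk{r.head, 0, capacity};
    }
    char* p = reinterpret_cast<char*>(r.head) + kChunkHeader + r.head->used;
    r.head->used += sz;  // pointer bump; no per-object metadata
    return p;
}

void regiondelete(Region& r) {
    // One call frees every chunk, and therefore every object in the region.
    while (r.head != nullptr) {
        Chunk* next = r.head->next;
        std::free(r.head);
        r.head = next;
    }
}

}  // namespace region_sketch
```

There is deliberately no way to free a single object here. Any pointer into the region that outlives `regiondelete` is dangling, which is the risk the slide lists.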
9. Custom Allocators Are Faster…
[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (y-axis, 0 to 1.75) for Custom vs. Win32 across 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
As good as and sometimes much faster than Win32
10. Not So Fast…
[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (y-axis, 0 to 1.75) for Custom, Win32, and DLmalloc across 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
DLmalloc: as fast or faster for most benchmarks
11. The Lea Allocator (DLmalloc 2.7.0)
Mature public-domain general-purpose allocator
Optimized for common allocation patterns
Per-size quicklists ≈ per-class allocation
Deferred coalescing (combining adjacent free objects)
Highly-optimized fastpath
Space-efficient
12. Space Consumption: Mixed Results
[Figure: Space - Custom Allocator Benchmarks. Normalized space (y-axis, 0 to 1.75) for Custom vs. DLmalloc across 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
14. Regions – Pros and Cons
+ Fast, convenient, etc.
+ Avoid resource leaks (e.g., Apache)
  Tear down memory for terminated connections
- No individual object deletion
  Unbounded memory consumption (producer-consumer, long-running computations, off-the-shelf programs)
  Apache: vulnerable to DoS, memory leaks
15. Reap Hybrid Allocator
Reap = region + heap
Adds individual object deletion & heap
  reapcreate(r)
  reapmalloc(r, sz)
  reapfree(r,p)
  reapdelete(r)
+ Can reduce memory consumption
+ Fast
+ Adapts to use (region or heap style)
+ Cheap deletion
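To make the "region or heap style" behavior concrete, here is a heavily simplified sketch of the idea: allocation bumps a pointer like a region until something is freed, freed objects go onto size-segregated free lists, and later requests of a matching size reuse them. The slides do not show Reap's implementation, so the per-object size header, the 32 size classes, the chunk size, and the `Reap&` parameter type are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace reap_sketch {

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kChunkPayload = 4096;  // assumed chunk size
constexpr std::size_t kClasses = 32;         // assumed number of size classes

constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

struct Chunk { Chunk* next; std::size_t used; std::size_t capacity; };
struct ObjHeader { std::size_t size; };  // rounded payload size, kept per object
struct FreeNode { FreeNode* next; };     // overlays the payload of a freed object

constexpr std::size_t kChunkHeader = round_up(sizeof(Chunk));
constexpr std::size_t kObjHeader = round_up(sizeof(ObjHeader));
static_assert(kAlign >= sizeof(FreeNode), "payload must hold a free-list link");

struct Reap {
    Chunk* head;
    FreeNode* free_lists[kClasses];
};

void reapcreate(Reap& r) {
    r.head = nullptr;
    for (std::size_t i = 0; i < kClasses; ++i) r.free_lists[i] = nullptr;
}

void* reapmalloc(Reap& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;
    sz = round_up(sz == 0 ? 1 : sz);
    const std::size_t cls = sz / kAlign - 1;
    if (cls < kClasses && r.free_lists[cls] != nullptr) {  // heap style: reuse a freed object
        FreeNode* node = r.free_lists[cls];
        r.free_lists[cls] = node->next;
        return node;
    }
    const std::size_t need = kObjHeader + sz;              // region style: bump a pointer
    if (r.head == nullptr || r.head->capacity - r.head->used < need) {
        const std::size_t capacity = need > kChunkPayload ? need : kChunkPayload;
        void* raw = std::malloc(kChunkHeader + capacity);
        if (raw == nullptr) return nullptr;
        r.head = new (raw) Chunk{r.head, 0, capacity};
    }
    char* obj = reinterpret_cast<char*>(r.head) + kChunkHeader + r.head->used;
    r.head->used += need;
    new (obj) ObjHeader{sz};
    return obj + kObjHeader;
}

void reapfree(Reap& r, void* p) {
    if (p == nullptr) return;
    const ObjHeader* h =
        reinterpret_cast<const ObjHeader*>(static_cast<char*>(p) - kObjHeader);
    const std::size_t cls = h->size / kAlign - 1;
    if (cls >= kClasses) return;  // sketch only: large objects wait for reapdelete
    r.free_lists[cls] = new (p) FreeNode{r.free_lists[cls]};
}

void reapdelete(Reap& r) {
    while (r.head != nullptr) {   // one call still frees everything
        Chunk* next = r.head->next;
        std::free(r.head);
        r.head = next;
    }
    for (std::size_t i = 0; i < kClasses; ++i) r.free_lists[i] = nullptr;
}

}  // namespace reap_sketch
```

A program that never calls `reapfree` gets plain region behavior from this sketch, while a producer-consumer loop that frees each object after use keeps recycling the same storage instead of growing without bound.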
16. Reap Runtime
[Figure: Runtime - Custom Allocation Benchmarks. Normalized runtime (y-axis, 0 to 1.75) for Custom, Win32, DLmalloc, and Reap across 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
17. Reap Space
[Figure: Space - Custom Allocator Benchmarks. Normalized space (y-axis, 0 to 1.75) for Custom, DLmalloc, and Reap across 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
18. Reap: Best of Both Worlds
Allows mixing of regions and new/delete
Case study:
New Apache module “mod_bc”
bc: C-based arbitrary-precision calculator
Changed 20 lines out of 8000
Benchmark: compute 1000th prime
Memory consumption with Reap: 240K
Memory consumption without Reap: 7.4MB
19. Conclusions and Future Work
Empirical study of custom allocators
Lea allocator often as fast or faster
Non-region custom allocation ineffective
Reap: region performance without drawbacks
Future work:
Reduce space with per-page bitmaps
Combine with scalable general-purpose allocator (e.g., Hoard)
22. Backup Slides
23. Experimental Methodology
Comparing to general-purpose allocators
Same semantics: no problem
  E.g., disable per-class allocators
Different semantics: use emulator
  Uses general-purpose allocator
  Adds bookkeeping to support region semantics
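The emulator idea can be sketched as follows: every object comes from the general-purpose allocator, and a small link header per object lets the region free them all in one call. The slides do not give the study's emulator code, so the singly linked list, the `live` counter, and the function signatures are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace emu_sketch {

constexpr std::size_t kAlign = alignof(std::max_align_t);

constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

struct Link { Link* next; };  // bookkeeping prepended to every object
constexpr std::size_t kLink = round_up(sizeof(Link));

struct Region {
    Link* objects;     // every object allocated in this region
    std::size_t live;  // number of objects on the list
};

void regioncreate(Region& r) {
    r.objects = nullptr;
    r.live = 0;
}

void* regionmalloc(Region& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) - kLink) return nullptr;
    void* raw = std::malloc(kLink + sz);  // the general-purpose allocator does the real work
    if (raw == nullptr) return nullptr;
    r.objects = new (raw) Link{r.objects};
    ++r.live;
    return static_cast<char*>(raw) + kLink;
}

// Frees every object in the region and returns how many were freed.
std::size_t regiondelete(Region& r) {
    std::size_t freed = 0;
    while (r.objects != nullptr) {
        Link* next = r.objects->next;
        std::free(r.objects);
        r.objects = next;
        ++freed;
    }
    r.live = 0;
    return freed;
}

}  // namespace emu_sketch
```

Because the program's region calls keep their meaning, the underlying general-purpose allocator can be swapped and measured without rewriting the benchmark.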
24. Why Did They Do That?
Recommended practice
Premature optimization
Microbenchmarks vs. actual performance
Drift
Not bottleneck anymore
Improved competition
Modern allocators are better
25. Reaps as Regions: Runtime
[Figure: Runtime - Region-Based Benchmarks. Normalized runtime (y-axis, 0 to 1.75) for Custom, Win32, DLmalloc, and Reap on lcc and mudlle.]
Reap performance nearly matches regions
26. Using Reap as Regions
[Figure: Runtime - Region-Based Benchmarks. Normalized runtime (y-axis, 0 to 2.5, with one off-scale bar labeled 4.08) for Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap on lcc and mudlle.]
Reap performance nearly matches regions
27. Drawbacks of Regions
Can’t reclaim memory within regions
Bad for long-running computations, producer-consumer patterns, “malloc/free” programs
  unbounded memory consumption
Current situation for Apache:
vulnerable to denial-of-service
limits runtime of connections
limits module programming
28. Use Custom Allocators?
Strongly recommended by practitioners
Little hard data on performance/space improvements
Only one previous study [Zorn 1992]
Focused on just one type of allocator
Custom allocators: waste of time
Small gains, bad allocators
Different allocators better? Trade-offs?
29. Kinds of Custom Allocators
Three basic types of custom allocators
Per-class
Fast
Custom patterns
Fast, but very special-purpose
Regions
Fast, possibly more space-efficient
Convenient
Variants: nested, obstacks
30. Optimization Opportunity
[Figure: Time Spent in Memory Operations. Percentage of runtime (y-axis, 0 to 100) split into Memory Operations and Other for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, and the average.]