3. Scaling issues with file and
malloc backends
● Built for web sites with gigabytes of
content
● Memory-based backend is limited in size
● Allocation is unreliable under high
memory pressure
● File-based backend has performance and
fragmentation issues
4. The problem with mmaps
1. Varnish has a cache miss
2. CPU writes to a page not in physical
memory
3. CPU raises a page fault
4. OS reads the underlying page from disk
5. Page gets overwritten, so the read was
wasted (dumb)
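
The sequence above can be reproduced with a few lines of Python. This is only a sketch of the access pattern, not Varnish's file backend: `make_cache_file` and `store_object` are hypothetical helper names, and whether the fault actually hits the disk depends on what the OS already has in its page cache.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def make_cache_file(pages):
    """Create a zero-filled temporary file of `pages` pages; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * (pages * PAGE))
    return path

def store_object(path, page_index, payload):
    """Overwrite one whole page of a file-backed mapping."""
    assert len(payload) == PAGE
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        start = page_index * PAGE
        # If this page is not resident, the first store faults and the OS
        # loads the old contents, even though every byte is replaced here.
        m[start:start + PAGE] = payload
        m.flush()
```

The kernel cannot tell in advance that the whole page is about to be replaced, which is why the read in step 4 cannot be skipped on the mmap path.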
5. Feature set
● Built for 100+ terabytes of storage
on each node
● Fragmentation-proof allocation algorithm
● Higher cache hit rates from replacing
LRU with LFU
● Optionally persistent datastore
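
To illustrate why LFU can beat LRU on hit rate, here is a textbook LFU cache in Python. The slides do not describe the actual eviction algorithm, so the class name `LFUCache` and its tie-breaking rule are assumptions made for the example, and the capacity is assumed to be at least 1.

```python
class LFUCache:
    """Minimal least-frequently-used cache (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.hits = {}

    def get(self, key):
        if key not in self.values:
            return None
        self.hits[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the entry with the fewest hits; on a tie, the one
            # inserted earliest goes first.
            victim = min(self.hits, key=self.hits.get)
            del self.values[victim]
            del self.hits[victim]
        self.values[key] = value
        self.hits.setdefault(key, 0)
```

With a capacity of 2, inserting `a`, reading it twice, then inserting `b` and `c` evicts `b`. An LRU cache would have evicted the popular `a` instead, because `b` was touched more recently.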
6. Architecture
● Threads lock allocated memory for
reliable allocations
● Multiple active segments for reduced lock
contention - round robin access
● “Hole expansion” to eliminate
fragmentation
● External persistence storage for high
performance
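
The round-robin access to multiple active segments might look roughly like the sketch below. `Segment` and `SegmentPool` are hypothetical names, and the byte counter stands in for a real allocator; the point is only that each thread starts at a different segment, so threads rarely wait on the same lock.

```python
import threading

class Segment:
    """One active segment with its own lock and a simple free-space counter."""

    def __init__(self, size):
        self.lock = threading.Lock()
        self.free = size

    def try_alloc(self, nbytes):
        with self.lock:
            if nbytes > self.free:
                return False
            self.free -= nbytes
            return True

class SegmentPool:
    def __init__(self, count, size):
        self.segments = [Segment(size) for _ in range(count)]
        self._cursor = 0
        self._cursor_lock = threading.Lock()

    def alloc(self, nbytes):
        """Return the index of the segment that served the request, or None."""
        # The cursor lock is held only for the increment.
        with self._cursor_lock:
            start = self._cursor
            self._cursor = (self._cursor + 1) % len(self.segments)
        # Try the chosen segment first, then fall through to the others.
        for i in range(len(self.segments)):
            idx = (start + i) % len(self.segments)
            if self.segments[idx].try_alloc(nbytes):
                return idx
        return None
```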
10. Adding persistence
● Feature was frequently requested in Varnish Cache
1.0
● Feature request disappeared
● Came back with CDN and VOD
workloads
11. Persistence implementation:
“The Book”
● Mirror metadata structures to disk
● Kept in memory if persistence is disabled
● 2.5 copies on disk
● Journal is checksummed
● ~No blocking operations in critical path
(deletions can block)
● Delivery is unchanged
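
A checksummed journal can be sketched as follows. The record layout (length and CRC32 header, then payload) and the function names are assumptions for illustration; the on-disk format of the book is not given in these slides.

```python
import struct
import zlib

# Hypothetical record header: payload length and CRC32, little-endian.
HEADER = struct.Struct("<II")

def encode_record(payload: bytes) -> bytes:
    """Prefix a payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(journal: bytes):
    """Return the payloads of all intact records, stopping at the first bad one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(journal):
        length, crc = HEADER.unpack_from(journal, offset)
        body = journal[offset + HEADER.size:offset + HEADER.size + length]
        if len(body) != length or zlib.crc32(body) != crc:
            break  # torn or corrupted write: ignore the tail
        records.append(body)
        offset += HEADER.size + length
    return records
```

On recovery, a record that was only partly written before a crash fails the check, and replay stops there instead of applying garbage to the index.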
13. Book layout
● A/B indices + working copy
● Journal
● Statistics (# of objects, free space)
● Ban journal (per book, not per segment)
14. Ordering
● Allocate storage / metadata (J)
● Write to store (J)
● Signal synchronous intention to store
● Asynchronous update of index based on
journal (J)
● Synchronize store (J)
15. Using persistence
● Storage is initialized with mkfs.mse
● Book and Store can be on separate volumes
● Both need to be sized according to use -
store for size and book for # of objects
● Book size is 1-2% of store
● Bans and purges are atomic/consistent - the
rest isn’t
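
The 1-2% sizing rule above is easy to turn into a quick estimate. `book_size_range` is a hypothetical helper, not part of any tool; a deployment with many small objects may need to size the book by object count instead.

```python
def book_size_range(store_bytes):
    """Return (low, high) book size in bytes as 1% and 2% of the store."""
    return store_bytes // 100, store_bytes * 2 // 100
```

For a 10 TB store (`10 * 10**12` bytes) this gives a book of roughly 100 GB to 200 GB.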
16. Results and characteristics
● Deployed on two public CDNs and a
number of private CDNs
● Performance is stable over time
● Handles larger files in the dataset well
● Persistence adds little overhead
● Crash recovery: ~reading the book
● Cache misses during book reading