MSE

Massive Storage Engine
2.0: Even massiver!

Scaling issues with file and malloc backends
● Built for websites with gigabytes of content
● Memory-based backend is limited in size
● Allocation is unreliable under high memory pressure
● File-based backend has performance and fragmentation issues

The problem with mmaps
1. Varnish has a cache miss
2. The CPU writes to a page that is not in physical memory
3. The CPU raises a page fault
4. The OS reads the underlying page from disk
5. The page is then overwritten, so the read was wasted
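To make the wasted read concrete, here is a toy model that counts disk reads when a freshly fetched object is written through a file-backed mapping versus written with explicit whole-page writes. It is not how any real kernel or Varnish behaves in detail; the class and method names are hypothetical, and it assumes page-aligned writes of whole pages.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class ToyMappedFile:
    """Toy model of a file-backed mmap: touching a non-resident page faults it in."""

    def __init__(self):
        self.resident = set()  # page numbers currently in physical memory
        self.disk_reads = 0    # pages read from the backing file

    def _pages(self, offset, length):
        return range(offset // PAGE_SIZE, (offset + length - 1) // PAGE_SIZE + 1)

    def write_via_mmap(self, offset, length):
        for page in self._pages(offset, length):
            if page not in self.resident:
                self.disk_reads += 1     # steps 3-4: page fault, OS reads the page
                self.resident.add(page)
            # step 5: the page just read is overwritten with the new object data

    def write_direct(self, offset, length):
        # Whole-page writes never look at the old contents, so nothing is read.
        for page in self._pages(offset, length):
            self.resident.add(page)


mapped = ToyMappedFile()
mapped.write_via_mmap(0, 1 << 20)  # store a 1 MiB object after a cache miss
direct = ToyMappedFile()
direct.write_direct(0, 1 << 20)
print(mapped.disk_reads, direct.disk_reads)  # 256 0
```

Every one of the 256 pages in the 1 MiB object costs a disk read in the mmap path, only for its contents to be thrown away.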

Feature set
● Built for 100+ terabytes of storage on each node
● Fragmentation-proof allocation algorithm
● Higher cache hit rates from replacing LRU with LFU eviction
● Optional persistent datastore

Architecture
● Threads lock allocated memory so that allocations are reliable
● Multiple active segments, accessed round-robin, to reduce lock contention
● “Hole expansion” to eliminate fragmentation
● External persistence storage for high performance
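The multiple-segment idea can be sketched as follows: each segment has its own lock, and allocating threads are spread across segments in round-robin order, so two threads only contend when they land on the same segment. This is a minimal illustration with hypothetical class names and a simple bump allocator, not MSE's actual allocator.

```python
import itertools
import threading


class Segment:
    """Hypothetical storage segment with its own lock and a bump allocator."""

    def __init__(self, seg_id, size):
        self.seg_id = seg_id
        self.size = size
        self.used = 0
        self.lock = threading.Lock()

    def try_allocate(self, nbytes):
        # Only threads that picked this segment contend on this lock.
        with self.lock:
            if self.used + nbytes > self.size:
                return None
            offset = self.used
            self.used += nbytes
            return (self.seg_id, offset)


class SegmentedStore:
    """Spreads allocations over several active segments in round-robin order."""

    def __init__(self, n_segments, segment_size):
        self.segments = [Segment(i, segment_size) for i in range(n_segments)]
        self._counter = itertools.count()
        self._counter_lock = threading.Lock()

    def allocate(self, nbytes):
        with self._counter_lock:
            start = next(self._counter)
        n = len(self.segments)
        # Try the round-robin choice first, then fall back to the other segments.
        for i in range(n):
            result = self.segments[(start + i) % n].try_allocate(nbytes)
            if result is not None:
                return result
        return None  # a real engine would evict (expand a hole) here


store = SegmentedStore(n_segments=4, segment_size=1000)
print([store.allocate(100)[0] for _ in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```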

LRU Eviction
[Figure: objects ordered from least valuable to most valuable, with a labeled kill zone]
The LFU approach to eviction
[Figure: objects ordered from most valuable to least valuable, labeled free space, kill candidates, and hole expansion]
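The slides do not spell out how hole expansion picks its kill candidates, so the following is only a sketch of the general idea under stated assumptions: a segment is modeled as a hypothetical list of slots in address order, each either free space or an object with an LFU access count. To make room for an allocation, it finds the contiguous run of slots that covers the requested size while evicting the least total access count, then merges that run into one free hole.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slot:
    size: int             # bytes occupied by this slot
    hits: Optional[int]   # LFU access count; None means the slot is free space


def find_hole(slots: List[Slot], need: int) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, cost) of the cheapest contiguous run of slots covering
    `need` bytes. Cost is the summed hit count of evicted objects; free space is free."""
    best = None
    for start in range(len(slots)):
        size = cost = 0
        for end in range(start, len(slots)):
            size += slots[end].size
            cost += slots[end].hits or 0
            if size >= need:
                if best is None or cost < best[2]:
                    best = (start, end, cost)
                break  # hit counts are non-negative, so growing the run only adds cost
    return best


def expand_hole(slots: List[Slot], need: int) -> List[Slot]:
    """Evict the kill candidates in the cheapest run and merge them into one hole."""
    found = find_hole(slots, need)
    if found is None:
        raise MemoryError("segment too small for this allocation")
    start, end, _ = found
    merged = Slot(size=sum(s.size for s in slots[start:end + 1]), hits=None)
    return slots[:start] + [merged] + slots[end + 1:]


slots = [Slot(100, 50), Slot(50, None), Slot(100, 2),
         Slot(100, 40), Slot(50, None), Slot(100, 1)]
print(find_hole(slots, 150))  # (4, 5, 1)
```

Unlike LRU, which evicts strictly by age, this choice is driven by how often neighboring objects are used, and adjacent free space is absorbed into the hole at no cost.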

Adding persistence
● A much-requested feature in Varnish Cache 1.0
● The feature requests disappeared
● They came back with CDN and VOD workloads

Persistence implementation: “The Book”
● Metadata structures are mirrored to disk
● Metadata stays in memory if persistence is disabled
● 2.5 copies on disk
● The journal is checksummed
● Almost no blocking operations in the critical path (deletions can block)
● Delivery is unchanged
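A checksummed journal lets recovery detect a record that was only partly written when the node crashed. The slides do not give the Book's record format, so this sketch assumes a hypothetical framing of payload length plus CRC32 in front of each record; replay keeps every intact record and stops at the first truncated or corrupted one.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # assumed framing: payload length, CRC32 of payload


def append_record(journal: bytearray, payload: bytes) -> None:
    """Append one checksummed record to the journal buffer."""
    journal.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    journal.extend(payload)


def replay(journal: bytes) -> list:
    """Return payloads of all intact records, stopping at the first bad one."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(journal):
        length, crc = HEADER.unpack_from(journal, pos)
        start = pos + HEADER.size
        payload = bytes(journal[start:start + length])
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupted tail: ignore it and everything after it
        records.append(payload)
        pos = start + length
    return records


j = bytearray()
append_record(j, b"alloc obj=1")
append_record(j, b"write obj=1")
print(replay(j))       # [b'alloc obj=1', b'write obj=1']
print(replay(j[:-3]))  # [b'alloc obj=1']
```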

File layout
[Figure: file layout showing the Store, the Book (for persistence), a Segment, and a Header]

Book layout
● A/B indices + working copy
● Journal
● Statistics (# of objects, free space)
● Ban journal (per book, not per segment)
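One common way to use A/B indices with a working copy is to keep the live index in memory and checkpoint it alternately into two on-disk slots, always overwriting the older one, so a crash during a checkpoint leaves the previous copy intact. Whether MSE does exactly this is an assumption; the class, the generation counter, and the JSON-plus-CRC32 encoding below are hypothetical.

```python
import json
import zlib
from typing import Optional


def encode_index(generation: int, entries: dict) -> bytes:
    body = json.dumps({"gen": generation, "entries": entries}, sort_keys=True).encode()
    return zlib.crc32(body).to_bytes(4, "little") + body


def decode_index(blob: bytes) -> Optional[dict]:
    if len(blob) < 4:
        return None
    crc, body = int.from_bytes(blob[:4], "little"), blob[4:]
    if zlib.crc32(body) != crc:
        return None  # torn or corrupted copy: ignore it
    return json.loads(body)


class ABIndex:
    """Two on-disk copies (A and B) plus an in-memory working copy."""

    def __init__(self):
        self.slots = [b"", b""]  # stand-ins for the A and B regions of the book
        self.generation = 0
        self.working = {}        # working copy, updated as objects come and go

    def checkpoint(self):
        # Overwrite the older copy so the newer one survives a crash mid-write.
        self.generation += 1
        self.slots[self.generation % 2] = encode_index(self.generation, self.working)

    def recover(self) -> dict:
        copies = [d for d in (decode_index(s) for s in self.slots) if d is not None]
        if not copies:
            return {}
        return max(copies, key=lambda d: d["gen"])["entries"]


idx = ABIndex()
idx.working["obj1"] = 100
idx.checkpoint()                    # generation 1 goes to slot 1
idx.working["obj2"] = 200
idx.checkpoint()                    # generation 2 goes to slot 0
idx.slots[0] = idx.slots[0][:10]    # simulate a torn write of the newest copy
print(idx.recover())                # {'obj1': 100}
```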

Ordering
● Allocate storage / metadata (J)
● Write to store (J)
● Signal synchronous intention to store
● Asynchronous update of the index based on the journal (J)
● Synchronize store (J)
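The ordering above can be modeled in a few lines: the storing thread journals the allocation and the write, signals its intention to store, and returns without waiting; a background worker then updates the index and synchronizes the store. This toy model only illustrates the sequence; reading "(J)" as "journaled", and assigning steps 4 and 5 to a single background worker, are assumptions, and all names are hypothetical.

```python
import queue
import threading


class ToyStorePath:
    """Toy model of the store ordering: journal first, index updated later."""

    def __init__(self):
        self.journal = []             # (step, obj_id) records, in order
        self.store = {}               # offset -> bytes, stands in for the store file
        self.index = {}               # obj_id -> offset, maintained from the journal
        self.next_offset = 0
        self.lock = threading.Lock()
        self.intents = queue.Queue()  # signalled intentions to store
        threading.Thread(target=self._index_worker, daemon=True).start()

    def _journal(self, step, obj_id):
        with self.lock:
            self.journal.append((step, obj_id))

    def put(self, obj_id, data):
        # 1. Allocate storage and metadata (journaled).
        with self.lock:
            offset = self.next_offset
            self.next_offset += len(data)
        self._journal("alloc", obj_id)
        # 2. Write the object body to the store (journaled).
        self.store[offset] = data
        self._journal("write", obj_id)
        # 3. Signal the intention to store; the caller does not wait for the index.
        self.intents.put((obj_id, offset))

    def _index_worker(self):
        while True:
            obj_id, offset = self.intents.get()
            # 4. Asynchronous index update driven by the journal (journaled).
            with self.lock:
                self.index[obj_id] = offset
            self._journal("index", obj_id)
            # 5. Synchronize the store (journaled); a real engine would flush to disk here.
            self._journal("sync", obj_id)
            self.intents.task_done()


p = ToyStorePath()
p.put("a", b"hello")
p.intents.join()   # wait for the background worker, for demonstration only
print(p.journal)   # [('alloc', 'a'), ('write', 'a'), ('index', 'a'), ('sync', 'a')]
```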

Using persistence
● Storage is initialized with mkfs.mse
● Book and Store can be on separate volumes
● Both must be sized for the workload: the store for total size, the book for the number of objects
● Book size is about 1-2% of the store
● Bans and purges are atomic and consistent; the rest is not
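As a worked example of the 1-2% rule of thumb, a store at the 100 TB scale mentioned in the feature list needs a book of roughly 1 to 2 TB. The helper below is a hypothetical convenience for that arithmetic, not part of mkfs.mse; it does not model sizing by object count, for which the slides give no per-object figure.

```python
def book_size_range(store_bytes: int, low_pct: float = 1.0, high_pct: float = 2.0):
    """Rough lower and upper book sizes from the 1-2% rule of thumb."""
    return store_bytes * low_pct / 100, store_bytes * high_pct / 100


TB = 10**12
low, high = book_size_range(100 * TB)
print(low / TB, high / TB)  # 1.0 2.0
```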

Results and characteristics
● Deployed on two public CDNs and a number of private CDNs
● Performance is stable over time
● Handles the larger files in the dataset well
● Persistence adds little overhead
● Crash recovery takes roughly as long as reading the book
● Requests are cache misses while the book is being read

Applications
● Video distribution
● CDN workloads
● Large caches (image banks, etc.)
