XS Boston 2008 Memory Overcommit
1. Memory Overcommit…
without the Commitment
Speaker: Dan Magenheimer
© 2008 Oracle Corporation
2. Overview
• What is overcommitment?
• aka oversubscription or overbooking
• Why (and why not) overcommit memory?
• Known techniques for memory overcommit
• Feedback-directed ballooning
Memory Overcommit without the Commitment (Xen Summit 2008) - Dan Magenheimer
3. CPU overcommitment
Four underutilized 2-cpu virtual servers
One 4-CPU physical
server
Xen supports CPU
overcommitment
(aka “consolidation”)
4. I/O overcommitment
Four underutilized 2-cpu virtual servers
each with a 1Gb NIC
One 4-CPU physical
server with a 1Gb NIC
Xen supports I/O
overcommitment
5. Memory overcommitment???
Four underutilized 2-cpu virtual servers
each with 1GB RAM
One 4-CPU physical
server w/4GB RAM
SORRY! Xen doesn’t support
memory overcommitment!
6. Why doesn’t Xen overcommit
memory?
• Memory is cheap – buy more
• “When you overbook memory excessively, performance takes a hit”
• Most consolidated workloads don’t benefit much
• Overcommit requires swapping – and Xen doesn’t do I/O in the
hypervisor
• Overcommit adds lots of complexity and latency to important
features like save/restore/migration
• Operating systems know what they are doing and can use all the
memory they can get
7. Why doesn’t Xen overcommit
memory?
• Memory is cheap – buy more… except when you’re out of slots or need BIG DIMMs.
• “When you overbook memory excessively, performance takes a
hit”… yes, but true of overbooking CPU and I/O too. Sometimes tradeoffs have to be made.
• Most consolidated workloads don’t benefit much…but some workloads do!
• Overcommit requires swapping – and Xen doesn’t do I/O in the
hypervisor…only if black-box swapping is required
• Overcommit adds lots of complexity and latency to important
features like save/restore/migration… some techniques maybe… we’ll see…
• Operating systems know what they are doing and can use all the
memory they can get…but an idle or lightly-loaded OS may not!
8. Why should Xen support memory
overcommitment?
• Competitive reasons
“VMware Infrastructure’s exclusive ability to overcommit memory gives it an
advantage in cost per VM that others can’t match” *
• High(er) density consolidation can save money
• Sum of guest working sets is often smaller than
available physical memory
• Inefficient guest OS utilization of physical memory
(caching vs “hoarding”)
* E. Horschman, Cheap Hypervisors: A Fine Idea -- If you can afford them, blog posting,
http://blogs.vmware.com/virtualreality/2008/03/cheap-hyperviso.html
9. Problem statement
Oracle OnDemand businesses
(both internal/external):
• would like to use Oracle VM (Xen-based)
• but use memory overcommit extensively
The Oracle VM team was asked… can we:
• implement memory overcommit on Xen?
• get it accepted upstream?
10. Memory Overcommit Investigation
• Technology survey
• understand known techniques and implementations
• understand what Xen has today and its limitations
• Propose a solution
• OK to place requirements on guest
• e.g. black-box solution unnecessary
• soon and good is better than late and great
• phased delivery OK if necessary
• e.g. Oracle Enterprise Linux now, Windows later
• preferably high bang for the buck
• e.g. 80% of value with 20% of cost
11. Techniques for memory overcommit
• Ballooning
• Content-based page sharing
• VMM-driven demand paging
• Hot-plug memory add/delete
• Ticketed ballooning
• Swapping entire guests
Black-box or gray-box*… or white-box?
* T. Wood, et al. Black-box and Gray-box Strategies for Virtual Machine Migration, In Proceedings NSDI ’07.
http://www.cs.umass.edu/~twood/pubs/NSDI07.pdf
12. WHAT IF…..?
Operating systems were able to:
• recognize when physical memory is not being used
efficiently and communicate relevant statistics
• surrender memory when it is underutilized
• reclaim memory when it is needed
And Xen/domain0 could balance the allocation of
physical memory, just as it does for CPU/devices?
…. Maybe this is already possible?!?
13. Ballooning (gray-box)
• In-guest device driver
• “steals” / reclaims memory via guest in-kernel APIs
• e.g. get_free_page() and MmAllocatePagesForMdl()
• Balloon inflation increases guest memory pressure
• leverages guest native memory management algorithms
• Xen has ballooning today
• mostly used for domain0 autoballooning
• has problems, but recent patch avoids worst*
• VMware and KVM have it today too
Issues:
• driver must be installed
• not available during boot
• reclaim may not be fast enough; potential out-of-memory conditions
* J. Beulich, [PATCH] linux/balloon: don’t allow ballooning down a domain below a reasonable limit, Xen developers
archive, http://lists.xensource.com/archives/html/xen-devel/2008-04/msg00143.html
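From domain0, balloon sizes are adjusted with the standard toolstack. A minimal sketch in shell, assuming the classic `xm mem-set` command (Xen 3.x era) and an illustrative per-guest floor that mirrors the minimum-limit idea in the balloon-driver patch cited above; the function name and floor value are not from the slides:

```shell
# Illustrative floor: never balloon a guest below this many MB,
# to avoid the out-of-memory conditions noted above.
MIN_MB=256

clamp_target() {
    # clamp_target <requested_mb>  -> echoes the floored target in MB
    local req=$1
    if [ "$req" -lt "$MIN_MB" ]; then
        echo "$MIN_MB"
    else
        echo "$req"
    fi
}

# On a running Xen dom0 one would then apply the clamped target, e.g.:
#   xm mem-set myguest "$(clamp_target 128)"   # sets 256, not 128
```

The clamp is the key safety property: a too-aggressive request from a policy script degrades to the floor instead of triggering the guest OOM killer.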
14. Content-based page sharing (black-box)
• One physical page frame used for multiple identical pages
• sharing works both intra-guest and inter-guest
• hypervisor periodically scans for copies and “merges”
• copy-on-write breaks the share
• Investigated on Xen, but never in-tree* **
• measured savings of 4-12%
• VMware*** has had it for a long time, KVM soon ****
Issues:
• Performance cost of discovery scans, frequent share set-up/tear-down
• High complexity for relatively low gain
* J. Kloster et al, On the feasibility of memory sharing: content based page sharing in the xen virtual machine monitor,
Technical Report, 2006. http://www.cs.aau.dk/library/files/rapbibfiles1/1150283144.pdf
** G. Milos, Memory COW in Xen, Presentation at Xen Summit, Nov 2007.
http://www.xen.org/files/xensummit_fall2007/18_GregorMilos.pdf
*** C. Waldspurger. Memory Resource Management in VMware ESX Server, In Proceedings OSDI ’02,
http://www.usenix.org/events/osdi02/tech/walspurger/waldspurger.pdf
**** A. Kivity, Memory overcommit with kvm, http://avikivity.blogspot.com/2008/04/memory-overcommit-with-kvm.html
15. Demand paging (black-box)
• VMM reclaims memory and swaps to disk
• VMware has today
• used as last resort
• randomized page selection
• Could potentially be done on Xen via
domain0
Issues:
• “Hypervisor” must have disk/net drivers
• “Semantic gap”*
• Double paging
* P. Chen et al. When Virtual is Better than Real. In Proceedings HOTOS ’01. http://www.eecs.umich.edu/~pmchen/papers/chen01.pdf
16. Hotplug memory add/delete (white-box)
• Essentially just ballooning with:
• larger granularity
• less fragmentation
• potentially unlimited maximum memory
• no kernel data overhead for unused pages
Issues:
• Not widely available yet (for x86)
• Larger granularity
• Hotplug delete requires defragmentation
J. Schopp et al, Resizing Memory with Balloons and Hotplug, Ottawa Linux Symposium 2006,
https://ols2006.108.redhat.com/reprints/schopp-reprint.pdf
17. “Ticketed” ballooning (white-box)
• Proposed by Ian Pratt*
• A ticket is obtained when a page is surrendered to
the balloon driver
• Original page can be retrieved if Xen hasn’t given
the page to another domain
• Similar to a system-wide second-chance buffer
cache or unreliable swap device
• Never implemented (afaik)
* http://lists.xensource.com/archives/html/xen-devel/2008-05/msg00321.html
18. Whole-guest swapping (black?-box)
• Proposed by Keir Fraser*
• Forced save/restore of idle/low-priority guests
• Wake-on-LAN-like mechanism causes restore
• Never implemented (afaik)
Issues:
• Very long latency for guest resume
• Very high system I/O overhead when densely overcommitted
* http://lists.xensource.com/archives/html/xen-devel/2005-12/msg00409.html
19. Observations
• Xen balloon driver works well
• recent patch avoids O-O-M problems
• works on hvm if pv-on-hvm drivers present
• ballooning up from “memory=xxx” to “maxmem=yyy” works (on
pvm domains)
• ballooned-down domain doesn’t restrict creation of new domains
• Linux provides lots of memory-status information
• /proc/meminfo and /proc/vmstat
• Committed_AS is a decent estimator of current memory need
• Linux does OK when put under memory pressure
• rapid/frequent balloon inflation/deflation just works… as long as
remaining available Linux memory is not too small
• properly configured Linux swap disk works when necessary;
obviates need for “system-wide” demand paging
• Xenstore tools work for two-way communication even in hvm
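Since Committed_AS is singled out above as the estimator of current memory need, here is a small sketch of extracting it, assuming a Linux-style /proc/meminfo; the helper name is illustrative, not from the actual scripts:

```shell
# Extract the Committed_AS line from a meminfo-format file.
# /proc/meminfo reports it in kB, e.g. "Committed_AS:   123456 kB".
committed_as_kb() {
    # committed_as_kb <meminfo-file>  -> echoes Committed_AS in kB
    awk '/^Committed_AS:/ {print $2}' "$1"
}

# On a real guest:  committed_as_kb /proc/meminfo
```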
20. Proposed Solution:
Feedback-directed ballooning (gray-box)
Use relevant Linux memory statistics to control balloon size
• Selfballooning:
• Local feedback loop; immediate balloon changes
• Eagerly inflates balloon to create memory pressure
• No management or domain0 involvement
• Directed ballooning:
• Memory stats fed from each domainU to domain0
• Policy module in domain0 determines balloon size, controls
memory pressure for each domain (not yet implemented)
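The selfballooning feedback loop can be sketched as a pure target calculation: move the balloon target toward Committed_AS, rate-limited by a hysteresis step and clamped to a minimum floor. This is a hedged sketch, not the actual script; the floor, step size, and function name are illustrative:

```shell
FLOOR_KB=131072      # illustrative: never balloon below 128 MB
STEP_KB=16384        # illustrative hysteresis: max change per iteration

next_target_kb() {
    # next_target_kb <current_kb> <committed_as_kb>  -> echoes new target
    local cur=$1 want=$2 new
    # Enforce the minimum memory floor to avoid O-O-M conditions.
    [ "$want" -lt "$FLOOR_KB" ] && want=$FLOOR_KB
    if [ "$want" -gt "$cur" ]; then
        # Deflate the balloon (give the guest memory back), one step at a time.
        new=$((cur + STEP_KB))
        [ "$new" -gt "$want" ] && new=$want
    else
        # Inflate the balloon (take memory away), one step at a time.
        new=$((cur - STEP_KB))
        [ "$new" -lt "$want" ] && new=$want
    fi
    echo "$new"
}

# In the real loop the result would be written to the balloon driver's
# target (the exact /proc or /sys interface varies by kernel version).
```

The step limit is what makes rapid, frequent balloon changes safe: the guest's memory never moves faster than the kernel can reclaim or repopulate it.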
21. Implementation:
Feedback-directed ballooning
• No changes to Xen or domain0 kernel or drivers!
• Entirely implemented with user-land bash scripts
• Self-ballooning and stat reporting/monitoring only (for now)
• Committed_AS used (for now) as memory estimator
• Hysteresis parameters -- settable to rate-limit balloon changes
• Minimum memory floor enforced to avoid O-O-M conditions
• same maxmem-dependent algorithm as recent balloon driver bugfix
• Other guest requirements:
• Properly sized and configured swap (virtual) disk for each guest
• HVM: pv-on-hvm drivers present
• Xenstore tools present (but not for selfballooning)
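For the stat-reporting side, a domU can format a meminfo snapshot into key=value pairs that domain0 reads back over xenstore. A hedged sketch: the formatter below is illustrative, and the xenstore key path in the comment is an assumed layout, not the one the actual scripts use:

```shell
# Condense a meminfo-format file into the two stats a domain0 policy
# module would most want: free memory and Committed_AS (both in kB).
format_memstats() {
    # format_memstats <meminfo-file>  -> "MemFree=<kB> Committed_AS=<kB>"
    awk '/^MemFree:/ {f=$2} /^Committed_AS:/ {c=$2}
         END {printf "MemFree=%s Committed_AS=%s\n", f, c}' "$1"
}

# In a real guest the result would be pushed with the xenstore tools, e.g.:
#   xenstore-write "/local/domain/$DOMID/memory/stats" \
#       "$(format_memstats /proc/meminfo)"
# and read from domain0 with xenstore-read on the same (assumed) path.
```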
22. Feedback-directed Ballooning
Results
• Overcommit ratio
• 7:4 w/default configuration (7 512MB loaded guests, 2GB phys memory)
• 15:4 w/aggressive config (15 512MB idle guests, 2GB phys memory)
• for pvm guests, arbitrarily higher due to “maxmem=”
• Preliminary performance (Linux kernel make after make clean, 5 runs, mean of middle 3)
Memory (MB)  Ballooning  Min (MB)  User (sec)  Sys (sec)  Elapsed (sec)  Major page faults  Hysteresis (sec)
2048         Off         –         785         121        954            0                  –
2048         Self        368       778         104        1209           8940               10    (27% slower)
1024         Off         –         775         95         1120           4650               –
1024         Self        238       775         93         1201           9490               10
512          Off         –         775         79         1167           8520               –
512          Self        172       775         80         1202           9650               10    (3% slower)
256          Off         –         773         79         1186           9150               –
256          Self        106       775         80         1202           9650               10
Selfballooning costly for large-memory domains but
barely noticeable for smaller-memory domains
23. Domain0 screenshot with monitoring tool and xentop showing
memory overcommitment
24. Future Work
• Domain0 policy module for directed ballooning
• some combination of directed and self-ballooning??
• Improved feedback / heuristics
• Combine multiple memory statistics, check idle time
• Prototype kernel changes (“white-box” feedback)
• Better “idle memory” metrics
• Benchmarking in real world
• More aggressive minimum memory experiments
• Windows support
25. Conclusions
• Xen does do memory overcommit today!
• Memory overcommit has some performance impact
• but still useful in environments where high VM density is more
important than max performance
• Lots of cool research directions possible for
“virtualization-aware” OS memory management
26. Memory Overcommit…
without the Commitment
Speaker: Dan Magenheimer
© 2008 Oracle Corporation