Presentation

A Comparative Analysis of Nested Virtualization in x86_64 and S390x Hypervisors
Master's Termination Project
Daniel FitzGerald
Dr. Dennis Foreman

Overview
● Introduction
● Experimental Setup
● Results
● Conclusion
● Future Work

Introduction
● Background
– What is “Nested Virtualization”?
– Why should I care?
– What is “Turtles”?
– What is “VM”?
● Purpose of our research

Experimental Setup
● Problem: Comparing apples-to-apples
– Mainframes are bigger than desktop servers
– Needed to make both have similar resources
● Solution: Abstraction and partitioning
– Logical Partitioning

Experimental Setup
● Three test configurations
● Each named by its “degree of virtualization”
– Levels of virtualization on the system
– “Level 0” - non-virtualized environment
– “Level 1” - single hypervisor
– “Level 2” - nested virtualization

Experimental Setup
● Software
– GCC 4.8.3
– SysBench 0.5
– KVM, QEMU, and friends
● Custom Linux kernel
– Can't have any large page support!

Experimental Setup
● Creating L1 and L2 environments
● S390x
– Install new z/VM at L0
– Reuse L0 Linux installation for L1 and L2
● x86_64
– Create qcow2 disk image for L1
– Install/configure Linux on disk image
– Copy and tweak disk image for L2
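
The x86_64 steps above could be scripted roughly as follows. This is only a sketch: the file names and the 20G size are hypothetical placeholders, and the slides do not say how the images were actually built. It relies on the standard `qemu-img create -f qcow2` command; the Linux installation and the L2 "tweak" step are not shown.

```python
import shutil
import subprocess


def qcow2_create_cmd(path, size):
    """Build the qemu-img command line that creates an empty qcow2 image."""
    return ["qemu-img", "create", "-f", "qcow2", path, size]


def create_l1_image(path="l1.qcow2", size="20G"):
    """Create the L1 disk image (path and size are hypothetical)."""
    subprocess.run(qcow2_create_cmd(path, size), check=True)


def clone_for_l2(l1_path="l1.qcow2", l2_path="l2.qcow2"):
    """Copy the installed and configured L1 image as a starting point for L2."""
    shutil.copyfile(l1_path, l2_path)
```

`create_l1_image()` would run first, Linux would then be installed into the image, and `clone_for_l2()` would run once the L1 guest is configured.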

Experimental Setup
● SysBench 0.5
– Each test executes a number of transactions
– “Transaction” is some discrete computational operation
– Focused on five different tests to get a good idea of overall system performance

Experimental Setup
● Goal was to maximize resource usage
● Problem: resource over-commitment
– Guest VMs and host VMM fighting for the same resources
– Leads to resource contention
– Dispatched to wait queue, paged to disk

Experimental Setup
● Solution: Second set of tests performed without resource over-commitment
● L0 – 16GB RAM, 4 CPU
● L1 – 14GB RAM, 3 VCPU
● L2 – 12GB RAM, 2 VCPU
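
The allocation above leaves each host some memory and one CPU that its guest does not claim. A minimal sketch of that check, using the figures from this slide (the `Level` type and `headroom` helper are illustrative names, not part of the study):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    ram_gb: int
    cpus: int


# Values taken from the slide: each level is the host of the next one.
LEVELS = [Level("L0", 16, 4), Level("L1", 14, 3), Level("L2", 12, 2)]


def headroom(levels):
    """For each host, return (host name, spare RAM in GB, spare CPUs)
    left over after its guest's allocation."""
    return [
        (host.name, host.ram_gb - guest.ram_gb, host.cpus - guest.cpus)
        for host, guest in zip(levels, levels[1:])
    ]


def is_overcommitted(levels):
    """Assumption: a host counts as over-committed when its guest is given
    more RAM or more CPUs than the host itself has."""
    return any(ram < 0 or cpus < 0 for _, ram, cpus in headroom(levels))
```

With these values, every host keeps 2 GB of RAM and one CPU for itself.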

Results
● Experimental Measurements
● Definition of “Performance”
● Performance Comparisons
– CPU Performance Comparison
– Thread Scheduling Performance Comparison
– Memory Write Performance Comparison
– Memory Read Performance Comparison
– MySQL Database Performance Comparison

Experimental Measurements
● Transactional Throughput
– “How much work gets done”
● Application Response Time
– How long it takes the application to respond to a request for service
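
As a rough illustration of the two measurements, throughput can be computed as transactions completed per unit of time and response time as the mean per-request latency. The function names and inputs below are hypothetical; SysBench reports these figures itself.

```python
def throughput(n_transactions, elapsed_s):
    """Transactions completed per second."""
    return n_transactions / elapsed_s


def mean_response_ms(response_times_ms):
    """Arithmetic mean of per-request response times, in milliseconds."""
    return sum(response_times_ms) / len(response_times_ms)
```

For example, 10,000 transactions completed in 10 seconds is a throughput of 1,000 transactions per second.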

Definition of “Performance”
● With regard to our two measurements
● “Better” performance
– Higher throughput
– Faster (lower) response times
● “Worse” performance
– Lower throughput
– Slower (higher) response times
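
The definition above can be read as a simple comparison rule. The sketch below is one possible encoding; the "mixed" and "same" outcomes are an added assumption for cases the slide does not cover, such as higher throughput combined with slower response times.

```python
def compare(a, b):
    """Compare result a against result b.

    Each result is a (throughput, response_time_ms) pair.
    Returns "better", "worse", "same", or "mixed".
    """
    (a_tp, a_rt), (b_tp, b_rt) = a, b
    if (a_tp, a_rt) == (b_tp, b_rt):
        return "same"
    if a_tp >= b_tp and a_rt <= b_rt:
        return "better"  # no lower throughput, no slower response
    if a_tp <= b_tp and a_rt >= b_rt:
        return "worse"   # no higher throughput, no faster response
    return "mixed"       # the two measurements disagree
```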

CPU Performance Comparison
[Figure: SysBench CPU Performance Comparison. Event throughput (events/s), scale 0 to 1400, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: Vertical Comparison (configurations by environment) and Horizontal Comparison (environments by configuration). 95% confidence interval for the mean; individual standard deviations were used to calculate the intervals.]
● x86_64 has greater throughput, but it decreases with each level of virtualization
● S390x throughput doesn't change much at all:
● 20.7% probability that L0 and L1 throughput means are identical
● 82.9% probability that L1 and L2 throughput means are identical
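
The slides do not say which test produced these probabilities. One common reading is a p-value from a two-sample t-test that uses each sample's own standard deviation (Welch's test); strictly, such a p-value is the probability of seeing a difference at least this large if the true means were equal. The sketch below computes the Welch statistic with Python's standard library only. The p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of sample a
    vb = stdev(b) ** 2 / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (not the exact t distribution)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))
```

Given two lists of per-run throughput values, `welch_t(l0_runs, l1_runs)` returns the statistic, and `approx_two_sided_p` turns it into an approximate p-value. The input lists here are hypothetical names; the study's raw samples are not in the slides.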

Mean Response Time
[Figure: Mean Response Time Value. Response time (ms), scale 0 to 4, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration. 95% confidence interval for the mean; individual standard deviations were used to calculate the intervals.]
● x86_64 response times are faster, but they increase with each level of virtualization
● S390x response time doesn't change much at all
● 16.7% probability that L0 and L1 response time means are identical
● 79.4% probability that L1 and L2 response time means are identical

No Memory or CPU Over-commitment
[Figure: Transactional Throughput - No Resource Overcommitment (scale 0 to 1400) and Mean Response Time Value - No Resource Overcommitment (response time in ms, scale 0 to 4), by configuration (L0, L1, L2) and environment (x86_64, s390x).]
● Removing over-commitment greatly reduced the variation on S390x
● Transactional throughput scales with number of processors

● x86_64 performs better than S390x
● x86_64 performance is impacted by each level of virtualization (KVM)
● S390x performance apparently sees no such impact from z/VM
● CPU performance on z/VM scales better

Thread Scheduling Comparison
[Figure: SysBench Thread Scheduling Comparison. Throughput, scale 0 to 5000, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● Without virtualization, x86_64 has higher throughput
● S390x has higher throughput in all virtualized configurations

Mean Response Time
[Figure: Response time (ms), scale 0 to 50, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● Without virtualization, x86_64 has faster response time
● S390x has faster response time in all virtualized configurations
● x86_64 response time scales poorly with degree of virtualization

[Figure: Transactional Throughput - No Overcommitment (scale 0 to 4000) and Mean Response Time Value - No Overcommitment (response time in ms, scale 0 to 90), by configuration (L0, L1, L2) and environment (x86_64, s390x).]
● Removing over-commitment reduced the variation
● S390x L1 performance “jumps” are eliminated
● S390x thread scheduling throughput and response time scale better

● Performance decreases with increasing degree of virtualization
● x86_64 hardware advantages erased by KVM
● z/VM provides better thread scheduling performance than KVM
● Thread scheduling on S390x and z/VM scales better than on x86_64 and KVM

Memory Write Comparison
[Figure: SysBench Memory Write Comparison. Throughput, scale 0.0 to 1.2 (units not recoverable), for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● No virtualization: S390x has higher throughput than x86_64
● Throughput increases on L1 z/VM, decreases on L1 KVM
● Big throughput decrease on L2 z/VM, gradual decrease on L2 KVM

Mean Response Time
[Figure: Response time (ms), scale 0 to 25000, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● No virtualization: S390x has faster response time than x86_64
● Response time improves on L1 z/VM, degrades on L1 KVM
● Nested virtualization causes L2 z/VM response time to slow, L2 KVM response time to improve. Unexpected result.

● Over-commitment was cause of L2 discrepancy
● Variation on KVM all but eliminated
● S390x L0, L1 very close: 70.5% probability that means are identical
● x86_64 response time slows from L1 to L2
[Figure: Two panels by configuration (L0, L1, L2) and environment (x86_64, s390x). First panel: scale 0.0 to 0.9 (axis label not recoverable). Second panel: response time (ms), scale 0 to 6000. Individual standard deviations were used to calculate the intervals.]

● L1 z/VM performance comparable to HW
● L2 z/VM performance significantly degrades
– Throughput halved
– Response time more than doubled
● x86_64: gradual but consistent degradation
● z/VM has overall better performance, but KVM performance changes are more “predictable”

Memory Read Comparison
[Figure: SysBench Memory Read Comparison. Throughput, scale 0.0 to 1.0 (units not recoverable), for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● Transactional throughput decreased in both environments
● Throughput decreased at a faster rate on z/VM than on KVM
● L2 z/VM throughput far lower than L2 KVM throughput

Mean Response Time
[Figure: Response time (ms), scale 0 to 5000, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● Response times slowed with degree of virtualization
● Response times on z/VM slowed at a faster rate than on KVM
● L2 z/VM response time much slower than L2 KVM response time

[Figure: Two panels by configuration (L0, L1, L2) and environment (x86_64, s390x). First panel: scale 0.0 to 1.0 (axis label not recoverable). Second panel: response time (ms), scale 0 to 4000.]
● L1 response times for z/VM, KVM within 1% of HW response times
● z/VM performance degrades at a faster rate than KVM performance
● L2 KVM performance still beats L2 z/VM performance

● Performance degrades with each level of virtualization
● S390x, z/VM have better L0, L1 performance
● KVM has better L2 memory read performance
● Relative performance change between L1 and L2 KVM is much smaller than between L1 and L2 z/VM

MySQL Database Comparison
[Figure: SysBench MySQL Database Performance Comparison. Throughput, scale 0 to 1400, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● S390x and z/VM provide superior L0, L1 throughput
● Precipitous drop in z/VM throughput between L1 and L2
● x86_64 throughput degrades, but at a much more practical rate

Mean Response Time
[Figure: Response time (ms), scale 0 to 100, for configurations L0, L1, L2 in the x86_64 and s390x environments. Two panels: configurations by environment and environments by configuration.]
● S390x and z/VM offer incredible L0, L1 response times
● Between L1 and L2, z/VM response time degrades by over 2000%!
● x86_64 and KVM performance degrade at a far more consistent pace
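
For reference, a percentage degradation like the one quoted above is a relative change between two measurements. The sketch below shows the arithmetic. The example values in the note are hypothetical and are not the measured response times, which appear only in the chart.

```python
def percent_change(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0
```

For instance, a response time that rose from a hypothetical 2 ms to 44 ms would be a 2100% increase, meaning the new value is 22 times the old one.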

[Figure: Two panels by configuration (L0, L1, L2) and environment (x86_64, s390x). First panel: scale 0 to 1400 (axis label not recoverable). Second panel: Mean Response Time - No Overcommitment, response time (ms), scale 0 to 70.]
● L1, L2 throughput decreases from over-committed results
● L1, L2 response times improve from over-committed results
● L2 KVM response time now very close to L1 KVM response time
● Not sufficient to help S390x's performance problems

● MySQL test performance degrades with increasing degree of virtualization
● x86_64 and KVM had the most reasonable rate of performance change
● S390x and z/VM had vastly superior L0 and L1 performance
● Performance degradation at L2 z/VM is “jarring”, could be a show-stopper

● Why does L2 z/VM performance collapse?
● Three factors
– An I/O intensive workload
– Design of S390x interpretive-execution
– Nature of how z/VM virtualizes interpretive-execution for nested hypervisors

Conclusion
● z/VM outperformed KVM in a number of areas
– Largely due to architectural benefits
● KVM had more predictable performance
– Memory read, Memory write, MySQL
● KVM needs to improve how CPU and thread scheduling scale with degree of virtualization
● z/VM needs to address L2 performance degradation of I/O-generating workloads

Future Work
● This study is only a first step
– Not a predictor of scalable performance
● Test how performance scales with increasing numbers of nested and non-nested guests
● Analyze performance of disk and network I/O
● Perform a study using “real world” macro-benchmark, such as DayTrader

Acknowledgments
● IBM Research
– Ray Mansell
● IBM Systems & Technology Group
– Alan Altmark
– Michael Day
– Mark Lorenc
– Eberhard Pasch

Special Thanks
● My IBM Managers who encouraged and supported this work
– Hanif Dandia (z/VM Development Org.)
– Jennifer Hunt (z/Firmware Development Org.)
– Keri Liburdi (z/Firmware Development Org.)
– Rob Urfer (IBM Wave for z/VM)
● Elizabeth Crew (BCC)
● Sarah FitzGerald

Interpretive Execution
● Used by z/VM to achieve hardware-level performance in L1 guests
● Allows most privileged guest instructions to be handled by hardware, not by hypervisor
● Problem: cannot handle guest I/O instructions
● “SIE Break” - context switch when hypervisor leaves SIE to simulate guest I/O instruction

● Further problem: interpretive execution is only available to L1 guests
● In order to run a hypervisor as an L1 guest, the L0 hypervisor must simulate interpretive execution for it
– Which L1 will need in order to run its L2 VMs
● This “virtual” interpretive execution adds overhead to SIE breaks

● The added overhead of L1 SIE breaks (caused by L2 guest I/O operations) is the cause of the poor L2 z/VM performance in the MySQL test
● It may also be a factor with the poor L2 z/VM performance observed with memory reads and writes
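
The argument above can be pictured with a toy cost model: every guest I/O operation triggers an SIE break, and at L2 each break carries an extra cost because interpretive execution is being simulated. Everything in the sketch is an illustrative assumption (the function, its parameters, and any numbers passed to it); it is not a measurement or a description of z/VM internals.

```python
def run_time_us(compute_us, io_ops, sie_break_us, virtual_sie_extra_us=0.0):
    """Toy model: total time = compute time + per-I/O SIE-break cost.

    virtual_sie_extra_us is the hypothetical extra cost per break when the
    L0 hypervisor has to simulate interpretive execution for an L1 hypervisor.
    """
    return compute_us + io_ops * (sie_break_us + virtual_sie_extra_us)
```

Under this model the penalty grows with the number of I/O operations, which is consistent with an I/O-heavy workload such as the MySQL test being hit hardest.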
