SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
The Turtles Project:
 Design and Implementation of Nested Virtualization

  Muli Ben-Yehuda†    Michael D. Day‡ Zvi Dubitzky† Michael Factor†
               †
   Nadav Har’El    Abel Gordon† Anthony Liguori‡ Orit Wasserman†
                           Ben-Ami Yassour†

                                           † IBM   Research—Haifa
                                      ‡ IBM   Linux Technology Center

                                   Technion—Israel Institute of Technology




Ben-Yehuda et al. (IBM Research)      The Turtles Project: Nested Virtualization   dc9723 May, 2011   1/1
What is nested x86 virtualization?



     Running multiple unmodified
                                                                 Guest           Guest
     hypervisors                                                  OS              OS

     With their associated
     unmodified VM’s                                                      Guest                   Guest
                                                                       Hypervisor                 OS
     Simultaneously
     On the x86 architecture
     Which does not support                                                     Hypervisor
     nesting in hardware. . .
     . . . but does support a single
     level of virtualization                                                    Hardware




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization            dc9723 May, 2011   2/1
Why?
      Operating systems are already hypervisors (Windows 7 with XP
      mode, Linux/KVM)
      Security: attack via or defend against hypervisor-level rootkits
      such as Blue Pill
      To be able to run other hypervisors in clouds
      Co-design of x86 hardware and system software
      Testing, demonstrating, debugging, live migration of hypervisors




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   3/1
Related work


      First models for nested virtualization [PopekGoldberg74,
      BelpaireHsu75, LauerWyeth73]
      First implementation in the IBM z/VM; relies on architectural
      support for nested virtualization (sie)
      Microkernels meet recursive VMs [FordHibler96]: assumes we
      can modify software at all levels
      x86 software based approaches (slow!) [Berghmans10]
      KVM [KivityKamay07] with AMD SVM [RoedelGraf09]
      Early Xen prototype [He09]
      Blue Pill rootkit hiding from other hypervisors [Rutkowska06]




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   4/1
What is the Turtles project?




      Efficient nested virtualization for Intel x86 based on KVM
      Runs multiple guest hypervisors and VMs: KVM, VMware, Linux,
      Windows, . . .
      Code publicly available
Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   5/1
What is the Turtles project? (cont’)


      Nested VMX virtualization for nested CPU virtualization
      Multi-dimensional paging for nested MMU virtualization
      Multi-level device assignment for nested I/O virtualization
      Micro-optimizations to make it go fast




                        +                       +                               =




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization       dc9723 May, 2011   6/1
Theory of nested CPU virtualization
      Trap and emulate[PopekGoldberg74] ⇒ it’s all about the traps
      Single-level (x86) vs. multi-level (e.g., z/VM)
      Single level ⇒ one hypervisor, many guests
      Turtles approach: L0 multiplexes the hardware between L1 and L2 ,
      running both as guests of L0 —without either being aware of it
      (Scheme generalized for n levels; Our focus is n=2)


        Guest          Guest
         L2             L2



                   Guest                                     Guest
                 Hypervisor            Guest                 Hypervisor         Guest Guest      Guest
         L1
                                                             L1                 L2    L2



         L0                                                  L0
                  Host Hypervisor                                      Host Hypervisor

                       Hardware                                             Hardware


              Multiple logical levels                       Multiplexed on a single level

Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization              dc9723 May, 2011   7/1
Nested VMX virtualization: flow

     L0 runs L1 with VMCS0→1
     L1 prepares VMCS1→2 and
                                                                    L1                         L2
     executes vmlaunch                                                                                  Guest
                                                                                                         OS
     vmlaunch traps to L0                                                  Guest
                                                                         Hypervisor
     L0 merges VMCS’s:
                                                                                                                Memory   1-2 State
                                                                                                VMCS
     VMCS 0→1 merged with                                                                                       Tables

     VMCS 1→2 is VMCS0→2
     L0 launches L2                                   0-1 State                  Memory                         Memory   0-2 State
                                                                     VMCS                       VMCS
                                                                                 Tables                         Tables
     L2 causes a trap
     L0 handles trap itself or                                      L0
                                                                                      Host Hypervisor
     forwards it to L1
     ...                                                                                Hardware
     eventually, L0 resumes L2
     repeat
Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization                           dc9723 May, 2011             8/1
Exit multiplication makes angry turtle angry
      To handle a single L2 exit, L1 does many things: read and write
      the VMCS, disable interrupts, . . .
      Those operations can trap, leading to exit multiplication
      Exit multiplication: a single L2 exit can cause 40-50 L1 exits!
      Optimize: make a single exit fast and reduce frequency of exits


                                                                   L3




                                                        …          L2



                                     …            …                L1
                 …



                                                                   L0



 Single        Two                 Three Levels
 Level        Levels
Ben-Yehuda et al. (IBM Research)     The Turtles Project: Nested Virtualization   dc9723 May, 2011   9/1
Introduction to x86 MMU virtualization



      x86 does page table walks in hardware
      MMU has one currently active hardware page table
      Bare metal ⇒ only needs one logical translation,
      (virtual → physical)
      Virtualization ⇒ needs two logical translations
         1   Guest page table: (guest virt → guest phys)
         2   Host page table: (guest phys → host phys)
      . . . but MMU only knows to walk a single table!




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   10 / 1
Software MMU virtualization: shadow paging
                                   Guest virtual




                GV->GP




                                    Guest physical                               SPT



                GP->HP




                                   Host physical




      Two logical translations compressed onto the shadow page
      table [DevineBugnion02]
      Unmodified guest OS updates its own table
      Hypervisor traps OS page table updates
      Hypervisor propagates updates to the hardware table
      MMU walks the table
      Problem: traps are expensive
Ben-Yehuda et al. (IBM Research)    The Turtles Project: Nested Virtualization   dc9723 May, 2011   11 / 1
Hardware MMU virtualization: Extended Page Tables

                                   Guest virtual




                GV->GP                                                           GPT




                                    Guest physical



                GP->HP

                                                                                 EPT



                                   Host physical




      Two-dimensional paging: guest owns GPT, hypervisor owns
      EPT [BhargavaSerebrin08]
      Unmodified guest OS updates GPT
      Hypervisor updates EPT table controlling (guest phys → host
      phys) translations
      MMU walks both tables
Ben-Yehuda et al. (IBM Research)    The Turtles Project: Nested Virtualization   dc9723 May, 2011   12 / 1
Nested MMU virt. via multi-dimensional paging

      Three logical translations: L2 virt → phys, L2 → L1 , L1 → L0
      Only two tables in hardware with EPT:
      virt → phys and guest physical → host physical
      L0 compresses three logical translations onto two hardware tables

                  Shadow on top of              Shadow on top                      Multi-dimensional
                      shadow                       of EPT                                paging

                        L2 virtual                   L2 virtual                        L2 virtual

                  GPT                         GPT                                                      GPT

                        L2 physical                  L2 physical                       L2 physical
                                                                       SPT12
          SPT12
                                      SPT02                                    EPT12

                        L1 physical                  L1 physical                       L1 physical
                                                                                                       EPT02

                                                                       EPT01   EPT01

                        L0 physical                  L0 physical                       L0 physical




                        baseline                       better                           best




Ben-Yehuda et al. (IBM Research)      The Turtles Project: Nested Virtualization             dc9723 May, 2011   13 / 1
Baseline: shadow-on-shadow
                                   L2 virtual



                        GPT

                                   L2 physical



             SPT12                                                                     SPT02

                                   L1 physical




                                   L0 physical




      Assume no EPT table; all hypervisors use shadow paging
      Useful for old machines and as a baseline
      Maintaining shadow page tables is expensive
      Compress: three logical translations ⇒ one table in hardware
Ben-Yehuda et al. (IBM Research)      The Turtles Project: Nested Virtualization   dc9723 May, 2011   14 / 1
Better: shadow-on-EPT
                               L2 virtual



                  GPT

                               L2 physical
                                                                                  SPT12




                               L1 physical



                                                                                  EPT01

                               L0 physical




      Instead of one hardware table we have two
      Compress: three logical translations ⇒ two in hardware
      Simple approach: L0 uses EPT, L1 uses shadow paging for L2
      Every L2 page fault leads to multiple L1 exits
Ben-Yehuda et al. (IBM Research)     The Turtles Project: Nested Virtualization       dc9723 May, 2011   15 / 1
Best: multi-dimensional paging
                               L2 virtual



                                                                                  GPT

                               L2 physical



          EPT12

                               L1 physical
                                                                                  EPT02

          EPT01

                               L0 physical




      EPT table rarely changes; guest page table changes a lot
      Again, compress three logical translations ⇒ two in hardware
      L0 emulates EPT for L1
      L0 uses EPT0→1 and EPT1→2 to construct EPT0→2
      End result: a lot less exits!
Ben-Yehuda et al. (IBM Research)     The Turtles Project: Nested Virtualization         dc9723 May, 2011   16 / 1
Introduction to I/O virtualization
       From the hypervisor’s perspective, what is I/O?
       (1) PIO (2) MMIO (3) DMA (4) interrupts
       Device emulation [Sugerman01]
                                                          HOST

                                                         GUEST
                                                              1
                                                                           device
                                                                           driver

                                                                  2



                                                          device           device
                                                          driver          emulation
                                                 4                    3




       Para-virtualized drivers [Barham03, Russell08]
                                                          HOST

                                                         GUEST
                                                                          front−end
                                                                            virtual
                                                                            driver

                                                                            1


                                                                          back−end
                                                          device           virtual
                                                 3        driver      2    driver




       Direct device assignment [Levasseur04,Yassour08]
                                                          HOST

                                                         GUEST

                                                          device
                                                          driver




              Direct assignment best performing option
              Direct assignment requires IOMMU for safe DMA bypass
 Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization        dc9723 May, 2011   17 / 1
Multi-level device assignment
      With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0 )
      Multi-level device assignment means giving an L2 guest direct
      access to L0 ’s devices, safely bypassing both L0 and L1

                                              MMIOs and PIOs




                       L2 device          L1                      L0            physical
                        driver         hypervisor              hypervisor        device


                                        L1 IOMMU                L0 IOMMU

                                                Device DMA via
                                               platform IOMMU




      How? L0 emulates an IOMMU for L1 [Amit11]
      L0 compresses multiple IOMMU translations onto the single
      hardware IOMMU page table
      L2 programs the device directly
      Device DMA’s into L2 memory space directly
Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization       dc9723 May, 2011   18 / 1
Micro-optimizations



      Goal: reduce world switch overheads
      Reduce cost of single exit by focus on VMCS merges:
             Keep VMCS fields in processor encoding
             Partial updates instead of whole-sale copying
             Copy multiple fields at once
             Some optimizations not safe according to spec
      Reduce frequency of exits—focus on vmread and vmwrite
             Avoid the exit multiplier effect
             Loads/stores vs. architected trapping instructions
             Binary patching?




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   19 / 1
Nested VMX support in KVM

Date: Mon, 16 May 2011 22:43:54 +0300
From: Nadav Har’El <nyh (at) il.ibm.com>
To: kvm [at] vger.kernel.org
Cc: gleb [at] redhat.com, avi [at] redhat.com
Subject: [PATCH 0/31] nVMX: Nested VMX, v10

Hi,

This is the tenth iteration of the nested VMX patch set. Improvements in this
version over the previous one include:

 * Fix the code which did not fully maintain a list of all VMCSs loaded on
   each CPU. (Avi, this was the big thing that bothered you in the previous
   version).

 * Add nested-entry-time (L1->L2) verification of control fields of vmcs12 -
   procbased, pinbased, entry, exit and secondary controls - compared to the
   capability MSRs which we advertise to L1.

[many other changes trimmed]

This new set of patches applies to the current KVM trunk (I checked with
6f1bd0daae731ff07f4755b4f56730a6e4a3c1cb).
If you wish, you can also check out an already-patched version of KVM from
branch "nvmx10" of the repository:
         git://github.com/nyh/kvm-nested-vmx.git




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   20 / 1
Windows XP on KVM L1 on KVM L0




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   21 / 1
Linux on VMware L1 on KVM L0




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   22 / 1
Experimental Setup

     Running Linux, Windows,
     KVM, VMware, SMP, . . .
     Macro workloads:
            kernbench
            SPECjbb
            netperf
     Multi-dimensional paging?
     Multi-level device assignment?
     KVM as L1 vs. VMware as L1 ?


     See paper for full experimental
     details and more benchmarks
     and analysis

Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   23 / 1
Macro: SPECjbb and kernbench
                                             kernbench
                                             Host    Guest                 Nested   NestedDRW
      Run time                               324.3 355                     406.3    391.5
      % overhead vs. host                    -       9.5                   25.3     20.7
      % overhead vs. guest                   -       -                     14.5     10.3
                                               SPECjbb
                                             Host    Guest                 Nested   NestedDRW
      Score                                  90493 83599                   77065    78347
      % degradation vs. host                 -       7.6                   14.8     13.4
      % degradation vs. guest                -       -                     7.8      6.3
                          Table: kernbench and SPECjbb results


      Exit multiplication effect not as bad as we feared
      Direct vmread and vmwrite (DRW) give an immediate boost
      Take-away: each level of virtualization adds approximately the same
      overhead!
Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization       dc9723 May, 2011   24 / 1
Macro: multi-dimensional paging
                    3.5


                    3.0


                    2.5
                                                                                 Shadow on EPT
Improvement ratio




                                                                                 Multi−dimensional paging
                    2.0


                    1.5


                    1.0


                    0.5


                    0.0
                                    kernbench                specjbb                  netperf


                    Impact of multi-dimensional paging depends on rate of page faults
                    Shadow-on-EPT: every L2 page fault causes L1 multiple exits
                    Multi-dimensional paging: only EPT violations cause L1 exits
                    EPT table rarely changes: #(EPT violations) << #(page faults)
                    Multi-dimensional paging huge win for page-fault intensive
                    kernbench
     Ben-Yehuda et al. (IBM Research)    The Turtles Project: Nested Virtualization             dc9723 May, 2011   25 / 1
Macro: multi-level device assignment

                                                             throughput (Mbps)
                                                             %cpu
                    1,000                                                                                          100
                     900
                     800                                                                                           80
throughput (Mbps)




                     700
                     600                                                                                           60




                                                                                                                         % cpu
                     500
                     400                                                                                           40
                     300
                     200                                                                                           20
                     100
                       0     nat    si        sing      sing     nes      nes       nes      nes     nes            0
                                ive engle
                                       mu le virti e le direlc le emtu d guvirted guvirted gudirted gudirted gu
                                                   l         e       e
                                         lati vel o vel t a vel lati es tio / es tio / es ect / es ect / es
                                             on gue        gue cce gue on /t       em t
                                                                                       ul
                                                                                            virt t   virtt    diret
                                                     st        st ss st        em               io       io        ct
                                                                                  ula ation
                                                                                     tion




                    Benchmark: netperf TCP_STREAM (transmit)
                    Multi-level device assignment best performing option
                    But: native at 20%, multi-level device assignment at 60% (x3!)
                    Interrupts considered harmful, cause exit multiplication
     Ben-Yehuda et al. (IBM Research)        The Turtles Project: Nested Virtualization             dc9723 May, 2011     26 / 1
Macro: multi-level device assignment (sans interrupts)


                        1000
                        900
    Throughput (Mbps)




                        800
                        700
                        600
                        500
                        400
                        300                                                L0 (bare metal)
                        200                                               L2 (direct/direct)
                                                                           L2 (direct/virtio)
                        100
                               16   32           64          128                        256                512
                                             Message size (netperf -m)




                What if we could deliver device interrupts directly to L2 ?
                Only 7% difference between native and nested guest!
Ben-Yehuda et al. (IBM Research)    The Turtles Project: Nested Virtualization          dc9723 May, 2011    27 / 1
Micro: synthetic worst case CPUID loop

             60,000

             50,000

             40,000
CPU Cycles




                         L1
             30,000      L0
                         cpu mode switch
             20,000

             10,000

                 0            1. S           2. N         3. N         4. N            5. N
                              Gueingle            este    o es         opti este       o es
                                   st  Lev            d G ptimited G        miz d Gu ptimited Gu
                                          el             ues   zati ues         atio est      zati es
                                                            t      ons t            ns 3          ons t
                                                                      3.5.               .5.2        3.5.
                                                                           1                              1   &3
                                                                                                                 .5.2




              CPUID running in a tight loop is not a real-world workload!
              Went from 30x worse to “only” 6x worse
              A nested exit is still expensive—minimize both single exit
              cost and frequency of exits
    Ben-Yehuda et al. (IBM Research)       The Turtles Project: Nested Virtualization            dc9723 May, 2011       28 / 1
Conclusions

     Efficient nested x86
     virtualization is challenging but
     feasible
     A whole new ballpark opening
     up many exciting
     applications—security, cloud,
     architecture, . . .
     Current overhead of 6-14%
            Negligible for some
            workloads, not yet for others
            Work in progress—expect at
            most 5% eventually
     Code is available
     Why Turtles?
     It’s turtles all the way down

Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   29 / 1
Questions?




Ben-Yehuda et al. (IBM Research)   The Turtles Project: Nested Virtualization   dc9723 May, 2011   30 / 1

Contenu connexe

Tendances

How Quantum configures Virtual Networks under the Hood?
How Quantum configures Virtual Networks under the Hood?How Quantum configures Virtual Networks under the Hood?
How Quantum configures Virtual Networks under the Hood?
Etsuji Nakai
 
Erlang factory slides
Erlang factory slidesErlang factory slides
Erlang factory slides
Noah Linden
 

Tendances (20)

IAP09 CUDA@MIT 6.963 - Lecture 01: GPU Computing using CUDA (David Luebke, NV...
IAP09 CUDA@MIT 6.963 - Lecture 01: GPU Computing using CUDA (David Luebke, NV...IAP09 CUDA@MIT 6.963 - Lecture 01: GPU Computing using CUDA (David Luebke, NV...
IAP09 CUDA@MIT 6.963 - Lecture 01: GPU Computing using CUDA (David Luebke, NV...
 
T2: What the Second Generation Holds
T2: What the Second Generation HoldsT2: What the Second Generation Holds
T2: What the Second Generation Holds
 
How Quantum configures Virtual Networks under the Hood?
How Quantum configures Virtual Networks under the Hood?How Quantum configures Virtual Networks under the Hood?
How Quantum configures Virtual Networks under the Hood?
 
Nic teaming and converged fabric
Nic teaming and converged fabricNic teaming and converged fabric
Nic teaming and converged fabric
 
Prairie DevCon-What's New in Hyper-V in Windows Server "8" Beta - Part 2
Prairie DevCon-What's New in Hyper-V in Windows Server "8" Beta - Part 2Prairie DevCon-What's New in Hyper-V in Windows Server "8" Beta - Part 2
Prairie DevCon-What's New in Hyper-V in Windows Server "8" Beta - Part 2
 
sVirt: Hardening Linux Virtualization with Mandatory Access Control
sVirt: Hardening Linux Virtualization with Mandatory Access ControlsVirt: Hardening Linux Virtualization with Mandatory Access Control
sVirt: Hardening Linux Virtualization with Mandatory Access Control
 
DTrace in the Non-global Zone
DTrace in the Non-global ZoneDTrace in the Non-global Zone
DTrace in the Non-global Zone
 
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServerUnder the Hood: Open vSwitch & OpenFlow in XCP & XenServer
Under the Hood: Open vSwitch & OpenFlow in XCP & XenServer
 
Windows Server 2012 Hyper-V Networking Evolved
Windows Server 2012 Hyper-V Networking Evolved Windows Server 2012 Hyper-V Networking Evolved
Windows Server 2012 Hyper-V Networking Evolved
 
Lightweight Virtualization in Linux
Lightweight Virtualization in LinuxLightweight Virtualization in Linux
Lightweight Virtualization in Linux
 
Extending ETSI VNF descriptors and OpenVIM to support Unikernels
Extending ETSI VNF descriptors and OpenVIM to support UnikernelsExtending ETSI VNF descriptors and OpenVIM to support Unikernels
Extending ETSI VNF descriptors and OpenVIM to support Unikernels
 
Flash memory
Flash memoryFlash memory
Flash memory
 
Juniper and VMware: Taking Data Centre Networks to the Next Level
Juniper and VMware: Taking Data Centre Networks to the Next LevelJuniper and VMware: Taking Data Centre Networks to the Next Level
Juniper and VMware: Taking Data Centre Networks to the Next Level
 
Building Clouds One 1.4
Building Clouds One 1.4Building Clouds One 1.4
Building Clouds One 1.4
 
Windows Azure Interoperability
Windows Azure InteroperabilityWindows Azure Interoperability
Windows Azure Interoperability
 
Superfluid NFV: VMs and Virtual Infrastructure Managers speed-up for instanta...
Superfluid NFV: VMs and Virtual Infrastructure Managers speed-up for instanta...Superfluid NFV: VMs and Virtual Infrastructure Managers speed-up for instanta...
Superfluid NFV: VMs and Virtual Infrastructure Managers speed-up for instanta...
 
Erlang factory slides
Erlang factory slidesErlang factory slides
Erlang factory slides
 
Embedded is not special
Embedded is not specialEmbedded is not special
Embedded is not special
 
Dealing with Hardware Heterogeneity Using EmbeddedXEN, a Virtualization Frame...
Dealing with Hardware Heterogeneity Using EmbeddedXEN, a Virtualization Frame...Dealing with Hardware Heterogeneity Using EmbeddedXEN, a Virtualization Frame...
Dealing with Hardware Heterogeneity Using EmbeddedXEN, a Virtualization Frame...
 
The kvm virtualization way
The kvm virtualization wayThe kvm virtualization way
The kvm virtualization way
 

Similaire à Turtles dc9723

Nova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsNova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute models
openstackindia
 
LinuxCon NA 2012: Virtualization in the cloud featuring xen
LinuxCon NA 2012: Virtualization in the cloud featuring xenLinuxCon NA 2012: Virtualization in the cloud featuring xen
LinuxCon NA 2012: Virtualization in the cloud featuring xen
The Linux Foundation
 
Ready for cloud computing with hyper v
Ready for cloud computing with hyper vReady for cloud computing with hyper v
Ready for cloud computing with hyper v
Andik Susilo
 
Aidan Finn Hyper V The Future Of Infrastructure
Aidan Finn   Hyper V   The Future Of InfrastructureAidan Finn   Hyper V   The Future Of Infrastructure
Aidan Finn Hyper V The Future Of Infrastructure
Nathan Winters
 
Xen Project Update LinuxCon Brazil
Xen Project Update LinuxCon BrazilXen Project Update LinuxCon Brazil
Xen Project Update LinuxCon Brazil
The Linux Foundation
 
Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Isn't it ironic - managing a bare metal cloud (OSL TES 2015)Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Devananda Van Der Veen
 

Similaire à Turtles dc9723 (20)

Nova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsNova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute models
 
open source virtualization
open source virtualizationopen source virtualization
open source virtualization
 
Linuxcon EU : Virtualization in the Cloud featuring Xen and XCP
Linuxcon EU : Virtualization in the Cloud featuring Xen and XCPLinuxcon EU : Virtualization in the Cloud featuring Xen and XCP
Linuxcon EU : Virtualization in the Cloud featuring Xen and XCP
 
Windows server 2012 failover clustering improvements
Windows server 2012   failover clustering improvementsWindows server 2012   failover clustering improvements
Windows server 2012 failover clustering improvements
 
RunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVMRunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVM
 
LinuxCon NA 2012: Virtualization in the cloud featuring xen
LinuxCon NA 2012: Virtualization in the cloud featuring xenLinuxCon NA 2012: Virtualization in the cloud featuring xen
LinuxCon NA 2012: Virtualization in the cloud featuring xen
 
Linux virtualization
Linux virtualizationLinux virtualization
Linux virtualization
 
Scale11x : Virtualization with Xen and XCP
Scale11x : Virtualization with Xen and XCP Scale11x : Virtualization with Xen and XCP
Scale11x : Virtualization with Xen and XCP
 
Virtual Container - Docker
Virtual Container - Docker Virtual Container - Docker
Virtual Container - Docker
 
Lightning talk unikernels
Lightning talk unikernelsLightning talk unikernels
Lightning talk unikernels
 
CloudStack Networking
CloudStack NetworkingCloudStack Networking
CloudStack Networking
 
Scale11x : Virtualization with Xen and XCP
Scale11x : Virtualization with Xen and XCPScale11x : Virtualization with Xen and XCP
Scale11x : Virtualization with Xen and XCP
 
Ready for cloud computing with hyper v
Ready for cloud computing with hyper vReady for cloud computing with hyper v
Ready for cloud computing with hyper v
 
Aidan Finn Hyper V The Future Of Infrastructure
Aidan Finn   Hyper V   The Future Of InfrastructureAidan Finn   Hyper V   The Future Of Infrastructure
Aidan Finn Hyper V The Future Of Infrastructure
 
Windows 8 Hyper-V: Availability
Windows 8 Hyper-V: AvailabilityWindows 8 Hyper-V: Availability
Windows 8 Hyper-V: Availability
 
Xen Project Update LinuxCon Brazil
Xen Project Update LinuxCon BrazilXen Project Update LinuxCon Brazil
Xen Project Update LinuxCon Brazil
 
Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual Machines
 
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Disco
Disco: Running Commodity Operating Systems on Scalable Multiprocessors DiscoDisco: Running Commodity Operating Systems on Scalable Multiprocessors Disco
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Disco
 
Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Isn't it ironic - managing a bare metal cloud (OSL TES 2015)Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
 

Plus de Iftach Ian Amit

"Cyber" security - all good, no need to worry?
"Cyber" security - all good, no need to worry?"Cyber" security - all good, no need to worry?
"Cyber" security - all good, no need to worry?
Iftach Ian Amit
 
Passwords good badugly181212-2
Passwords good badugly181212-2Passwords good badugly181212-2
Passwords good badugly181212-2
Iftach Ian Amit
 
Cheating in Computer Games
Cheating in Computer GamesCheating in Computer Games
Cheating in Computer Games
Iftach Ian Amit
 

Plus de Iftach Ian Amit (20)

Cyber Risk Quantification - CyberTLV
Cyber Risk Quantification - CyberTLVCyber Risk Quantification - CyberTLV
Cyber Risk Quantification - CyberTLV
 
Devsecops at Cimpress
Devsecops at CimpressDevsecops at Cimpress
Devsecops at Cimpress
 
BSidesTLV Closing Keynote
BSidesTLV Closing KeynoteBSidesTLV Closing Keynote
BSidesTLV Closing Keynote
 
Social Media Risk Metrics
Social Media Risk MetricsSocial Media Risk Metrics
Social Media Risk Metrics
 
ISTS12 Keynote
ISTS12 KeynoteISTS12 Keynote
ISTS12 Keynote
 
From your Pocket to your Heart and Back
From your Pocket to your Heart and BackFrom your Pocket to your Heart and Back
From your Pocket to your Heart and Back
 
Painting a Company Red and Blue
Painting a Company Red and BluePainting a Company Red and Blue
Painting a Company Red and Blue
 
"Cyber" security - all good, no need to worry?
"Cyber" security - all good, no need to worry?"Cyber" security - all good, no need to worry?
"Cyber" security - all good, no need to worry?
 
Armorizing applications
Armorizing applicationsArmorizing applications
Armorizing applications
 
Seeing Red In Your Future?
Seeing Red In Your Future?Seeing Red In Your Future?
Seeing Red In Your Future?
 
Hacking cyber-iamit
Hacking cyber-iamitHacking cyber-iamit
Hacking cyber-iamit
 
Passwords good badugly181212-2
Passwords good badugly181212-2Passwords good badugly181212-2
Passwords good badugly181212-2
 
Bitcoin
BitcoinBitcoin
Bitcoin
 
Sexy defense
Sexy defenseSexy defense
Sexy defense
 
Cyber state
Cyber stateCyber state
Cyber state
 
Advanced Data Exfiltration - the way Q would have done it
Advanced Data Exfiltration - the way Q would have done itAdvanced Data Exfiltration - the way Q would have done it
Advanced Data Exfiltration - the way Q would have done it
 
Infecting Python Bytecode
Infecting Python BytecodeInfecting Python Bytecode
Infecting Python Bytecode
 
Exploiting Second life
Exploiting Second lifeExploiting Second life
Exploiting Second life
 
Dtmf phreaking
Dtmf phreakingDtmf phreaking
Dtmf phreaking
 
Cheating in Computer Games
Cheating in Computer GamesCheating in Computer Games
Cheating in Computer Games
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Turtles dc9723

  • 1. The Turtles Project: Design and Implementation of Nested Virtualization Muli Ben-Yehuda† Michael D. Day‡ Zvi Dubitzky† Michael Factor† † Nadav Har’El Abel Gordon† Anthony Liguori‡ Orit Wasserman† Ben-Ami Yassour† † IBM Research—Haifa ‡ IBM Linux Technology Center Technion—Israel Institute of Technology Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 1/1
  • 2. What is nested x86 virtualization? Running multiple unmodified Guest Guest hypervisors OS OS With their associated unmodified VM’s Guest Guest Hypervisor OS Simultaneously On the x86 architecture Which does not support Hypervisor nesting in hardware. . . . . . but does support a single level of virtualization Hardware Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 2/1
  • 3. Why? Operating systems are already hypervisors (Windows 7 with XP mode, Linux/KVM) Security: attack via or defend against hypervisor-level rootkits such as Blue Pill To be able to run other hypervisors in clouds Co-design of x86 hardware and system software Testing, demonstrating, debugging, live migration of hypervisors Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 3/1
  • 4. Related work First models for nested virtualization [PopekGoldberg74, BelpaireHsu75, LauerWyeth73] First implementation in the IBM z/VM; relies on architectural support for nested virtualization (sie) Microkernels meet recursive VMs [FordHibler96]: assumes we can modify software at all levels x86 software based approaches (slow!) [Berghmans10] KVM [KivityKamay07] with AMD SVM [RoedelGraf09] Early Xen prototype [He09] Blue Pill rootkit hiding from other hypervisors [Rutkowska06] Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 4/1
  • 5. What is the Turtles project? Efficient nested virtualization for Intel x86 based on KVM Runs multiple guest hypervisors and VMs: KVM, VMware, Linux, Windows, . . . Code publicly available Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 5/1
  • 6. What is the Turtles project? (cont’) Nested VMX virtualization for nested CPU virtualization Multi-dimensional paging for nested MMU virtualization Multi-level device assignment for nested I/O virtualization Micro-optimizations to make it go fast + + = Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 6/1
  • 7. Theory of nested CPU virtualization Trap and emulate[PopekGoldberg74] ⇒ it’s all about the traps Single-level (x86) vs. multi-level (e.g., z/VM) Single level ⇒ one hypervisor, many guests Turtles approach: L0 multiplexes the hardware between L1 and L2 , running both as guests of L0 —without either being aware of it (Scheme generalized for n levels; Our focus is n=2) Guest Guest L2 L2 Guest Guest Hypervisor Guest Hypervisor Guest Guest Guest L1 L1 L2 L2 L0 L0 Host Hypervisor Host Hypervisor Hardware Hardware Multiple logical levels Multiplexed on a single level Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 7/1
  • 8. Nested VMX virtualization: flow L0 runs L1 with VMCS0→1 L1 prepares VMCS1→2 and L1 L2 executes vmlaunch Guest OS vmlaunch traps to L0 Guest Hypervisor L0 merges VMCS’s: Memory 1-2 State VMCS VMCS 0→1 merged with Tables VMCS 1→2 is VMCS0→2 L0 launches L2 0-1 State Memory Memory 0-2 State VMCS VMCS Tables Tables L2 causes a trap L0 handles trap itself or L0 Host Hypervisor forwards it to L1 ... Hardware eventually, L0 resumes L2 repeat Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 8/1
  • 9. Exit multiplication makes angry turtle angry To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, . . . Those operations can trap, leading to exit multiplication Exit multiplication: a single L2 exit can cause 40-50 L1 exits! Optimize: make a single exit fast and reduce frequency of exits L3 … L2 … … L1 … L0 Single Two Three Levels Level Levels Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 9/1
  • 10. Introduction to x86 MMU virtualization x86 does page table walks in hardware MMU has one currently active hardware page table Bare metal ⇒ only needs one logical translation, (virtual → physical) Virtualization ⇒ needs two logical translations 1 Guest page table: (guest virt → guest phys) 2 Host page table: (guest phys → host phys) . . . but MMU only knows to walk a single table! Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 10 / 1
  • 11. Software MMU virtualization: shadow paging Guest virtual GV->GP Guest physical SPT GP->HP Host physical Two logical translations compressed onto the shadow page table [DevineBugnion02] Unmodified guest OS updates its own table Hypervisor traps OS page table updates Hypervisor propagates updates to the hardware table MMU walks the table Problem: traps are expensive Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 11 / 1
  • 12. Hardware MMU virtualization: Extended Page Tables Guest virtual GV->GP GPT Guest physical GP->HP EPT Host physical Two-dimensional paging: guest owns GPT, hypervisor owns EPT [BhargavaSerebrin08] Unmodified guest OS updates GPT Hypervisor updates EPT table controlling (guest phys → host phys) translations MMU walks both tables Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 12 / 1
  • 13. Nested MMU virt. via multi-dimensional paging Three logical translations: L2 virt → phys, L2 → L1 , L1 → L0 Only two tables in hardware with EPT: virt → phys and guest physical → host physical L0 compresses three logical translations onto two hardware tables Shadow on top of Shadow on top Multi-dimensional shadow of EPT paging L2 virtual L2 virtual L2 virtual GPT GPT GPT L2 physical L2 physical L2 physical SPT12 SPT12 SPT02 EPT12 L1 physical L1 physical L1 physical EPT02 EPT01 EPT01 L0 physical L0 physical L0 physical baseline better best Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 13 / 1
  • 14. Baseline: shadow-on-shadow L2 virtual GPT L2 physical SPT12 SPT02 L1 physical L0 physical Assume no EPT table; all hypervisors use shadow paging Useful for old machines and as a baseline Maintaining shadow page tables is expensive Compress: three logical translations ⇒ one table in hardware Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 14 / 1
  • 15. Better: shadow-on-EPT L2 virtual GPT L2 physical SPT12 L1 physical EPT01 L0 physical Instead of one hardware table we have two Compress: three logical translations ⇒ two in hardware Simple approach: L0 uses EPT, L1 uses shadow paging for L2 Every L2 page fault leads to multiple L1 exits Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 15 / 1
  • 16. Best: multi-dimensional paging L2 virtual GPT L2 physical EPT12 L1 physical EPT02 EPT01 L0 physical EPT table rarely changes; guest page table changes a lot Again, compress three logical translations ⇒ two in hardware L0 emulates EPT for L1 L0 uses EPT0→1 and EPT1→2 to construct EPT0→2 End result: a lot less exits! Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 16 / 1
  • 17. Introduction to I/O virtualization From the hypervisor’s perspective, what is I/O? (1) PIO (2) MMIO (3) DMA (4) interrupts Device emulation [Sugerman01] HOST GUEST 1 device driver 2 device device driver emulation 4 3 Para-virtualized drivers [Barham03, Russell08] HOST GUEST front−end virtual driver 1 back−end device virtual 3 driver 2 driver Direct device assignment [Levasseur04,Yassour08] HOST GUEST device driver Direct assignment best performing option Direct assignment requires IOMMU for safe DMA bypass Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 17 / 1
  • 18. Multi-level device assignment With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0 ) Multi-level device assignment means giving an L2 guest direct access to L0 ’s devices, safely bypassing both L0 and L1 MMIOs and PIOs L2 device L1 L0 physical driver hypervisor hypervisor device L1 IOMMU L0 IOMMU Device DMA via platform IOMMU How? L0 emulates an IOMMU for L1 [Amit11] L0 compresses multiple IOMMU translations onto the single hardware IOMMU page table L2 programs the device directly Device DMA’s into L2 memory space directly Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 18 / 1
  • 19. Micro-optimizations Goal: reduce world switch overheads Reduce cost of single exit by focus on VMCS merges: Keep VMCS fields in processor encoding Partial updates instead of whole-sale copying Copy multiple fields at once Some optimizations not safe according to spec Reduce frequency of exits—focus on vmread and vmwrite Avoid the exit multiplier effect Loads/stores vs. architected trapping instructions Binary patching? Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 19 / 1
  • 20. Nested VMX support in KVM Date: Mon, 16 May 2011 22:43:54 +0300 From: Nadav Har’El <nyh (at) il.ibm.com> To: kvm [at] vger.kernel.org Cc: gleb [at] redhat.com, avi [at] redhat.com Subject: [PATCH 0/31] nVMX: Nested VMX, v10 Hi, This is the tenth iteration of the nested VMX patch set. Improvements in this version over the previous one include: * Fix the code which did not fully maintain a list of all VMCSs loaded on each CPU. (Avi, this was the big thing that bothered you in the previous version). * Add nested-entry-time (L1->L2) verification of control fields of vmcs12 - procbased, pinbased, entry, exit and secondary controls - compared to the capability MSRs which we advertise to L1. [many other changes trimmed] This new set of patches applies to the current KVM trunk (I checked with 6f1bd0daae731ff07f4755b4f56730a6e4a3c1cb). If you wish, you can also check out an already-patched version of KVM from branch "nvmx10" of the repository: git://github.com/nyh/kvm-nested-vmx.git Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 20 / 1
  • 21. Windows XP on KVM L1 on KVM L0 Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 21 / 1
  • 22. Linux on VMware L1 on KVM L0 Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 22 / 1
  • 23. Experimental Setup Running Linux, Windows, KVM, VMware, SMP, . . . Macro workloads: kernbench SPECjbb netperf Multi-dimensional paging? Multi-level device assignment? KVM as L1 vs. VMware as L1 ? See paper for full experimental details and more benchmarks and analysis Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 23 / 1
  • 24. Macro: SPECjbb and kernbench kernbench Host Guest Nested NestedDRW Run time 324.3 355 406.3 391.5 % overhead vs. host - 9.5 25.3 20.7 % overhead vs. guest - - 14.5 10.3 SPECjbb Host Guest Nested NestedDRW Score 90493 83599 77065 78347 % degradation vs. host - 7.6 14.8 13.4 % degradation vs. guest - - 7.8 6.3 Table: kernbench and SPECjbb results Exit multiplication effect not as bad as we feared Direct vmread and vmwrite (DRW) give an immediate boost Take-away: each level of virtualization adds approximately the same overhead! Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 24 / 1
  • 25. Macro: multi-dimensional paging 3.5 3.0 2.5 Shadow on EPT Improvement ratio Multi−dimensional paging 2.0 1.5 1.0 0.5 0.0 kernbench specjbb netperf Impact of multi-dimensional paging depends on rate of page faults Shadow-on-EPT: every L2 page fault causes L1 multiple exits Multi-dimensional paging: only EPT violations cause L1 exits EPT table rarely changes: #(EPT violations) << #(page faults) Multi-dimensional paging huge win for page-fault intensive kernbench Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 25 / 1
  • 26. Macro: multi-level device assignment throughput (Mbps) %cpu 1,000 100 900 800 80 throughput (Mbps) 700 600 60 % cpu 500 400 40 300 200 20 100 0 nat si sing sing nes nes nes nes nes 0 ive engle mu le virti e le direlc le emtu d guvirted guvirted gudirted gudirted gu l e e lati vel o vel t a vel lati es tio / es tio / es ect / es ect / es on gue gue cce gue on /t em t ul virt t virtt diret st st ss st em io io ct ula ation tion Benchmark: netperf TCP_STREAM (transmit) Multi-level device assignment best performing option But: native at 20%, multi-level device assignment at 60% (x3!) Interrupts considered harmful, cause exit multiplication Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 26 / 1
  • 27. Macro: multi-level device assignment (sans interrupts) 1000 900 Throughput (Mbps) 800 700 600 500 400 300 L0 (bare metal) 200 L2 (direct/direct) L2 (direct/virtio) 100 16 32 64 128 256 512 Message size (netperf -m) What if we could deliver device interrupts directly to L2 ? Only 7% difference between native and nested guest! Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 27 / 1
  • 28. Micro: synthetic worst case CPUID loop 60,000 50,000 40,000 CPU Cycles L1 30,000 L0 cpu mode switch 20,000 10,000 0 1. S 2. N 3. N 4. N 5. N Gueingle este o es opti este o es st Lev d G ptimited G miz d Gu ptimited Gu el ues zati ues atio est zati es t ons t ns 3 ons t 3.5. .5.2 3.5. 1 1 &3 .5.2 CPUID running in a tight loop is not a real-world workload! Went from 30x worse to “only” 6x worse A nested exit is still expensive—minimize both single exit cost and frequency of exits Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 28 / 1
  • 29. Conclusions Efficient nested x86 virtualization is challenging but feasible A whole new ballpark opening up many exciting applications—security, cloud, architecture, . . . Current overhead of 6-14% Negligible for some workloads, not yet for others Work in progress—expect at most 5% eventually Code is available Why Turtles? It’s turtles all the way down Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 29 / 1
  • 30. Questions? Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization dc9723 May, 2011 30 / 1