Xen Guest NUMA: General Enabling Part

29 April 2010
Jun Nakajima, Dexuan Cui, and Nitin Kamble

Legal Disclaimer
• INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE,
  EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED
  BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH
  PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED
  WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES
  RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
  PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR
  USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
• Intel may make changes to specifications and product descriptions at any time, without notice.
• All products, dates, and figures specified are preliminary based on current expectations, and are subject to
  change without notice.
• Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which
  may cause the product to deviate from published specifications. Current characterized errata are available
  on request.
• Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in
  the United States and other countries.
• *Other names and brands may be claimed as the property of others.
• Copyright © 2010 Intel Corporation.




    Xen Summit NA 2010
Xen Guest NUMA Project

•   Working with Xen Community:
    − Andre Przywara andre.przywara@amd.com
    − Dulloor Rao dulloor@gmail.com
    − You are welcome to join us
•   Generic guest NUMA support for both PV and HVM
    − Major difference is basically ACPI tables
    − NUMA-specific enlightenments are applicable to both




Agenda

•   NUMA machines
•   Importance of NUMA Awareness
•   Motivation of NUMA Guests
•   What is required to support an effective NUMA guest?
•   Getting host info and resource allocation
•   Guest configuration
•   Current Status and Next Steps




NUMA Machines
[Figure: NUMA system built from four Xeon® 7500 processors. Labels: Cores, Node*, Memory, Memory Buffer, I/O Hub.]
*: A socket/package can contain multiple nodes

NUMA Machines (cont.)

[Figure: example topologies: 2-socket, 2+2 (4S), 4S (32 DIMMs), 4S (64 DIMMs), 4+4 (8S), 2+2+2+2 (8S). Legend: CPU Socket, Interconnect, I/O Hub, Memory.]

Importance of NUMA Awareness
Source: Andre Przywara <andre.przywara@amd.com>

lmbench's rd benchmark (normalized to native Linux (=100)):

 guests |      numa=off      |       numa=on       | avg increase
        |  min    avg   max  |  min    avg    max  |
      1 |        78.0        |        102.3        |
      7 | 37.4   45.6   62.0 | 90.6  102.3  110.9  | 124.4%
     15 | 21.0   25.8   31.7 | 41.7   48.7   54.1  |  88.2%
     23 | 13.4   17.5   23.2 | 25.0   28.0   30.1  |  60.2%

kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time:

 guests | numa=off | numa=on | increase
      1 |  480.610 | 464.320 |  3.4%
      7 |  482.109 | 461.721 |  4.2%
     15 |  515.297 | 477.669 |  7.3%
     23 |  548.427 | 495.180 |  9.7%

again with 2 VCPUs and make -j2:

 guests | numa=off | numa=on | increase
      1 |  264.580 | 261.690 |  1.1%
      7 |  279.763 | 258.907 |  7.7%
     15 |  330.385 | 272.762 | 17.4%
     23 |  463.510 | 390.547 | 15.7% (46 VCPUs on 32pCPUs)

*: 4 socket AMD Magny-Cours machine with 8 nodes, 48 cores and 96 GB RAM.

http://lists.xensource.com/archives/html/xen-devel/2009-12/msg00000.html
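
The percentage columns can be recomputed from the averages. The sketch below assumes that "avg increase" is the relative gain of a higher-is-better score and that the kernel-compile "increase" is the relative reduction in elapsed time; the slide does not state either formula, and the function names are illustrative. Recomputing from the rounded values shown here may differ from the slide in the last digit.

```python
def throughput_gain(off: float, on: float) -> float:
    """Percent gain of a higher-is-better score going from numa=off to numa=on."""
    return (on / off - 1.0) * 100.0


def time_saving(off: float, on: float) -> float:
    """Percent reduction of a lower-is-better elapsed time, relative to numa=off."""
    return (off - on) / off * 100.0


# 7 guests, lmbench rd, average score (assumed formula, see lead-in)
print(round(throughput_gain(45.6, 102.3), 1))
# 1 guest, kernel compile with 1 VCPU, average elapsed time
print(round(time_saving(480.610, 464.320), 1))
```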

Motivation
•   More NUMA machines in the market
•   Run very large guests efficiently on NUMA machines for
    performance reasons
    − More memory, VCPUs, I/O spanning across multiple nodes
    − More performance, throughput
•   Allow existing OS and apps to run in virtualization with NUMA
    enabled (or disabled)
    − Populate guest ACPI SRAT (Static Resource Affinity Table) and SLIT
      (System Locality Information Table)
    − NUMA libraries
•   NUMA-specific optimizations/enlightenments




Achieving NUMA Performance
•   Which processors (i.e. cores) are connected directly to which
    blocks of memory?
    − SRAT (Static Resource Affinity Table) or PV
•   How far apart are the processors from their associated
    memory banks?
    − SLIT (System Locality Information Table) or PV
•   Virtualization-specific requirements
    − Bind VCPUs to node
    − Construct guest SRAT and SLIT
        •   Need to reflect hardware attributes
•   Predictable and repeatable
    − Use fixed guest configuration
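
A minimal sketch of the information a guest SRAT and SLIT carry: which VCPUs and which memory range belong to each node, and the relative distance between nodes. The class and function names are hypothetical and this is not the ACPI table layout; only the local distance of 10 follows the ACPI SLIT convention, and the remote distance of 20 is an illustrative value.

```python
from dataclasses import dataclass

LOCAL_DISTANCE = 10  # ACPI convention: SLIT distance from a node to itself


@dataclass
class GuestNode:
    vcpus: list      # VCPU ids bound to this node (SRAT processor affinity)
    mem_start: int   # first guest-physical address of the node's memory
    mem_size: int    # size of that range in bytes (SRAT memory affinity)


def node_of_vcpu(nodes, vcpu):
    """Return the index of the node that the VCPU is bound to."""
    for index, node in enumerate(nodes):
        if vcpu in node.vcpus:
            return index
    raise KeyError(f"VCPU {vcpu} is not bound to any node")


def node_of_address(nodes, addr):
    """Return the index of the node whose memory range contains addr."""
    for index, node in enumerate(nodes):
        if node.mem_start <= addr < node.mem_start + node.mem_size:
            return index
    raise KeyError(f"address {addr:#x} is not backed by any node")


def access_distance(nodes, slit, vcpu, addr):
    """Look up the SLIT distance between a VCPU's node and an address's node."""
    return slit[node_of_vcpu(nodes, vcpu)][node_of_address(nodes, addr)]


GIB = 1 << 30
nodes = [GuestNode([0, 1], 0, 2 * GIB), GuestNode([2, 3], 2 * GIB, 2 * GIB)]
slit = [[LOCAL_DISTANCE, 20], [20, LOCAL_DISTANCE]]  # 20: illustrative remote distance

print(access_distance(nodes, slit, vcpu=0, addr=1 * GIB))  # local access
print(access_distance(nodes, slit, vcpu=0, addr=3 * GIB))  # remote access
```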




Constructing SRAT and SLIT for Guests

•   Get platform info from host using host NUMA API (in
    upstream)
    − XEN_SYSCTL_topologyinfo
        •   # of cores per node/socket
    − XEN_SYSCTL_numainfo
        •   Equivalent to SRAT and SLIT

•   Allocate memory from nodes based on memory allocation
    strategy in config file
    − CONFINE, SPLIT, STRIPE (next page)
    − # of nodes
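
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the toolstack code: the free-memory list and distance matrix are hard-coded stand-ins for what a toolstack would obtain through XEN_SYSCTL_numainfo, and `pick_nodes` and `guest_slit` are hypothetical helpers. The selection policy (prefer the nodes with the most free memory) is also an assumption.

```python
def pick_nodes(free_mem_mb, guest_mem_mb, num_nodes):
    """Pick num_nodes host nodes that can each hold an equal share of the guest.

    Prefers the nodes with the most free memory. Returns a sorted list of
    host node ids, or None if the request cannot be satisfied.
    """
    share = guest_mem_mb / num_nodes
    by_free = sorted(range(len(free_mem_mb)),
                     key=lambda n: free_mem_mb[n], reverse=True)
    chosen = by_free[:num_nodes]
    if len(chosen) < num_nodes or any(free_mem_mb[n] < share for n in chosen):
        return None
    return sorted(chosen)


def guest_slit(host_distance, chosen):
    """Build the guest distance matrix from the host rows/columns in use."""
    return [[host_distance[a][b] for b in chosen] for a in chosen]


# Illustrative host data for a 4-node machine (values are made up).
free_mem_mb = [4096, 1024, 8192, 2048]
host_distance = [
    [10, 20, 20, 30],
    [20, 10, 30, 20],
    [20, 30, 10, 20],
    [30, 20, 20, 10],
]

chosen = pick_nodes(free_mem_mb, guest_mem_mb=8192, num_nodes=2)
print(chosen)
print(guest_slit(host_distance, chosen))
```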




Guest NUMA Config Options
•   Number of nodes means “# of nodes from which memory is
    allocated”
    − Not necessarily visible to guest
• max_guest_nodes=<N>
    − Specify the desired number of nodes. Defaults to the number of system nodes.
• min_guest_nodes=<N>
    − Specify the minimum number of nodes. Memory is allocated from at least
      min_guest_nodes nodes. Guest creation fails if the allocation cannot meet
      this. Defaults to 1.
•   Number of nodes matters for SPLIT and STRIPE (next page)
•   Create guest in a deterministic way by setting
    min_guest_nodes = max_guest_nodes
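
One way to read the two options together is as a range of node counts to try. The sketch below assumes the toolstack tries the desired count first and falls back toward the minimum, and that a request larger than the host is rejected; neither behavior is stated on the slide. `candidate_node_counts` is a hypothetical helper; only the option names and defaults come from the slide.

```python
def candidate_node_counts(host_nodes, max_guest_nodes=None, min_guest_nodes=1):
    """Return the node counts to try, most desirable first.

    max_guest_nodes defaults to the number of host nodes and
    min_guest_nodes defaults to 1, as described on the slide.
    """
    if max_guest_nodes is None:
        max_guest_nodes = host_nodes
    if not 1 <= min_guest_nodes <= max_guest_nodes <= host_nodes:
        raise ValueError("need 1 <= min_guest_nodes <= max_guest_nodes <= host nodes")
    return list(range(max_guest_nodes, min_guest_nodes - 1, -1))


print(candidate_node_counts(4))                                        # defaults
print(candidate_node_counts(8, max_guest_nodes=2, min_guest_nodes=2))  # deterministic
```

With min_guest_nodes equal to max_guest_nodes there is exactly one candidate, which is what makes guest creation deterministic.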


Guest NUMA Config Options (cont.)
Memory Allocation Strategy:
•   CONFINE: Allocate the entire domain memory from a single node.
    Fail if that is not possible.
    − No need to tell the guest about NUMA at all.
•   SPLIT: Allocate domain memory by splitting it equally across
    the nodes. Fail if that is not possible.
    − Populate NUMA topology, and propagate to guest (includes PV querying
      via hypercall). If guest is paravirtualized and does not know about NUMA
      (missing ELF hint), fail.
•   STRIPE: Interleave domain memory across nodes.
    − No need to tell the guest about NUMA at all.
•   AUTOMATIC: Try the three strategies one after another (order:
    CONFINE, SPLIT, STRIPE)
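
The strategies can be sketched as a small planner. This is a simplified model under stated assumptions, not the patch set's logic: it tracks only how many MB come from each node and whether the guest is told about NUMA, it assumes the guest size divides evenly by the node count, and it treats STRIPE as able to fail when memory is short. All names (`Plan`, `plan`, `PlacementError`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    CONFINE = "confine"
    SPLIT = "split"
    STRIPE = "stripe"
    AUTOMATIC = "automatic"


class PlacementError(Exception):
    """Raised when a strategy cannot place the guest."""


@dataclass
class Plan:
    strategy: Strategy
    mb_per_node: dict   # host node id -> MB taken from that node
    expose_numa: bool   # True only when the guest is told about NUMA


def _equal_shares(guest_mb, free_mb, num_nodes):
    """Take an equal share from the first num_nodes nodes that can hold it."""
    share = guest_mb // num_nodes
    fits = [n for n, free in enumerate(free_mb) if free >= share]
    if len(fits) < num_nodes:
        raise PlacementError("not enough nodes with free memory")
    return {n: share for n in fits[:num_nodes]}


def plan(strategy, guest_mb, free_mb, num_nodes=2):
    if strategy is Strategy.CONFINE:
        for node, free in enumerate(free_mb):
            if free >= guest_mb:
                return Plan(strategy, {node: guest_mb}, expose_numa=False)
        raise PlacementError("no single node can hold the guest")
    if strategy is Strategy.SPLIT:
        # Contiguous per-node ranges; topology is propagated to the guest.
        return Plan(strategy, _equal_shares(guest_mb, free_mb, num_nodes), True)
    if strategy is Strategy.STRIPE:
        # Same per-node totals, but interleaved, so the guest sees no NUMA.
        return Plan(strategy, _equal_shares(guest_mb, free_mb, num_nodes), False)
    # AUTOMATIC: try CONFINE, then SPLIT, then STRIPE.
    for candidate in (Strategy.CONFINE, Strategy.SPLIT, Strategy.STRIPE):
        try:
            return plan(candidate, guest_mb, free_mb, num_nodes)
        except PlacementError:
            continue
    raise PlacementError("no strategy could place the guest")


free_mb = [3000, 3000, 1000]   # illustrative free memory per host node
small = plan(Strategy.AUTOMATIC, 2048, free_mb)
large = plan(Strategy.AUTOMATIC, 4096, free_mb)
print(small.strategy.name, small.mb_per_node)
print(large.strategy.name, large.mb_per_node, large.expose_numa)
```

In this model SPLIT and STRIPE differ only in layout and guest visibility, so AUTOMATIC never reaches STRIPE when SPLIT fails for lack of memory; in the real design SPLIT can also fail for a PV guest without NUMA support, which this sketch does not model.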

Considerations on Live Migration
•   Number of nodes needs to be the same
•   Memory allocation strategy needs to be inherited for live
    migration
    − CONFINE and STRIPE do not really produce a NUMA guest
    − SPLIT: SPLIT will be used at live-migration time.
        •   If the target machine has similar NUMA characteristics, it’s possible to do live
            migration while retaining NUMA performance.




Current Status and Next Steps
•   Current Status
    − Host NUMA API is in upstream
    − Rebasing the patches to submit
    − Re-measuring performance
    − Merge patches from Dulloor and Andre
•   Next Steps
    − Performance analysis and different workloads
       •   Scheduling
    − I/O NUMA
       •   DMA across nodes with direct device assignment
    − Live Migration
       •   Anyone?





Nakajima numa-final

  • 1. Xen Guest NUMA: General Enabling Part 29 April 2010 Jun Nakajima, Dexuan Cui, and Nitin Kamble
  • 2. Legal Disclaimer  INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.  Intel may make changes to specifications and product descriptions at any time, without notice.  All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.  Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.  Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.  *Other names and brands may be claimed as the property of others.  Copyright © 2010 Intel Corporation. Xen Summit NA 2010 2
  • 3. Xen Guest NUMA Project • Working with Xen Community: − Andre Przywara andre.przywara@amd.com − Dulloor Rao dulloor@gmail.com − You are welcome to join us • Generic guest NUMA support both for PV and HVM − Major difference is basically ACPI tables − NUMA-specific enlightenments are applicable to both Xen Summit NA 2010 3
  • 4. Agenda • NUMA machines • Importance of NUMA Awareness • Motivation of NUMA Guests • What is required to support effective NUMA guest? • Getting host info and resource allocation • Guest configuration • Current Status and Next Steps Xen Summit NA 2010 4
  • 5. NUMA Machines I/O Hub Node* Cores *: A socket/package can contain multiple nodes Xeon® 7500 Xeon® 7500 Xeon® 7500 Xeon® 7500 Memory I/O Hub Memory Buffer Xen Summit NA 2010 5
  • 6. NUMA Machines (cont.) 2-socket 2+2+2+2 (8S) 4S (64DIMMs) 4+4 (8S) 2+2 (4S) 4S (32DIMMs) CPU Socket I/O Hub Interconnect Memory 6 * Other names and brands be claimed as the property of others. Copyright Copyright © 2010, Intel *Other names and brands maymay be claimed as the property of others. © 2010, Intel Corporation. Corporation. Intel Confidential
  • 7. Importance of NUMA Awareness Andre Przywara <andre.przywara@amd.com> lmbench's rd benchmark (normalized to native Linux (=100)): guests numa=off numa=on avg increase min avg max min avg max 1 78.0 102.3 7 37.4 45.6 62.0 90.6 102.3 110.9 124.4% 15 21.0 25.8 31.7 41.7 48.7 54.1 88.2% 23 13.4 17.5 23.2 25.0 28.0 30.1 60.2% kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time: guests numa=off numa=on increase 1 480.610 464.320 3.4% 7 482.109 461.721 4.2% 15 515.297 477.669 7.3% 23 548.427 495.180 9.7% again with 2 VCPUs and make -j2: 1 264.580 261.690 1.1% 7 279.763 258.907 7.7% *: 4 socket AMD Magny-Cours machine with 8 nodes, 15 330.385 272.762 17.4% 23 463.510 390.547 15.7% (46 VCPUs on 32pCPUs) 48 cores and 96 GB RAM. http://lists.xensource.com/archives/html/xen-devel/2009-12/msg00000.html Xen Summit NA 2010 7
  • 8. Motivation
      • More NUMA machines in the market
      • Run very large guests efficiently on NUMA machines for performance reasons
          − More memory, VCPUs, and I/O spanning multiple nodes
          − More performance and throughput
      • Allow existing OSes and apps to run in virtualization with NUMA enabled (or disabled)
          − Populate guest ACPI SRAT (Static Resource Affinity Table) and SLIT (System Locality Information Table)
          − NUMA libraries
      • NUMA-specific optimizations/enlightenments
  • 9. Achieving NUMA Performance
      • Which processors (i.e. cores) are connected directly to which blocks of memory?
          − SRAT (Static Resource Affinity Table) or PV
      • How far apart are the processors from their associated memory banks?
          − SLIT (System Locality Information Table) or PV
      • Virtualization-specific requirements
          − Bind VCPUs to nodes
          − Construct guest SRAT and SLIT
              • Need to reflect hardware attributes
      • Predictable and repeatable
          − Use a fixed guest configuration
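To make the two questions on this slide concrete, here is a minimal sketch of the information the tables carry: an SRAT-like mapping of processors and memory ranges to nodes, and a SLIT-like node-by-node distance matrix. The two-node layout, the address ranges, and the helper names are all invented for illustration. The value 10 for a local access follows the ACPI SLIT convention, while the remote value 20 is only an example.

```python
from bisect import bisect_right

# Illustrative two-node machine; every value here is made up for this sketch.
CPU_TO_NODE = {0: 0, 1: 0, 2: 1, 3: 1}   # SRAT-like: processor -> node
MEM_RANGES = [                           # SRAT-like: (start, end, node), sorted by start
    (0x0000_0000, 0x8000_0000, 0),
    (0x8000_0000, 0x1_0000_0000, 1),
]
DISTANCE = [                             # SLIT-like: relative distance between nodes
    [10, 20],
    [20, 10],
]


def node_of_address(addr: int) -> int:
    """Return the node that owns a physical address."""
    starts = [start for start, _, _ in MEM_RANGES]
    idx = bisect_right(starts, addr) - 1
    if idx < 0:
        raise ValueError(f"address {addr:#x} is below all memory ranges")
    start, end, node = MEM_RANGES[idx]
    if not start <= addr < end:
        raise ValueError(f"address {addr:#x} is not backed by memory")
    return node


def access_distance(cpu: int, addr: int) -> int:
    """Relative cost for `cpu` to access memory at `addr`."""
    return DISTANCE[CPU_TO_NODE[cpu]][node_of_address(addr)]
```

A NUMA-aware guest uses exactly this kind of lookup to prefer memory whose distance from the running processor is lowest, which is why the guest tables need to reflect the real hardware.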
  • 10. Constructing SRAT and SLIT for Guests
      • Get platform info from the host using the host NUMA API (in upstream)
          − XEN_SYSCTL_topologyinfo
              • # of cores per node/socket
          − XEN_SYSCTL_numainfo
              • Equivalent to SRAT and SLIT
      • Allocate memory from nodes based on the memory allocation strategy in the config file
          − CONFINE, SPLIT, STRIPE (see the following slides)
          − # of nodes
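The construction step can be sketched as follows, assuming the host distance matrix and the physical CPU placement have already been obtained through the host NUMA API. The function, its parameters, and the round-robin VCPU placement are hypothetical and are not taken from the patches; the sketch only shows how guest tables can be derived as a projection of the host data onto the nodes chosen for the guest.

```python
def build_guest_tables(host_distance, host_cpu_to_node, chosen_host_nodes, vcpus):
    """Derive guest-visible tables from host NUMA data (illustrative only).

    host_distance     -- node x node distance matrix of the host
    host_cpu_to_node  -- physical CPU -> host node
    chosen_host_nodes -- host nodes backing the guest, in guest-node order
    vcpus             -- number of VCPUs, spread round-robin over guest nodes
    """
    guest_of_host = {host: guest for guest, host in enumerate(chosen_host_nodes)}

    # Guest SLIT: the host matrix restricted to the chosen nodes.
    guest_slit = [
        [host_distance[a][b] for b in chosen_host_nodes]
        for a in chosen_host_nodes
    ]

    # Guest SRAT (processor part): VCPU -> guest node.
    vcpu_to_guest_node = {v: v % len(chosen_host_nodes) for v in range(vcpus)}

    # Physical CPUs each VCPU should be bound to so the tables stay truthful.
    vcpu_pcpus = {
        v: sorted(p for p, n in host_cpu_to_node.items() if guest_of_host.get(n) == g)
        for v, g in vcpu_to_guest_node.items()
    }
    return guest_slit, vcpu_to_guest_node, vcpu_pcpus


if __name__ == "__main__":
    # Made-up 4-node host with 2 physical CPUs per node.
    host_distance = [
        [10, 20, 20, 30],
        [20, 10, 30, 20],
        [20, 30, 10, 20],
        [30, 20, 20, 10],
    ]
    host_cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
    print(build_guest_tables(host_distance, host_cpu_to_node, [1, 3], vcpus=4))
```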
  • 11. Guest NUMA Config Options
      • Number of nodes means “# of nodes from which memory is allocated”
          − Not necessarily visible to the guest
      • max_guest_nodes=<N>
          − Specifies the desired number of nodes. Defaults to the number of system nodes.
      • min_guest_nodes=<N>
          − Specifies the minimum number of nodes. Memory is allocated from at least min_guest_nodes nodes. Guest creation fails if the allocation cannot meet this. Defaults to 1.
      • The number of nodes matters for SPLIT and STRIPE (next page)
      • Create a guest in a deterministic way by setting min_guest_nodes = max_guest_nodes
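The interplay of the two options can be sketched as below. The option names and defaults come from the slide; everything else is an assumption made for illustration, in particular the helper names and the policy of trying the largest node count first and requiring an equal share of memory on each node.

```python
# Hypothetical guest config fragment using the option names from the slide:
#   max_guest_nodes = 2
#   min_guest_nodes = 2   # equal values request a deterministic layout


def resolve_node_range(host_nodes, max_guest_nodes=None, min_guest_nodes=None):
    """Apply the defaults from the slide and validate the range."""
    max_n = host_nodes if max_guest_nodes is None else max_guest_nodes
    min_n = 1 if min_guest_nodes is None else min_guest_nodes
    if not 1 <= min_n <= max_n <= host_nodes:
        raise ValueError("need 1 <= min_guest_nodes <= max_guest_nodes <= host nodes")
    return min_n, max_n


def choose_node_count(free_mb_per_node, guest_mb, min_n, max_n):
    """Pick the largest n in [min_n, max_n] whose equal shares fit on n nodes."""
    for n in range(max_n, min_n - 1, -1):
        share = -(-guest_mb // n)  # ceiling division
        usable = [node for node, free in enumerate(free_mb_per_node) if free >= share]
        if len(usable) >= n:
            return n, usable[:n]
    raise RuntimeError("guest creation fails: min_guest_nodes cannot be met")


if __name__ == "__main__":
    min_n, max_n = resolve_node_range(host_nodes=4)
    print(choose_node_count([4096, 1024, 4096, 512], 4096, min_n, max_n))
```

With min_guest_nodes equal to max_guest_nodes the loop has exactly one candidate, so the guest either gets that node count or is not created, which is the deterministic behaviour the slide recommends.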
  • 12. Guest NUMA Config Options (cont.)
      Memory allocation strategy:
      • CONFINE: Allocate the entire domain memory from a single node. Fails if that is not possible.
          − No need to tell the guest about NUMA at all.
      • SPLIT: Allocate domain memory by splitting it equally across the nodes. Fails if that is not possible.
          − Populate the NUMA topology and propagate it to the guest (includes PV querying via hypercall). If the guest is paravirtualized and does not know about NUMA (missing ELF hint), fail.
      • STRIPE: Interleave domain memory across nodes.
          − No need to tell the guest about NUMA at all.
      • AUTOMATIC: Try the three strategies one after another (order: CONFINE, SPLIT, STRIPE)
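The strategies above can be sketched as follows. This is a simplified model, not the code from the patches: memory is tracked in MB per node, STRIPE interleaves fixed-size chunks round-robin (the 64 MB chunk size is arbitrary), and every function name and the guest_numa_aware flag are assumptions introduced for the example.

```python
def confine(free_mb, guest_mb):
    """CONFINE: all memory from the first single node that can hold it."""
    for node, free in enumerate(free_mb):
        if free >= guest_mb:
            return {node: guest_mb}
    return None


def split(free_mb, guest_mb, nodes):
    """SPLIT: (near-)equal shares on the given nodes, or None if one does not fit."""
    if not nodes:
        return None
    share, rest = divmod(guest_mb, len(nodes))
    plan = {node: share + (1 if i < rest else 0) for i, node in enumerate(nodes)}
    if all(free_mb[node] >= mb for node, mb in plan.items()):
        return plan
    return None


def stripe(free_mb, guest_mb, nodes, chunk_mb=64):
    """STRIPE: interleave chunks round-robin across the given nodes."""
    if not nodes:
        return None
    plan = {node: 0 for node in nodes}
    remaining, i = guest_mb, 0
    while remaining > 0:
        node = nodes[i % len(nodes)]
        chunk = min(chunk_mb, remaining)
        if free_mb[node] - plan[node] < chunk:
            return None
        plan[node] += chunk
        remaining -= chunk
        i += 1
    return plan


def automatic(free_mb, guest_mb, nodes, guest_numa_aware=True):
    """AUTOMATIC: try CONFINE, then SPLIT, then STRIPE.

    Returns (strategy, plan, expose_numa_to_guest).
    """
    plan = confine(free_mb, guest_mb)
    if plan is not None:
        return "CONFINE", plan, False
    if guest_numa_aware:  # SPLIT needs a guest that understands NUMA
        plan = split(free_mb, guest_mb, nodes)
        if plan is not None:
            return "SPLIT", plan, True
    plan = stripe(free_mb, guest_mb, nodes)
    if plan is not None:
        return "STRIPE", plan, False
    raise RuntimeError("no strategy could place the guest")
```

Only SPLIT sets the third element of the result, reflecting that it is the one strategy where the topology is populated and propagated to the guest.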
  • 13. Considerations on Live Migration
      • The number of nodes needs to be the same
      • The memory allocation strategy needs to be inherited for live migration
          − Guests using CONFINE or STRIPE are not really NUMA guests
          − SPLIT: SPLIT will be used at live-migration time.
      • If the target machine has similar NUMA characteristics, it is possible to do live migration while retaining NUMA performance.
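As a rough illustration of the first two points, the check below asks whether a target host can back every guest node of a SPLIT guest with a distinct host node of sufficient free memory. It is a sketch under simplifying assumptions: the function name is invented, memory is counted in MB, and node distances on the target are not compared, although the slide's "similar NUMA characteristics" would also cover them.

```python
def can_retain_numa_layout(guest_node_mb, target_free_mb):
    """True if each guest node fits on its own node of the target host.

    guest_node_mb  -- memory of each guest node, in MB
    target_free_mb -- free memory of each node on the target host, in MB
    """
    if len(target_free_mb) < len(guest_node_mb):
        return False  # the node count cannot be kept the same
    # Pair the largest guest node with the largest free host node, and so on.
    needs = sorted(guest_node_mb, reverse=True)
    frees = sorted(target_free_mb, reverse=True)
    return all(free >= need for need, free in zip(needs, frees))


if __name__ == "__main__":
    print(can_retain_numa_layout([2048, 2048], [4096, 1024, 3072]))
    print(can_retain_numa_layout([2048, 2048], [4096, 1024]))
```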
  • 14. Current Status and Next Steps
      • Current Status
          − Host NUMA API is in upstream
          − Rebasing the patches to submit
          − Re-measuring performance
          − Merge patches from Dulloor and Andre
      • Next Steps
          − Performance analysis and different workloads
              • Scheduling
          − I/O NUMA
              • DMA across nodes with direct device assignment
          − Live Migration
              • Anyone?