Process Management
Interface – Exascale
Agenda
• Since we last met…
  - Address some common questions
  - Outline PMIx standards process
• PMIx v1.x release series
  - What has been included and planned
  - Review launch performance status
• Roadmap
  - Features in the pipeline
  - Potential future features
• Questions/comments/discussion
Charter?
• Define
  - a set of agnostic APIs (not affiliated with a specific model code base) to support application ↔ system management software (SMS) interactions
• Develop
  - an open source (non-copy-left licensed) standalone “convenience” library to facilitate adoption
• Retain
  - transparent compatibility across all PMI/PMIx versions
• Support
  - the Instant On initiative
• Work
  - to define/implement new APIs for evolving programming models
What Is PMIx?
• Standardized APIs
  - Four header files (client, server, common, tool)
  - Enable portability across environments
  - Support interactions between applications and the system management stack
• Convenience library
  - Facilitate adoption
  - Serves as validation platform for standard
• Community
Required: Caveat
• Containerized operations
  - Require cross-boundary compatibility
  - Wireup library of containerized app must be compatible with the local resource manager
• Issues occur when migrating
  - Build under one environment using custom implementation
  - Move to another environment using different implementation
• Convenience library mitigates the problem
Why Not Part of MPI Forum?
• PMIx is agnostic
  - No concept of “communicator”
  - No understanding of MPI
  - Used by non-MPI libraries
• Discussions underway
  - Bring it into Forum process in some appropriate fashion
PMIx “Standards” Process
• Modifications/additions
  - Proposed as RFC
  - Include prototype implementation
    • Pull request to convenience library
  - Notification sent to mailing list
• Reviews conducted
  - RFC and implementation
  - Continues until consensus emerges
• Approval given
  - Developer telecon (2x/week)
PMIx Numbering
Major . Minor . Release
• Version of Standard
• Track convenience library revisions
Regression Testing?
• Limited direct capability
  - Run basic API tests on each PR
• Extensive embedded testing
  - Open MPI includes PMIx master, regularly updated
  - 20k+ tests run every night
    • Tests all spawn, wireup, publish/lookup, connect/disconnect APIs
    • Not 100% code coverage
Adoption?
• Already released
  - SLURM 16.05 (PMIx v1.1.5)
• Planned
  - IBM, Fujitsu, Adaptive Solutions, Altair, Microsoft
• Reference server
  - Provides surrogate support until native support becomes available
  - Supports full PMIx standard, limited by RM capabilities
  - Launches network of PMIx servers across allocation
Agenda
• Since we last met…
  - Address some common questions
  - Outline PMIx standards process
• PMIx v1.x release series (Artem Polyakov, Mellanox)
  - What has been included and planned
  - Review launch performance status
• Roadmap
  - Features in the pipeline
  - Potential future features
• Questions/comments/discussion
PMIx/UCX job-start use case
[Figure: MPI_Init time (sec, 0 to 0.6) vs. number of nodes (0 to 35), by key exchange type: collective, direct-fetch, direct-fetch/async*]
Hardware:
• 32 nodes
• 2 processors (Xeon E5-2680 v2)
• 20 cores per node
• 1 proc per core
Software:
• Open MPI v2.1 (modified to allow skipping the barrier at the end of MPI_Init)
• PMIx v1.1.5
• UCX (f3f9ad7)
* direct-fetch/async assumes no synchronization barrier inside MPI_Init.
PMIx/UCX job-start use case
[Figure: MPI_Init time (sec) vs. number of nodes for the collective, direct-fetch, and direct-fetch/async key exchange types, with callouts “allgatherv” on all submitted keys and Synchronization overhead]
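The difference between the collective and direct-fetch key exchange types can be illustrated with a toy model. This is only a sketch, not the PMIx API: `collective_exchange`, `DirectFetch`, and the `endpoint` key are hypothetical names, and the async variant is not modeled.

```python
# Toy model of two key-exchange strategies; not the PMIx API.
# All names here are hypothetical and exist only for illustration.


def collective_exchange(published):
    """Every process receives every key (allgatherv-style)."""
    everything = {}
    for rank, keys in published.items():
        for key, value in keys.items():
            everything[(rank, key)] = value
    # Each rank ends up holding a full copy of the combined data.
    return {rank: dict(everything) for rank in published}


class DirectFetch:
    """Keys stay with their owner and are fetched on first use."""

    def __init__(self, published):
        self._published = published
        self._cache = {}
        self.fetches = 0  # number of on-demand requests issued

    def get(self, requester, owner, key):
        slot = (requester, owner, key)
        if slot not in self._cache:
            self.fetches += 1
            self._cache[slot] = self._published[owner][key]
        return self._cache[slot]


published = {rank: {"endpoint": f"addr-{rank}"} for rank in range(4)}

full = collective_exchange(published)
moved_collective = sum(len(view) for view in full.values())

lazy = DirectFetch(published)
# Rank 0 only ever talks to rank 1, so only one key is transferred.
peer_addr = lazy.get(requester=0, owner=1, key="endpoint")
```

With four ranks the collective path delivers 16 key copies up front, while the direct-fetch path moves one key for the single peer that is actually contacted.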
v1.2.0
• Extension of v1.1.5
  - v1.1.5
    • Each proc stores own copy of data
  - v1.2
    • Data stored in shared memory owned by PMIx server
    • Each proc has read-only access
• Benefits
  - Minimizes memory footprint
  - Faster launch times
Shared memory data storage
(architecture)
[Diagram: RTE daemon with PMIx-server; two MPI processes, each with a PMIx-client connected to the server via usock; PMIx shared memory segment (stores blobs)]
• The server provides all the data through shared memory
• Each process can fetch all the data with 0 server-side CPU cycles!
• With direct key fetching, if a key is not found in shared memory, the process requests it from the server using the regular messaging mechanism.
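The lookup path described in these bullets can be sketched as follows. This is a simplified model, not PMIx code: `Server`, `Client`, and the key names are hypothetical, and a read-only dictionary view stands in for the shared memory segment.

```python
# Sketch of "read shared memory first, fall back to messaging"; not PMIx code.
# Server, Client, and the key names are hypothetical.
from types import MappingProxyType


class Server:
    def __init__(self, data):
        self._data = data
        self.requests = 0  # messages the server had to handle

    def fetch(self, key):
        self.requests += 1
        return self._data[key]


class Client:
    def __init__(self, shared_view, server):
        self._shared = shared_view  # read-only view of the shared segment
        self._server = server

    def get(self, key):
        try:
            return self._shared[key]  # no server involvement
        except KeyError:
            return self._server.fetch(key)  # regular messaging fallback


segment = {"rank0:endpoint": "addr-0"}
server = Server({"rank0:endpoint": "addr-0", "rank7:endpoint": "addr-7"})
client = Client(MappingProxyType(segment), server)

local_hit = client.get("rank0:endpoint")
remote_hit = client.get("rank7:endpoint")
```

Only the second lookup reaches the server, which is the source of the CPU offload claimed for the shared memory path.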
Shared memory data storage
(synthetic performance test)
[Figure: PMIx_Get(all) time (sec, 0 to 0.7) vs. number of processes (0 to 700); series: Messages, Shmem]
Hardware:
• 32 nodes
• 2 processors (Intel Xeon E5-2680 v2)
• 20 cores per node
• 1 proc per core
Software:
• Open MPI v2.1
• PMIx v1.2
https://github.com/pmix/master/tree/master/contrib/perf_tools
• 10 keys per process
• 100-element arrays of integers
Shared memory data storage
(synthetic performance test) [2]
                Messages (µs)            Shmem (µs)
Nodes  Procs    local key  remote key    local key  remote key
1      20       8.8        -             6.4        -
2      40       8.9        9.2           6.5        6.5
4      80       8.8        9.2           6.5        6.5
8      160      8.7        9.2           6.4        6.4
16     320      8.4        9.2           6.5        6.5
32     640      8.2        9.2           6.5        6.5
Advantages:
• Stable timings for individual key access (no difference between local and remote key access)
• Up to 30% improvement for the remote key fetch
• Significant CPU offload on SMP systems with large core counts
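The "up to 30%" figure can be checked against the remote-key columns of the table. The snippet below is a small worked calculation; only the latencies come from the slide, and the helper name is an assumption.

```python
# Remote-key fetch latencies in microseconds, copied from the table:
# nodes -> (messages, shmem). The 1-node row has no remote keys.
remote_us = {
    2: (9.2, 6.5),
    4: (9.2, 6.5),
    8: (9.2, 6.4),
    16: (9.2, 6.5),
    32: (9.2, 6.5),
}


def improvement_pct(messages_us, shmem_us):
    """Relative latency reduction of shmem over messages, in percent."""
    return 100.0 * (messages_us - shmem_us) / messages_us


by_nodes = {n: improvement_pct(m, s) for n, (m, s) in remote_us.items()}
best_nodes = max(by_nodes, key=by_nodes.get)

for nodes, pct in sorted(by_nodes.items()):
    print(f"{nodes:>2} nodes: {pct:.1f}% faster remote key fetch")
```

The 8-node row gives the largest reduction, slightly above 30%; the other rows come out just under it.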
Shared memory data storage
(synthetic performance test) [3]
[Figure: four bar charts, values in ms]
Cluster  Metric    PMI2  PMIx msg  PMIx shmem
CL1      avg(Put)  0.20  0.07      0.07
CL1      avg(Get)  0.22  0.08      0.07
CL2      avg(Put)  0.37  0.06      0.06
CL2      avg(Get)  0.32  0.11      0.06
CL1 Hardware:
• 15 nodes
• 2 processors (Intel Xeon X5570)
• 8 cores per node
• 1 proc per core
CL2 Hardware:
• 64 nodes
• 2 processors (Intel Xeon E5-2697 v3)
• 28 cores per node
• 1 proc per core
PMIx Roadmap
[Timeline starting in 2014; RM production releases]
• 1/2016: 1.1.3
• 6/2016: 1.1.4 (bug fixes)
• 8/2016: 1.1.5 (bug fixes)
• 11/2016: 1.2.0 (shared memory datastore)
In Pipeline
David Solt
IBM
Tool Support
• Tool connection support
  - Allow tools to connect to local PMIx server
  - Specify system vs application
[Diagram labels: System PMIx server, RM, mpirun/orted, Tool, P]
Tool Support
• Query
  - Network topology
    • Array of proc network-relative locations
    • Overall topology (e.g., “dragonfly”)
  - Running jobs
    • Currently executing job namespaces
    • Array of proc location, status, PID
  - Resources
    • Available system resources
    • Array of proc location, resource utilization (à la “top”)
  - Queue status
    • Current scheduler queue backlog
Examples
Debuggers?
New Flexibility
• Plugin architecture
  - DLL-based system
  - Supports proprietary binary components
  - Allows multiple implementations of common functionality
    • Buffer pack/unpack operations
    • Communications (TCP, shared memory, …)
    • Security
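One way to picture "multiple implementations of common functionality" is a registry that holds several pack/unpack plugins and picks one by priority. This is a hypothetical sketch, not the PMIx component system: the registry, the priorities, and both packers are assumptions made for illustration.

```python
# Hypothetical plugin registry; not the PMIx component system.
import json
import struct

_PACKERS = {}


def register(name, priority):
    """Class decorator that adds one plugin instance to the registry."""
    def wrap(cls):
        _PACKERS[name] = (priority, cls())
        return cls
    return wrap


@register("json", priority=10)
class JsonPacker:
    def pack(self, values):
        return json.dumps(values).encode("utf-8")

    def unpack(self, blob):
        return json.loads(blob.decode("utf-8"))


@register("native", priority=50)
class NativePacker:
    """Fixed-width, network-byte-order 32-bit integers."""

    def pack(self, values):
        return struct.pack(f"!{len(values)}i", *values)

    def unpack(self, blob):
        return list(struct.unpack(f"!{len(blob) // 4}i", blob))


def select(name=None):
    """Return the named plugin, or the highest-priority one by default."""
    if name is not None:
        return _PACKERS[name][1]
    return max(_PACKERS.values(), key=lambda entry: entry[0])[1]


packer = select()
blob = packer.pack([1, 2, 3])
roundtrip = packer.unpack(blob)
```

Callers only see `pack` and `unpack`, so a site could swap in a different implementation without touching client code.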
Obsolescence Protection
• Plugin architecture
• Cross-version support
  - Automatic detection of client/server version
  - Properly adjust for changes in structures, protocols
  - Ensure clients always get what they can understand
  - Backward support to the v1.1.5 level
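The cross-version behavior listed above can be sketched as a small negotiation step. The slides do not show the actual handshake, so this is an assumed model: `parse`, `negotiate`, and the rule "speak the older of the two versions" are illustrative only.

```python
# Hypothetical negotiation logic; the real handshake is not shown in the slides.
OLDEST_SUPPORTED = (1, 1, 5)  # "Backward support to the v1.1.5 level"


def parse(version):
    """Turn 'Major.Minor.Release' into a comparable tuple of ints."""
    major, minor, release = (int(part) for part in version.split("."))
    return (major, minor, release)


def negotiate(client_version, server_version):
    """Pick the version both sides can speak, or reject the client."""
    client, server = parse(client_version), parse(server_version)
    if client < OLDEST_SUPPORTED:
        raise ValueError(f"client {client_version} is older than v1.1.5")
    # Use the older of the two so the client only sees what it understands.
    return min(client, server)


agreed = negotiate("1.1.5", "1.2.0")
```

Here a v1.1.5 client talking to a v1.2.0 server is served at the 1.1.5 level, while anything older than 1.1.5 is rejected.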
Notification
• Plugin architecture
• Cross-version support
• Event notification
  - System generated, app generated
  - Resolves issues in original API, implementation
  - Register for broad range of events
    • Constrained by availability of backend support
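Registering for a range of events, whether system or application generated, can be modeled with a small dispatcher. This is a sketch under stated assumptions: `EventBus` and the event codes are made-up names, not PMIx identifiers.

```python
# Hypothetical event registry; class name and event codes are illustrative only.
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, codes, handler):
        """Register one handler for several event codes."""
        for code in codes:
            self._handlers[code].append(handler)

    def notify(self, code, source, info=None):
        """Deliver an event; returns how many handlers ran."""
        handlers = self._handlers.get(code, [])
        for handler in handlers:
            handler(code, source, info or {})
        return len(handlers)


seen = []
bus = EventBus()
bus.register(
    ["NODE_DOWN", "PROC_ABORTED"],
    lambda code, source, info: seen.append((code, source)),
)

delivered_system = bus.notify("NODE_DOWN", source="system", info={"node": "n012"})
delivered_app = bus.notify("PROC_ABORTED", source="app")
delivered_none = bus.notify("QUEUE_FULL", source="system")
```

An event with no registered handler is simply not delivered, which mirrors the constraint that notification depends on what the backend supports.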
Logging
• Plugin architecture
• Cross-version support
• Event notification
• Log data
  - Store desired data in system data store(s)
    • Specify hot/warm/cold, local/remote, database and type of database, …
  - Log output to stdout/err
  - Supports binary and non-binary data
    • Heterogeneity taken care of for you
PMIx Roadmap
Future
Ralph Castain
Intel
Future Features
Reference Server
• Initial version: DVM
  - Interconnected PMIx servers
  - High-speed, resilient collectives
    • bcast, allgather/barrier
• Future updates: “fill” mode
  - Servers proxy clients to host RM
  - Complete missing host functionality
Winter 2017
Future Features
Debugger Support
• Ongoing discussions with MPI Forum Tools WG
  - Implement proposed MPIR2 interface
  - Enhance scalability
• Exploit tool connection
  - Obtain proctable info
  - Use PMIx_Spawn to launch daemons, auto-wireup, localize proctable retrieval
• Extend available supporting info
  - Network topology, bandwidth utilization
  - Event notification
Winter 2017
Future Features
Network Support Framework
• Interface to 3rd party libraries
• Enable support for network features
  - Precondition of network security keys
  - Retrieval of endpoint assignments, topology
• Data made available
  - In initial job info returned at proc start
  - Retrieved by Query
Spring 2017
Future Features
IO Support
• Reduce launch time
  - Current practices
    • Reactive cache/forward
    • Static builds
  - Proactive pre-positioning
    • Examine provided job/script
    • Return array of binaries and libraries required for execution
• Enhance execution
  - Request async file positioning
    • Callback when ready
  - Specify persistence options
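The "request async file positioning, callback when ready" pattern can be sketched with a background copy. This is an illustration of the pattern only: `preposition`, the thread pool, and the file names are assumptions, and a local directory stands in for node-local storage.

```python
# Sketch of asynchronous pre-positioning with a completion callback.
# preposition() and the paths are hypothetical; this is not a PMIx interface.
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def preposition(files, dest, on_ready, pool):
    """Copy files to dest in the background, then call on_ready(paths)."""
    def copy_all():
        dest.mkdir(parents=True, exist_ok=True)
        staged = []
        for src in files:
            target = dest / src.name
            shutil.copyfile(src, target)
            staged.append(target)
        return staged

    future = pool.submit(copy_all)
    future.add_done_callback(lambda done: on_ready(done.result()))
    return future


ready = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    binary = root / "a.out"
    binary.write_bytes(b"\x7fELF")
    with ThreadPoolExecutor(max_workers=1) as pool:
        preposition([binary], root / "node-local", ready.extend, pool)
    # Leaving the executor block waits for the copy and its callback.
    staged_ok = (root / "node-local" / "a.out").read_bytes() == b"\x7fELF"
```

The caller is free to do other launch work while the copy runs and learns about completion only through the callback.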
Summer 2017
Future Features
Generalized Data Store (GDS)
• Abstracted view of data store
  - Multiple plugins for different implementations
    • Local (hot) storage
    • Distributed (warm) models
    • Database (cold) storage
• Explore alternative paradigms
  - Job info, wireup data
  - Publish/lookup
  - Log
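An abstracted view over hot, warm, and cold storage might look like the sketch below. The tier names follow the slide; `TieredStore`, the keys, and the use of plain dictionaries as backends are assumptions for illustration.

```python
# Hypothetical tiered store; tier names follow the slide, the rest is assumed.
class TieredStore:
    def __init__(self, **tiers):
        # Keyword order is preserved, so list the hottest tier first.
        self._tiers = tiers

    def put(self, key, value, tier):
        self._tiers[tier][key] = value

    def get(self, key):
        """Return (value, tier_name) from the hottest tier holding the key."""
        for name, backend in self._tiers.items():
            if key in backend:
                return backend[key], name
        raise KeyError(key)


store = TieredStore(hot={}, warm={}, cold={})
store.put("job.size", 640, tier="hot")
store.put("published.port", "addr-3", tier="warm")
store.put("log.0001", b"launch complete", tier="cold")

size, size_tier = store.get("job.size")
_, log_tier = store.get("log.0001")
```

Each backend could be replaced by a different plugin (local memory, a distributed service, a database) without changing the `get`/`put` interface.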
Fall 2017
Open Discussion
We now have an interface library the RMs will support for application-directed requests
Need to collaboratively define what we want to do with it
Project: https://pmix.github.io/master
Code: https://github.com/pmix

SC'16 PMIx BoF Presentation

  • 2. Agenda • Since we last met…  Address some common questions  Outline PMIx standards process • PMIx v1.x release series  What has been included and planned  Review launch performance status • Roadmap  Features in the pipeline  Potential future features • Questions/comments/discussion
  • 3. Charter? • Define  set of agnostic APIs (not affiliated with specific model code base) to support interactions between applications and system management software (SMS) • Develop  an open source (non-copy-left licensed) standalone “convenience” library to facilitate adoption • Retain  transparent compatibility across all PMI/PMIx versions • Support  the Instant On initiative • Work  to define/implement new APIs for evolving programming models.
  • 4. What Is PMIx? • Standardized APIs  Four header files (client, server, common, tool)  Enable portability across environments  Support interactions between applications and system management stack • Convenience library  Facilitate adoption  Serves as validation platform for standard • Community
  • 6. Required: Caveat • Containerized operations  Require cross-boundary compatibility  Wireup library of containerized app must be compatible with the local resource manager • Issues occur when migrating  Build under one environment using custom implementation  Move to another environment using different implementation • Convenience library mitigates the problem
  • 7. Why Not Part of MPI Forum? • PMIx is agnostic  No concept of “communicator”  No understanding of MPI  Used by non-MPI libraries • Discussions underway  Bring it into Forum process in some appropriate fashion
  • 8. PMIx “Standards” Process • Modifications/additions  Proposed as RFC  Include prototype implementation • Pull request to convenience library  Notification sent to mailing list • Reviews conducted  RFC and implementation  Continues until consensus emerges • Approval given  Developer telecon (2x/week)
  • 9. PMIx Numbering • Major.Minor.Release  Major.Minor: version of standard  Release: tracks convenience library revisions
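One way to read this numbering scheme, sketched in Python. The names `PmixVersion` and `split_pmix_version` are hypothetical, and the reading that Major.Minor identifies the standard while Release tracks the convenience library revision is an assumption based on the slide's labels, not something taken from the PMIx code base.

```python
from typing import NamedTuple


class PmixVersion(NamedTuple):
    """Hypothetical container for a Major.Minor.Release string."""
    major: int
    minor: int
    release: int

    @property
    def standard(self) -> str:
        # Assumed reading of the slide: Major.Minor names the version of the standard.
        return f"{self.major}.{self.minor}"


def split_pmix_version(text: str) -> PmixVersion:
    """Parse a string such as "1.1.5", optionally prefixed with "v"."""
    parts = text.lstrip("vV").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected Major.Minor.Release, got {text!r}")
    major, minor, release = (int(p) for p in parts)
    return PmixVersion(major, minor, release)


v = split_pmix_version("v1.1.5")
print(v.standard, v.release)
```

Under this reading, 1.1.4 and 1.1.5 would implement the same standard version and differ only in library revision.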
  • 10. Regression Testing? • Limited direct capability  Run basic API tests on each PR • Extensive embedded testing  Open MPI includes PMIx master, regularly updated  20k+ tests run every night • Tests all spawn, wireup, publish/lookup, connect/disconnect APIs • Not 100% code coverage
  • 11. Adoption? • Already released  SLURM 16.05 (PMIx v1.1.5) • Planned  IBM, Fujitsu, Adaptive Solutions, Altair, Microsoft • Reference server  Provides surrogate support until native support becomes available  Supports full PMIx standard, limited by RM capabilities  Launches network of PMIx servers across allocation
  • 12. Agenda • Since we last met…  Address some common questions  Outline PMIx standards process • PMIx v1.x release series (Artem Polyakov, Mellanox)  What has been included and planned  Review launch performance status • Roadmap  Features in the pipeline  Potential future features • Questions/comments/discussion
  • 13. PMIx/UCX job-start use case [Figure: MPI_Init time (sec) vs. number of nodes, by key exchange type: collective, direct-fetch, direct-fetch/async*] Hardware: • 32 nodes • 2 processors (Xeon E5-2680 v2) • 20 cores per node • 1 proc per core • Open MPI v2.1 (modified to allow skipping the barrier at the end of MPI_Init) • PMIx v1.1.5 • UCX (f3f9ad7) * direct-fetch/async assumes no synchronization barrier inside MPI_Init.
  • 14. PMIx/UCX job-start use case [Figure: MPI_Init time (sec) vs. number of nodes, by key exchange type: collective, direct-fetch, direct-fetch/async] Annotations: • “allgatherv” on all submitted keys • Synchronization overhead
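The difference between the collective and direct-fetch key exchange types named on these two slides can be illustrated with a toy model. This is only a sketch: the function names and the `published` dictionary are hypothetical, nothing here calls PMIx, MPI or UCX, and it models only how much data each rank ends up holding, not timing.

```python
def collective_exchange(published):
    """Allgatherv-style: every rank receives a copy of every rank's data."""
    return {rank: dict(published) for rank in published}


def direct_fetch(published, peers):
    """On-demand: a rank pulls data only for the peers it actually needs."""
    return {peer: published[peer] for peer in peers}


# Each of 4 ranks publishes one (made-up) endpoint string.
published = {rank: f"endpoint-{rank}" for rank in range(4)}

full = collective_exchange(published)
partial = direct_fetch(published, peers=[1])

# Rank 0 holds 4 entries after the collective, 1 entry after a direct fetch.
print(len(full[0]), len(partial))
```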
  • 15. v1.2.0 • Extension of v1.1.5  v1.1.5 • Each proc stores own copy of data  v1.2 • Data stored in shared memory owned by PMIx server • Each proc has read-only access • Benefits  Minimizes memory footprint  Faster launch times
  • 16. Shared memory data storage (architecture) [Diagram labels: RTE daemon, PMIx-server, MPI process with PMIx-client (×2), usock (×2), PMIx shared mem (stores blobs)] • Server provides all the data through the shared memory • Each process can fetch all the data with 0 server-side CPU cycles! • With direct key fetching, if a key is not found in shared memory, the process requests it from the server using the regular messaging mechanism.
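The lookup path described on the architecture slide (read shared memory first, message the server only on a miss) can be sketched as follows. `ToyServer`, `get` and the key names are hypothetical stand-ins; a plain dictionary plays the role of the shared-memory segment, and no real PMIx call is used.

```python
class ToyServer:
    """Stand-in for a PMIx server; counts how many requests reach it."""

    def __init__(self, data):
        self._data = data
        self.requests = 0

    def request(self, key):
        self.requests += 1
        return self._data[key]


def get(key, shared_mem, server):
    """Read from the shared segment first; message the server only on a miss."""
    if key in shared_mem:
        return shared_mem[key]      # served without involving the server
    return server.request(key)      # regular messaging path


server = ToyServer({"rank0.addr": "a0", "rank9.addr": "a9"})
shared_mem = {"rank0.addr": "a0"}   # what the server has exposed so far

get("rank0.addr", shared_mem, server)   # hit: server is not contacted
get("rank9.addr", shared_mem, server)   # miss: one request to the server
print(server.requests)
```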
  • 18. Shared memory data storage (synthetic performance test) [Figure: PMIx_Get(all) time (sec) vs. number of processes; series: Messages, Shmem] Hardware: • 32 nodes • 2 processors (Intel Xeon E5-2680 v2) • 20 cores per node • 1 proc per core • Open MPI v2.1 • PMIx v1.2 Test: https://github.com/pmix/master/tree/master/contrib/perf_tools • 10 keys per process • 100-element arrays of integers
  • 19. Shared memory data storage (synthetic performance test) [2]
      Nodes  Procs  Messages (µs)            Shmem (µs)
                    local key   remote key   local key   remote key
      1      20     8.8         -            6.4         -
      2      40     8.9         9.2          6.5         6.5
      4      80     8.8         9.2          6.5         6.5
      8      160    8.7         9.2          6.4         6.4
      16     320    8.4         9.2          6.5         6.5
      32     640    8.2         9.2          6.5         6.5
    Advantages: • Stable timings for individual key access (no difference between local and remote key access) • Up to 30% improvement for the remote key fetch • Significant CPU offload on SMP systems with large core counts.
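The "up to 30%" figure can be checked directly from the table values. The short calculation below is an illustrative check, not part of the original deck; the `rows` layout and the `improvement` helper are hypothetical names.

```python
# (nodes, messages_local, messages_remote, shmem_local, shmem_remote) in microseconds;
# None where the slide gives no remote-key value (single node).
rows = [
    (1, 8.8, None, 6.4, None),
    (2, 8.9, 9.2, 6.5, 6.5),
    (4, 8.8, 9.2, 6.5, 6.5),
    (8, 8.7, 9.2, 6.4, 6.4),
    (16, 8.4, 9.2, 6.5, 6.5),
    (32, 8.2, 9.2, 6.5, 6.5),
]


def improvement(before, after):
    """Relative reduction in access time, in percent."""
    return 100.0 * (before - after) / before


remote = [improvement(r[2], r[4]) for r in rows if r[2] is not None]
print(round(max(remote), 1))  # 30.4, from the 8-node row (9.2 -> 6.4)
```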
  • 20. Shared memory data storage (synthetic performance test) [3]
      Average time per operation (ms), read from the bar-chart labels:
      Cluster  Operation  PMI2   PMIx msg  PMIx shmem
      CL1      avg(Put)   0.20   0.07      0.07
      CL1      avg(Get)   0.22   0.08      0.07
      CL2      avg(Put)   0.37   0.06      0.06
      CL2      avg(Get)   0.32   0.11      0.06
    CL1 Hardware: • 15 nodes • 2 processors (Intel Xeon X5570) • 8 cores per node • 1 proc per core
    CL2 Hardware: • 64 nodes • 2 processors (Intel Xeon E5-2697 v3) • 28 cores per node • 1 proc per core
  • 21. PMIx Roadmap [Timeline starting 2014] • 1/2016: 1.1.3 • 6/2016: 1.1.4 • 8/2016: 1.1.5 • 11/2016: 1.2.0 • Timeline labels: RM Production Releases; Bug fixes (twice); Shared memory datastore • In Pipeline: David Solt, IBM
  • 22. Tool Support • Tool connection support  Allow tools to connect to local PMIx server  Specify system vs application [Diagram labels: System PMIx server, mpirun/orted, Tool, RM, P]
  • 23. Tool Support • Query  Network topology • Array of proc network-relative locations • Overall topology (e.g., “dragonfly”)  Running jobs • Currently executing job namespaces • Array of proc location, status, PID  Resources • Available system resources • Array of proc location, resource utilization (ala “top”)  Queue status • Current scheduler queue backlog Examples Debuggers?
  • 24. New Flexibility • Plugin architecture  DLL-based system  Supports proprietary binary components  Allows multiple implementations of common functionality • Buffer pack/unpack operations • Communications (TCP, shared memory,…) • Security
  • 25. Obsolescence Protection • Plugin architecture • Cross-version support  Automatic detection of client/server version  Properly adjust for changes in structures, protocols  Ensure clients always get what they can understand  Backward support to the v1.1.5 level
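The slides do not describe how version detection works internally. As a rough illustration of the stated goal only (clients get what they can understand, with support back to v1.1.5), a toy negotiation could look like this. `negotiate` and `MIN_SUPPORTED` are hypothetical names, and choosing the older of the two versions is an assumption made for the sketch.

```python
# Assumed floor, taken from "Backward support to the v1.1.5 level".
MIN_SUPPORTED = (1, 1, 5)


def negotiate(client, server):
    """Pick the version both sides can speak: the older of the two."""
    common = min(client, server)   # tuples compare field by field
    if common < MIN_SUPPORTED:
        raise ValueError(f"version {common} is older than {MIN_SUPPORTED}")
    return common


# A v1.1.5 client talking to a v1.2.0 server is served at the v1.1.5 level.
print(negotiate(client=(1, 1, 5), server=(1, 2, 0)))
```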
  • 26. Notification • Plugin architecture • Cross-version support • Event notification  System generated, app generated  Resolves issues in original API, implementation  Register for broad range of events • Constrained by availability of backend support
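The register-then-notify pattern behind event notification can be sketched with a small callback registry. This is a generic illustration, not the PMIx event API: `register`, `notify` and the event names are hypothetical.

```python
from collections import defaultdict

# Maps an event name to the callbacks registered for it.
_handlers = defaultdict(list)


def register(event, callback):
    """Ask to be called whenever `event` is raised."""
    _handlers[event].append(callback)


def notify(event, source, info=None):
    """Deliver an event to every registered callback; return how many ran."""
    callbacks = _handlers[event]
    for callback in callbacks:
        callback(event, source, info or {})
    return len(callbacks)


seen = []
register("proc_aborted", lambda event, source, info: seen.append((event, source)))

notify("proc_aborted", source="system")   # system-generated, one handler runs
notify("app_checkpoint", source="app")    # app-generated, nobody registered
print(seen)
```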
  • 27. Logging • Plugin architecture • Cross-version support • Event notification • Log data  Store desired data in system data store(s) • Specify hot/warm/cold, local/remote, database and type of database, …  Log output to stdout/err  Supports binary and non-binary data • Heterogeneity taken care of for you
  • 28. PMIx Roadmap [Timeline starting 2014] • 1/2016: 1.1.3 • 6/2016: 1.1.4 • 8/2016: 1.1.5 • 11/2016: 1.2.0 • Timeline labels: RM Production Releases; Bug fixes (twice); Shared memory datastore • Future: Ralph Castain, Intel
  • 29. Future Features: Reference Server • Initial version: DVM  Interconnected PMIx servers  High-speed, resilient collectives • bcast, allgather/barrier • Future updates: “fill” mode  Servers proxy clients to host RM  Complete missing host functionality • Timeframe: Winter 2017
  • 30. Future Features: Debugger Support • Ongoing discussions with MPI Forum Tools WG  Implement proposed MPIR2 interface  Enhance scalability • Exploit tool connection  Obtain proctable info  Use PMIx_Spawn to launch daemons, auto-wireup, localize proctable retrieval • Extend available supporting info  Network topology, bandwidth utilization  Event notification • Timeframe: Winter 2017
  • 31. Future Features: Network Support Framework • Interface to 3rd party libraries • Enable support for network features  Precondition of network security keys  Retrieval of endpoint assignments, topology • Data made available  In initial job info returned at proc start  Retrieved by Query • Timeframe: Spring 2017
  • 32. Future Features: IO Support • Reduce launch time  Current practices • Reactive cache/forward • Static builds  Proactive pre-positioning • Examine provided job/script • Return array of binaries and libraries required for execution • Enhance execution  Request async file positioning • Callback when ready  Specify persistence options • Timeframe: Summer 2017
  • 33. Future Features: Generalized Data Store (GDS) • Abstracted view of data store  Multiple plugins for different implementations • Local (hot) storage • Distributed (warm) models • Database (cold) storage • Explore alternative paradigms  Job info, wireup data  Publish/lookup  Log • Timeframe: Fall 2017
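The idea of an abstracted data store with interchangeable plugins can be sketched as one interface with several backends. This is a minimal illustration under assumptions: `DataStore`, `HotStore`, `ColdStore`, `PLUGINS` and `open_store` are hypothetical names, the "cold" backend merely serializes to JSON as a stand-in for a database, and none of it reflects the actual GDS design.

```python
import json
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Hypothetical common interface every storage plugin implements."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class HotStore(DataStore):
    """Local ("hot") storage: values kept as live Python objects."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class ColdStore(DataStore):
    """Stand-in for database ("cold") storage: values kept serialized."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = json.dumps(value)

    def get(self, key):
        return json.loads(self._rows[key])


PLUGINS = {"hot": HotStore, "cold": ColdStore}


def open_store(kind):
    """Select a backend by name; callers only see the DataStore interface."""
    return PLUGINS[kind]()


for kind in PLUGINS:
    store = open_store(kind)
    store.put("job.size", 640)
    print(kind, store.get("job.size"))
```

The calling code is identical for both backends, which is the property the "abstracted view" bullet is after.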
  • 34. Open Discussion • We now have an interface library the RMs will support for application-directed requests • Need to collaboratively define what we want to do with it • Project: https://pmix.github.io/master • Code: https://github.com/pmix