SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
2
Agenda
• VTune Amplifier XE OpenMP* Analysis: answering on customers’ questions
about performance in the same language a program was written in
• Concepts, metrics and technology inside
• VTune Amplifier XE OpenMP Analysis Workflow
• OpenMP analysis for hybrid MPI + OpenMP applications
• Summary
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
3
Typical customer questions on parallelization efficiency of
OpenMP* applications
• “I put pragmas but why my
speed up is far from linear?
• Parallelization inefficiency
• “I ran my app on a system with
bigger number of cores but it
does not run as efficient as on
smaller number”
• Scalability issues
• Is serial time of my application significant
to prevent scaling?
• How efficient is my OpenMP
parallelization?
• If not, how much gain can be achieved if I
invest in fighting with the inefficiencies?
• What OpenMP regions/loops/barriers are
worth to tune?
• What are the particular problems with them?
Decomposing
the questions
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
4
If Performance Information is OpenMP “Unaware”..
OpenMP “unaware” views of VTune Amplifier XE
Difficult to detect problems, customers might even “blame”
runtime seeing CPU time consumption there not understanding
that this is a result of parallelization inefficiency
The questions are tied to OpenMP program structure – #pragmas
Answers should be given the same way to be understandable and
actionable
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Is serial time of my application significant to prevent scaling?
How efficient is my parallelization towards ideal parallel execution?
How much theoretical gain I can get if invest in tuning?
What regions are more
perspective to invest?
Links to grid view for
more details on
inefficiency
5
VTune Amplifier XE OpenMP* Analysis: answering on customers’ questions
about performance in the same language a program was written in
Overview on summary pane
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
6
Key of OpenMP awareness in VTune – Region based views
and metrics
Definition of Region Potential Gain (elapsed time metric)
Lock spinning (sampling)
Effective time (sampling)
Imbalance (tracing)
Actual Parallel Region Elapsed Time
Estimated Ideal Time =
Effective time / Number of Threads
Potential Gain as a sum of inefficiencies normalized by num of threads
Fork Join
Scheduling (sampling)
Work forking (sampling)
Atomics (sampling)
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
7
Technology under VTune Amplifier XE OpenMP Analysis
Tracing of OpenMP constructions to provide region/work sharing context and
precise imbalance on barriers
• Provided to VTune by Intel OpenMP Runtime under profiling
• Fork-Join points of parallel regions with number of working threads (Intel Compiler 14 and later)
• OpenMP construct barrier points with imbalance info and OpenMP loop metadata
• -parallel-source-info=2 compiler option to embed source file name to a region name
• Looking at transition to OMPT, working with John M.-C. on interface enrichments for low overhead
analysis
Sampling to define and classify CPU time - user’s code and OpenMP RTL work
• Classification: Locking, Scheduling, Work Forking
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
8
VTune Amplifier XE OpenMP Analysis Workflow
Start with HPC Performance Characterization analysis
Explore CPU Utilization aspect metrics related to OpenMP in summary,
grid, source view
CL: >amplxe-cl –collect hpc-performance <my_app>
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
9
Per Region Details in grid view: inefficiencies in wall time -
classification and issue highlighting
Imbalance on a loop barrier
Dynamic scheduling overhead on a parallel loop
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
10
Details in Grid View: Serial Time Hotspots
Serial hotspots under
Master Thread
Time Filter to exclude
initialization phase
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
11
Details on Scalable Timeline
Super tiny timeline display mode – a bird-eye’s view having all data without scrolling
Intel® Xeon Phi™ profiling result with 288 threads
Region frames on the ruler
More green color – more efficient multithreaded execution
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
12
Details for a Region at source file level
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
13
Summary
• VTune Amplifier XE OpenMP analysis answers on customer’s questions
about performance on the language of OpenMP constructs
• The analysis is well-scalable for many-core systems with good balance of
tracing and sampling collection technologies
• The OpenMP analysis is “MPI-aware” that is helpful for inner-node hybrid
MPI + OpenMP application tuning
• The full feature set is available in VTune Amplifier XE 2018 with Intel
OpenMP and Intel MPI runtimes as a part of Intel® Parallel Studio XE 2018
14
Back-up
15
A Use Case: NPB CG imbalance improvement
• Step 1: Profiling original application – NPB CG (Class B)
There is a region with promising potential gain – go to Grid View for
more details
16
A Use Case: NPB CG imbalance improvement
• Step 1: Profiling original application – NPB CG (Class B)
• There are barriers in region – use experimental “/OpenMP Region/OpenMP Barrier..” grouping
• Imbalance on omp loop in cg.f, lines: 572 - 580, schedule is static
17
A Use Case: NPB CG imbalance improvement
• Step 2: Trying dynamic scheduling omp do schedule (dynamic)
Elapsed time increased – no improvement
Go to Grid View for details
18
A Use Case: NPB CG imbalance improvement
• Step 2: Trying dynamic scheduling “omp do schedule (dynamic)”
Default chunk size is 1 and it led to scheduling overhead
Let’s try bigger chunk size
19
A Use Case: NPB CG imbalance improvement
• Step 3: Trying dynamic schedule with chunk 20
Improved original elapsed time ~15%, eliminated imbalance
Intel Confedencial
20
Back-up
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information
and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product
when combined with other products.
Copyright © 2017, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
21
OPENMP ANALYSIS IN VTUNE AMPLIFIER XE

Contenu connexe

Similaire à OPENMP ANALYSIS IN VTUNE AMPLIFIER XE

Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Intel® Software
 
HPC Facility Designing for next generation HPC systems Ram Nagappan Intel Final
HPC Facility Designing for next generation HPC systems Ram Nagappan Intel FinalHPC Facility Designing for next generation HPC systems Ram Nagappan Intel Final
HPC Facility Designing for next generation HPC systems Ram Nagappan Intel FinalRamkumar Nagappan
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production EnvironmentsIntel® Software
 
Intel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big DataIntel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big DataDESMOND YUEN
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabMichelle Holley
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018AWS User Group Bengaluru
 
Omni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing WorkOmni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing Workinside-BigData.com
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...tdc-globalcode
 
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...Amazon Web Services
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkMichelle Holley
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Software Brasil
 
Getting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XEGetting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XEIntel Software Brasil
 
Scaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanovScaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanovDenis Nagorny
 
Accelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeAccelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeIntel® Software
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY
 
Microsoft Build 2019- Intel AI Workshop
Microsoft Build 2019- Intel AI Workshop Microsoft Build 2019- Intel AI Workshop
Microsoft Build 2019- Intel AI Workshop Intel® Software
 
Intel precision medicine apr 2015
Intel precision medicine apr 2015Intel precision medicine apr 2015
Intel precision medicine apr 2015Ketan Paranjape
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel IT Center
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Acceleratorinside-BigData.com
 

Similaire à OPENMP ANALYSIS IN VTUNE AMPLIFIER XE (20)

Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
HPC Facility Designing for next generation HPC systems Ram Nagappan Intel Final
HPC Facility Designing for next generation HPC systems Ram Nagappan Intel FinalHPC Facility Designing for next generation HPC systems Ram Nagappan Intel Final
HPC Facility Designing for next generation HPC systems Ram Nagappan Intel Final
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
Intel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big DataIntel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big Data
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/Lab
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018
 
Omni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing WorkOmni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing Work
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance Computing
 
Getting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XEGetting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XE
 
Scaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanovScaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanov
 
Accelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeAccelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the Edge
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
 
Microsoft Build 2019- Intel AI Workshop
Microsoft Build 2019- Intel AI Workshop Microsoft Build 2019- Intel AI Workshop
Microsoft Build 2019- Intel AI Workshop
 
Intel precision medicine apr 2015
Intel precision medicine apr 2015Intel precision medicine apr 2015
Intel precision medicine apr 2015
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
 

Plus de DESMOND YUEN

2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdfDESMOND YUEN
 
Small Is the New Big
Small Is the New BigSmall Is the New Big
Small Is the New BigDESMOND YUEN
 
Intel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefIntel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefDESMOND YUEN
 
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsCryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsDESMOND YUEN
 
Intel 2021 Product Security Report
Intel 2021 Product Security ReportIntel 2021 Product Security Report
Intel 2021 Product Security ReportDESMOND YUEN
 
How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...DESMOND YUEN
 
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreNASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreDESMOND YUEN
 
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...DESMOND YUEN
 
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
PUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIESPUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIES
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIESDESMOND YUEN
 
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEBUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEDESMOND YUEN
 
An Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelAn Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelDESMOND YUEN
 
Changing demographics and economic growth bloom
Changing demographics and economic growth bloomChanging demographics and economic growth bloom
Changing demographics and economic growth bloomDESMOND YUEN
 
Intel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyIntel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyDESMOND YUEN
 
2021 private networks infographics
2021 private networks infographics2021 private networks infographics
2021 private networks infographicsDESMOND YUEN
 
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...DESMOND YUEN
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI TodayDESMOND YUEN
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksDESMOND YUEN
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...DESMOND YUEN
 
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm.""Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."DESMOND YUEN
 
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...DESMOND YUEN
 

Plus de DESMOND YUEN (20)

2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf
 
Small Is the New Big
Small Is the New BigSmall Is the New Big
Small Is the New Big
 
Intel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefIntel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product Brief
 
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsCryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
 
Intel 2021 Product Security Report
Intel 2021 Product Security ReportIntel 2021 Product Security Report
Intel 2021 Product Security Report
 
How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...
 
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreNASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
 
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
 
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
PUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIESPUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIES
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
 
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEBUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
 
An Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelAn Introduction to Semiconductors and Intel
An Introduction to Semiconductors and Intel
 
Changing demographics and economic growth bloom
Changing demographics and economic growth bloomChanging demographics and economic growth bloom
Changing demographics and economic growth bloom
 
Intel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyIntel’s Impacts on the US Economy
Intel’s Impacts on the US Economy
 
2021 private networks infographics
2021 private networks infographics2021 private networks infographics
2021 private networks infographics
 
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI Today
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery Networks
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
 
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm.""Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
 
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
 

Dernier

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 

Dernier (20)

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 

OPENMP ANALYSIS IN VTUNE AMPLIFIER XE

  • 1.
  • 2. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 2 Agenda • VTune Amplifier XE OpenMP* Analysis: answering on customers’ questions about performance in the same language a program was written in • Concepts, metrics and technology inside • VTune Amplifier XE OpenMP Analysis Workflow • OpenMP analysis for hybrid MPI + OpenMP applications • Summary
  • 3. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 3 Typical customer questions on parallelization efficiency of OpenMP* applications • “I put pragmas but why my speed up is far from linear? • Parallelization inefficiency • “I ran my app on a system with bigger number of cores but it does not run as efficient as on smaller number” • Scalability issues • Is serial time of my application significant to prevent scaling? • How efficient is my OpenMP parallelization? • If not, how much gain can be achieved if I invest in fighting with the inefficiencies? • What OpenMP regions/loops/barriers are worth to tune? • What are the particular problems with them? Decomposing the questions
  • 4. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 4 If Performance Information is OpenMP “Unaware”.. OpenMP “unaware” views of VTune Amplifier XE Difficult to detect problems, customers might even “blame” runtime seeing CPU time consumption there not understanding that this is a result of parallelization inefficiency The questions are tied to OpenMP program structure – #pragmas Answers should be given the same way to be understandable and actionable
  • 5. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Is serial time of my application significant to prevent scaling? How efficient is my parallelization towards ideal parallel execution? How much theoretical gain I can get if invest in tuning? What regions are more perspective to invest? Links to grid view for more details on inefficiency 5 VTune Amplifier XE OpenMP* Analysis: answering on customers’ questions about performance in the same language a program was written in Overview on summary pane
  • 6. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 6 Key of OpenMP awareness in VTune – Region based views and metrics Definition of Region Potential Gain (elapsed time metric) Lock spinning (sampling) Effective time (sampling) Imbalance (tracing) Actual Parallel Region Elapsed Time Estimated Ideal Time = Effective time / Number of Threads Potential Gain as a sum of inefficiencies normalized by num of threads Fork Join Scheduling (sampling) Work forking (sampling) Atomics (sampling)
  • 7. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 7 Technology under VTune Amplifier XE OpenMP Analysis Tracing of OpenMP constructions to provide region/work sharing context and precise imbalance on barriers • Provided to VTune by Intel OpenMP Runtime under profiling • Fork-Join points of parallel regions with number of working threads (Intel Compiler 14 and later) • OpenMP construct barrier points with imbalance info and OpenMP loop metadata • -parallel-source-info=2 compiler option to embed source file name to a region name • Looking at transition to OMPT, working with John M.-C. on interface enrichments for low overhead analysis Sampling to define and classify CPU time - user’s code and OpenMP RTL work • Classification: Locking, Scheduling, Work Forking
  • 8. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 8 VTune Amplifier XE OpenMP Analysis Workflow Start with HPC Performance Characterization analysis Explore CPU Utilization aspect metrics related to OpenMP in summary, grid, source view CL: >amplxe-cl –collect hpc-performance <my_app>
  • 9. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 9 Per Region Details in grid view: inefficiencies in wall time - classification and issue highlighting Imbalance on a loop barrier Dynamic scheduling overhead on a parallel loop
  • 10. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 10 Details in Grid View: Serial Time Hotspots Serial hotspots under Master Thread Time Filter to exclude initialization phase
  • 11. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 11 Details on Scalable Timeline Super tiny timeline display mode – a bird-eye’s view having all data without scrolling Intel® Xeon Phi™ profiling result with 288 threads Region frames on the ruler More green color – more efficient multithreaded execution
  • 12. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 12 Details for a Region at source file level
  • 13. Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 13 Summary • VTune Amplifier XE OpenMP analysis answers on customer’s questions about performance on the language of OpenMP constructs • The analysis is well-scalable for many-core systems with good balance of tracing and sampling collection technologies • The OpenMP analysis is “MPI-aware” that is helpful for inner-node hybrid MPI + OpenMP application tuning • The full feature set is available in VTune Amplifier XE 2018 with Intel OpenMP and Intel MPI runtimes as a part of Intel® Parallel Studio XE 2018
  • 15. 15 A Use Case: NPB CG imbalance improvement • Step 1: Profiling original application – NPB CG (Class B) There is a region with promising potential gain – go to Grid View for more details
  • 16. 16 A Use Case: NPB CG imbalance improvement • Step 1: Profiling original application – NPB CG (Class B) • There are barriers in region – use experimental “/OpenMP Region/OpenMP Barrier..” grouping • Imbalance on omp loop in cg.f, lines: 572 - 580, schedule is static
  • 17. 17 A Use Case: NPB CG imbalance improvement • Step 2: Trying dynamic scheduling omp do schedule (dynamic) Elapsed time increased – no improvement Go to Grid View for details
  • 18. 18 A Use Case: NPB CG imbalance improvement • Step 2: Trying dynamic scheduling “omp do schedule (dynamic)” Default chunk size is 1 and it led to scheduling overhead Let’s try bigger chunk size
  • 19. 19 A Use Case: NPB CG imbalance improvement • Step 3: Trying dynamic schedule with chunk 20 Improved original elapsed time ~15%, eliminated imbalance
  • 21. Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2017, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 21