HPC Performance & Development Tuning tools for scientists to go parallel faster with Allinea
1. Get Performance on Intel® Xeon Phi™ with
Allinea MAP and Allinea DDT
Discovering bottlenecks without pain
2. In my Parallel Universe…
… we develop new antibiotics faster than
bacteria develop resistance
… every household can prototype and evolve
their own 3D-printed designs
… accurate simulation of the natural world is
taken for granted
3. So I decided to…
… create parallel development tools for scientists:
We’re accelerating the pace of scientific progress
4. HPC on the critical path to progress
[Figure: performance plotted against time (years) across the Single Core, Multi-Core and Many-Core eras. Constraints named on the chart: power, parallel software availability, scalability, programming models, complexity of algorithms.]
5. Allinea MAP
Increase application performance
• Parallel profiler designed for:
‒ C/C++, Fortran
‒ MPI code
Interdependent or independent processes
‒ Multithreaded code
Monitor the main threads for each process
‒ Accelerated codes
GPUs, Intel® Xeon Phi™
• Improve productivity :
‒ Helps you detect performance issues quickly and easily
‒ Tells you immediately where your time is spent in your source code
‒ Helps you to optimize your application efficiently
6. Allinea MAP 4.2
New features in 2013
• Support for I/O metrics
‒ I/O can be a major bottleneck in HPC systems
‒ Find the optimal configuration for your file system.
Benefit : Broader profiling and analysis capabilities to solve
even more performance issues.
• Support for Intel® Xeon Phi™
‒ Already supported on Allinea DDT
‒ Officially extended to profiling
Benefit : Ensure you are getting the best performance from
new technology.
7. Optimizing for Intel® Xeon Phi™
Where do you start?
“Code that’s well-optimized for the host
usually performs pretty well on the cards”
- Almost everybody
8. Optimizing for Intel® Xeon Phi™
But what matters?
[Figure: chart of performance, with parts labelled "Vectorization" and "Other stuff".]
12. Optimizing for Intel® Xeon Phi™
Is my code well-vectorized?
Not in this loop
(16.5% of total time)
… maybe?
13. Allinea DDT
Unified interface for debugging
• Full, graphical debugger designed for :
‒ C/C++, Fortran, Intel® Xeon Phi™, UPC, …
‒ MPI, OpenMP and mixed-mode code
• Unified interface with Allinea MAP :
‒ Just what you need when you’ve added
OpenMP and now everything segfaults!
‒ One interface eliminates learning curve
‒ Spend more time on your results
• Slash your time to develop :
‒ Reproduces and triggers your bugs instantly
‒ Helps you quickly understand where issues come from
‒ Helps you to fix them as swiftly as possible
14. Allinea at the forefront of science
with COSMOS and Intel® Xeon Phi™
“While I was porting CAMB to offload certain parts of it to Intel®
Xeon Phi™, I wasted weeks debugging it because the offloads
were basically opaque. I only had print statements to help me.”
15. Allinea at the forefront of science
with COSMOS and Intel® Xeon Phi™
“Using DDT's new offload debugging I can now look at the offload
code and look at the state of the array on the Intel® Xeon Phi™
side before it is manipulated”
16. Allinea at the forefront of science
with COSMOS and Intel® Xeon Phi™
“Fix is easy - either set NOCOPY->IN or just set the thing
to zero on the MIC side which is probably cheaper.”
17. Allinea at the forefront of science
with COSMOS and Intel® Xeon Phi™
“I’m now using MAP – it shows that the code is fairly well vectorised at 70%.
This will have to be improved a bit to get the most out of the coprocessors.”
18. Allinea Software
• Ten years of high-quality development tools
‒ Leading in HPC software tools market worldwide
‒ Global customer base
• Making parallel programming accessible to the widest range of
scientists and programmers
‒ Design an unrivaled productive and easy-to-use development
environment…
‒ … To help you reach the highest level of performance and scalability
‒ Define a new standard of customer support
19. Summary
The premier Intel® Xeon Phi™ development environment from Allinea
– Is your code ready for Intel® Xeon Phi™? Run a Performance Report!
– See which loops are important to vectorize with Allinea MAP
– Stay productive with full profiling and debugging on both host and
coprocessor
– Powerful unified interface with industry-leading technical support to help
you get the job finished faster
Visit us at our booth #1719 to see this in action!
Enter our Performance Reports competition to
win a Kindle Fire every day!
Editor's notes
It’s not often that marketing lives up to its hype, but something we’ve consistently heard from users around the world porting their codes to Xeon Phi is that – once they’ve done a good job of optimizing for the host – the performance on the Phi is normally pretty good right away.
The reason is that even on a standard Xeon these days, you need to take advantage of vectorized instructions to get good performance. With 512-bit registers, vectorization is absolutely critical to achieving good performance on the Xeon Phi. There’s no point in sending all the cars down one lane of the highway!
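To put a number on the highway analogy, here is a minimal sketch of the lane arithmetic. It only divides the register width by the element size; the function names are made up for this illustration, and the 64-bit and 32-bit element sizes are the usual sizes of double- and single-precision floats, not something taken from the slides.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element size must be positive and divide the register width")
    return register_bits // element_bits


def scalar_fraction_of_width(register_bits: int, element_bits: int) -> float:
    """Fraction of the vector width used when only one element is processed at a time."""
    return 1.0 / simd_lanes(register_bits, element_bits)


if __name__ == "__main__":
    # 512-bit registers, as on the Xeon Phi; element sizes are assumptions.
    for name, bits in (("double", 64), ("float", 32)):
        lanes = simd_lanes(512, bits)
        print(f"{name}: {lanes} lanes, scalar code uses 1/{lanes} of the width")
```

With 512-bit registers that is 8 lanes for doubles and 16 for floats, so a loop that stays scalar leaves most of the lanes empty.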
The Intel compilers can give very detailed reports about what they’re doing to each loop using the -vec-report flags, but even on a small program you need to know which loops are worth spending your time on and which you can ignore.
Allinea MAP shows you the behavior of your code at a single glance – let me briefly walk you through the interface here. <Talk about how to interpret the metric graphs and the sparkline graphs next to the code viewer. Finish by pointing out that the CPU floating-point vector graph is at 0 for the selected region of time!>
This is Allinea MAP’s answer to our question – there’s an important loop taking 16.5% of the total program time that isn’t vectorizing at all! Now that we know which lines of code are affected, we can ask the compiler for a report and investigate further.
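A rough way to gauge what fixing this loop could buy is Amdahl's law. The sketch below is a back-of-envelope illustration, not something MAP reports: the function names are made up for this example, and the 8x loop speedup is an assumed best case (eight doubles per 512-bit register), not a measured figure.

```python
def overall_speedup(time_fraction: float, loop_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when a part that takes
    `time_fraction` of the runtime becomes `loop_speedup` times faster."""
    if not 0.0 <= time_fraction <= 1.0:
        raise ValueError("time_fraction must be between 0 and 1")
    if loop_speedup <= 0.0:
        raise ValueError("loop_speedup must be positive")
    return 1.0 / ((1.0 - time_fraction) + time_fraction / loop_speedup)


def speedup_ceiling(time_fraction: float) -> float:
    """Upper bound on the whole-program speedup, reached only if the
    part in question took no time at all."""
    if not 0.0 <= time_fraction < 1.0:
        raise ValueError("time_fraction must be in [0, 1)")
    return 1.0 / (1.0 - time_fraction)


if __name__ == "__main__":
    share = 0.165   # the loop's share of total time, from the slide
    assumed = 8.0   # ASSUMPTION: ideal speedup from vectorizing doubles
    print(f"with {assumed:.0f}x on the loop: {overall_speedup(share, assumed):.2f}x overall")
    print(f"ceiling for this loop:   {speedup_ceiling(share):.2f}x overall")
```

Under those assumptions the whole program gets about 1.17x faster, and no amount of work on this one loop can beat roughly 1.2x. That is worth having, and it also tells you when to move on to the next hotspot.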
It’s not just profiling that works the same – our unified interface is shared with Allinea DDT, a full-featured debugger supporting a huge range of platforms and codes including the Xeon Phi.
You can’t achieve full performance by looking through a microscope all the time – you have to be able to step back from the quest to vectorize the next loop, and the next, and ask “is this worth it? Is there a library I can use here? Can I refactor my code here?” MAP gives you the oversight and insight you need to answer these questions.
And when you come to run your code on the card, Allinea MAP gathers exactly the same information and displays it in exactly the same way.