TotalView for OpenPOWER, CUDA and OpenMP
Chris Gottbrath
Dean Stewart
May 18, 2015
ScicomP 2015
Agenda
• Introduction
• TotalView Overview
• OpenMP, OMPT and OMPD
• Current work and future plans
• Questions and wrap-up
© 2015 Rogue Wave Software, Inc. All Rights Reserved
Introduction
About Rogue Wave Software
Rogue Wave's proven technical solutions simplify the growing complexity of
building and testing quality software code. Rogue Wave customers improve
software quality and ensure code integrity, while shortening development cycle
times.
Founded: 1989
Headquarters: Boulder, CO
Employees: 250
Offices Worldwide: 9
Company timeline (1989 to 2015)
Technology Timeline
• 1989: Cross-platform commercial math & statistics libraries for C++
• 1994: First commercial database library for C++
• 2008: First graphical reverse debugger for C, C++ and Fortran on Linux
• 2010: First GPU enabled commercial analytics in FORTRAN
• 2011: First Infiniband cluster-capable reverse debugger; first cache optimization product to market
Corporate Timeline
• 1989: Rogue Wave established; Tools.h++
• 1996: Rogue Wave publicly listed on NASDAQ
• 2003: Rogue Wave acquired by Quovadx
• 2007: Rogue Wave spun out of Quovadx by Battery Ventures
• 2009: Rogue Wave acquires TotalView and Visual Numerics
• 2010: Rogue Wave acquires Acumem
• 2012: Rogue Wave acquires Visualizations for C++; Audax Group acquires Rogue Wave
• 2013: Rogue Wave acquires OpenLogic & Klocwork
Rogue Wave Solution Portfolio
HPC Trends
• What do we see
– NVIDIA Tesla GP-GPU computational accelerators
– Intel Xeon Phi Coprocessors
– Complex memory hierarchies (NUMA, device vs. host, etc.)
– Custom languages such as CUDA and OpenCL
– Directive based programming such as OpenACC and OpenMP
– Core and thread counts going up
• A lot of complexity to deal with if you want performance
– C or Fortran with MPI starts to look “simple”
– Everything is Multiple Languages / Parallel Paradigms
– Up to 4 “kinds” of parallelism (cluster, thread, heterogeneous, vector)
– Data movement and load balancing
How does Rogue Wave help?
• Troubleshooting and analysis tool
– Visibility into applications
– Control over applications
• Scalability
• Usability
• Support for HPC platforms and languages
TotalView debugger
TotalView Overview
What is TotalView®?
Application Analysis and Debugging Tool: Code Confidently
• Debug and Analyse C/C++ and Fortran on Linux™, Unix or Mac OS X
• Laptops to supercomputers
• Makes developing, maintaining, and supporting critical apps easier and less risky
Major Features
• Easy to learn graphical user interface with data visualization
• Parallel Debugging
– MPI, Pthreads, OpenMP™, Fortran Coarrays
– CUDA™, OpenACC®, and Intel® Xeon Phi™ coprocessor
• Low tool overhead and resource usage
• Includes a Remote Display Client which frees you to work from anywhere
• Memory Debugging with MemoryScape™
• Deterministic Replay Capability Included on Linux/x86-64
• Non-interactive Batch Debugging with TVScript and the CLI
• TTF & C++View to transform user defined objects
What Is MemoryScape®?
• Runtime Memory Analysis: Eliminate Memory Errors
– Detects memory leaks before they are a problem
– Explore heap memory usage with powerful analytical tools
– Use for validation as part of a quality software development process
• Major Features
– Included in TotalView, or Standalone
– Detects
• Malloc API misuse
• Memory leaks
• Buffer overflows
– Supports
• C, C++, Fortran
• Linux, Unix, and Mac OS X
• Intel® Xeon Phi™
• MPI, pthreads, OMP, and remote apps
– Low runtime overhead
– Easy to use
• Works with vendor libraries
• No recompilation or instrumentation
Deterministic Replay Debugging
• Reverse Debugging: Radically simplify your debugging
– Captures and Deterministically Replays Execution
• Not just “checkpoint and restart”
– Eliminate the Restart Cycle and Hard-to-Reproduce Bugs
– Step Back and Forward by Function, Line, or Instruction
• Specifications
– A feature included in TotalView on Linux x86 and x86-64
• No recompilation or instrumentation
• Explore data and state in the past just like in a
live process, including C++View transformations
– Replay on Demand: enable it when you want it
– Supports MPI on Ethernet, Infiniband, Cray XE Gemini
– Supports Pthreads, and OpenMP
– New: Save / Load Replay Information
Supported Platforms
Platforms          C/C++ Compilers    Fortran Compilers
Linux x86          GCC, Intel, PGI    Absoft, GNU, Intel, PGI
Linux x86-64       GCC, Intel, PGI    Absoft, GNU, Intel, PGI
Power Linux        GCC, XL C++        GNU, XL Fortran
RS6000 Power AIX   GCC, XL C++        XL Fortran
BlueGene           GCC, XL C++        XL Fortran
TotalView for the NVIDIA® GPU Accelerator
• NVIDIA CUDA 6.0, 6.5 and 7.0
• Features and capabilities include
– Support for dynamic parallelism
– Support for MPI based clusters and multi-card configurations
– Flexible Display and Navigation on the CUDA device
• Physical (device, SM, Warp, Lane)
• Logical (Grid, Block) tuples
– CUDA device window reveals what is running where
– Support for types and separate memory address spaces
– Leverages CUDA memcheck
• The following 6 slides are from an SC14 tutorial by:
• Damian Alvarez
– d.alvarez.mallon@fz-juelich.de
• Dr. Mike Ashworth
– mike.ashworth@stfc.ac.uk
• Vincent Betro, Ph. D.
– vbetro@utk.edu
• Chris Gottbrath
– Chris.Gottbrath@roguewave.com
• Nikolay Piskun, Ph.D.
– Nikolay.Piskun@roguewave.com
• Sandra Wienke
– Wienke@itc.rwth-aachen.de
11.17.2014 SC ‘14
• Setting breakpoints in CUDA kernels
– Start debugging (e.g. “Go”)
– Message box when kernel is loaded
– Set kernel breakpoints as in host code
• Debugger thread IDs in Linux CUDA process
– Host thread: positive no.
– CUDA thread: negative no.
• GPU thread navigation
– Logical coordinates: blocks (3 dimensions), threads (3 dimensions)
– Physical coordinates: device, SM, warp, lane
– Only valid selections are permitted
• Single Stepping
– Advances all GPU hardware threads within same warp
– Stepping over a __syncthreads() call advances all threads within the block
• Advancing more than just one warp
– “Run To” a selected line number in the source pane
– Set a breakpoint and “Continue” the process
• Halt
– Stops all the host and device threads
[Figure: threads t0, t1 … t31 form one warp and t32 … t63 the next; a warp is a group of 32 threads with the same program counter (PC)]
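The relation between logical thread coordinates and warps can be sketched with a little arithmetic. This is a minimal illustration, not TotalView code: it assumes the usual CUDA convention that threads in a block are numbered with x varying fastest and that consecutive groups of 32 form a warp. The function names are made up for this example, and the warp index computed here is the position within a block, not the hardware warp slot that the debugger shows as a physical coordinate.

```python
WARP_SIZE = 32  # "group of 32 threads" from the slide


def linear_thread_index(thread_idx, block_dim):
    """Flatten a 3-D thread coordinate (x, y, z) inside a block, x fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within that warp)."""
    return divmod(linear_thread_index(thread_idx, block_dim), WARP_SIZE)


def same_warp(thread_a, thread_b, block_dim):
    """True if a single step of one thread would also advance the other."""
    return warp_and_lane(thread_a, block_dim)[0] == warp_and_lane(thread_b, block_dim)[0]
```

Under these assumptions, threads (0, 0, 0) and (31, 0, 0) of a 64-thread block share a warp and step together, while (32, 0, 0) belongs to the next warp.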
• Displaying CUDA device properties
– “Tools” - “CUDA Devices”
– Helps mapping between logical & physical coordinates
• PCs across SMs, warps, lanes
– valid, active, divergent
[Figure: program counter (PC) within warp]
• Displaying GPU data
– “Dive” into variable or watch “Type” in “Expression List”
– Device memory spaces: “@” notation
Storage Qualifier   Meaning of address
@global             Offset within global storage
@shared             Offset within shared storage
@local              Offset within local storage
@register           PTX register name
@generic            Offset within generic address space (e.g. pointer to global, local or shared memory)
@constant           Offset within constant storage
@parameter          Offset within parameter storage (TV built-in type)
• Checking GPU memory
– Enable “CUDA Memory checking” during startup or in the “Debug” menu
– Detects global memory addressing violations and misaligned memory accesses
• Further features
– Multi-device support
– Host-pinned memory support
– MPI-CUDA applications
Note: Recent cuda-memcheck versions are also able to detect race conditions:
cuda-memcheck --tool racecheck <prog>
OpenMP, OMPT and OMPD
The Importance of OpenMP
• Programming models are changing to accommodate changes in system architectures
• Higher degree of on-node parallelism: many-core CPUs and/or GPUs
• Hybrid programming models: MPI+X, where OpenMP is an important X
– MPI across the nodes
– OpenMP shared memory parallelism across the cores in a node
• Why use OpenMP?
– The most widely used standard for SMP systems, implemented by many vendors
– Supports the Fortran, C, and C++ languages
– Relatively small and simple specification, and supports incremental parallelism
– OpenMP research keeps it up to date with the latest hardware developments
– OpenMP 4 allows targeting GPUs
• We see momentum building around OpenMP
OpenMP Debugging Challenges
• Programmers will attempt to exploit MPI+OpenMP hybrid parallelism
• Porting existing large applications from MPI to a hybrid model is nontrivial
and arduous, and having GPUs in the mix makes it even more challenging
• Programming errors such as memory corruption, logic errors and
concurrency bugs are inevitable
• Bottom line
– MPI+OpenMP+GPUs will present programmers with unprecedented
debugging challenges
– They need good debugging tools for MPI+OpenMP+GPUs
Debugging OpenMP Applications
The following are some features that TotalView supports:
• Source-level debugging of the original OpenMP code.
• The ability to plant breakpoints throughout the OpenMP code, including
lines that are executed in parallel.
• Visibility of OpenMP worker threads.
• Access to SHARED and PRIVATE variables in OpenMP PARALLEL code.
• Access to OMP THREADPRIVATE data in code compiled by supported
compilers.
Sample OpenMP Debugging Session
[Screenshot: TotalView debugging session showing OpenMP worker threads and local variables]
OpenMP code high and low level
• Intention is expressed in the OpenMP code
– Serial-correct code with OMP directives expressing parallelism
– Higher level expression of the ideas
• Compiler can create either serial or parallel executable programs from this
source
• A parallel executable includes both the program logic and a runtime
– Teams of threads on the device, the host or both
– Outlined routines
– Runtime calls to dispatch work to worker threads
– Work created on thread A may be executed on threads M-N
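The split between program logic and runtime described above can be imitated in a few lines. This is only an analogy in Python, not what an OpenMP compiler emits: `outlined_loop_body` stands in for a compiler-generated outlined routine, the thread pool stands in for the OpenMP runtime, and every name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def serial_sum_of_squares(data):
    """The serial-correct version of the computation."""
    return sum(v * v for v in data)


def outlined_loop_body(data, start, stop):
    """Stand-in for an outlined routine: one chunk of the original loop."""
    partial = sum(v * v for v in data[start:stop])
    return threading.get_ident(), partial


def parallel_sum_of_squares(data, num_threads=4):
    """Stand-in for the runtime: split the iterations and dispatch the chunks."""
    chunk = max(1, (len(data) + num_threads - 1) // num_threads)
    bounds = [(lo, min(lo + chunk, len(data))) for lo in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(lambda b: outlined_loop_body(data, *b), bounds))
    workers = {thread_id for thread_id, _ in results}
    return sum(partial for _, partial in results), workers
```

The work is created on the calling thread but runs on whichever pool threads pick it up, which is why a debugger needs help from the runtime to relate a worker's stack back to the source-level parallel region.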
What’s Needed?
• Debugging and performance analysis support from OpenMP implementations
• “OMPT and OMPD: OpenMP Tools Application Programming Interfaces for
Performance Analysis and Debugging” technical report (TR)
– First TR combined OMPT and OMPD in one document
– OMPD was redacted to allow OMPT to progress
• “TR2: OMPT: An OpenMP Tools Application Programming Interface for
Performance Analysis”
– Accepted by the OpenMP ARB (March 2014)
– OMPT is an API for first-party performance tools
• It is now time to circle back and finish OMPD!
– OMPD is similar to OMPT in its functionality, but…
– OMPD is an API for third-party debugging tools
What Does OMPT/OMPD Do?
• OMPT
– Enable performance tools to gather execution program/runtime costs
– Allow construction of low-overhead performance tools
– Allow logical stack unwinding (to handle outlined parallel regions)
– Provide the state of a thread at any point in time (e.g., idle, work, wait)
– Asynchronous signal safe
• OMPD
– Enable debugging tools to inspect the state of a live process or core file
– Third-party versions of the OpenMP runtime inquiry functions
– Third-party versions of the OMPT inquiry functions
– Intercept the beginning/end of parallel/task regions (e.g., stepping in/out)
– Enable the debugger to construct a “global view” of the process
How is OMPD Structured?
• Based on a commonly used idiom
– pthread thread_db, MPI MQD, MPI Handles, and others
– The OMPD DLL is “paired” with the OpenMP runtime library
• The debugger
– Attaches to the target OpenMP application
– Loads the OMPD DLL (e.g., via dlopen())
– Registers callbacks in the OMPD DLL
– Makes “requests” into the OMPD DLL to query runtime state
• The OMPD DLL
– Makes callbacks into the debugger (lookup symbols, read/write
memory, etc.)
– Returns the result to the debugger
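The request/callback idiom above can be modelled with three toy classes. This is a sketch of the control flow only: none of these class, method, or symbol names come from the OMPD specification, and a real debugger would load a shared library (e.g. via dlopen()) rather than instantiate a Python object.

```python
class ToyTarget:
    """Hypothetical debugged process: a symbol table plus some memory."""

    def __init__(self):
        self.symbols = {"toy_num_threads": 0x1000}
        self.memory = {0x1000: 4}


class ToyDebugger:
    """Owns access to the target and exposes it through callbacks."""

    def __init__(self, target):
        self.target = target

    def lookup_symbol(self, name):
        return self.target.symbols[name]

    def read_memory(self, address):
        return self.target.memory[address]


class ToyOmpdLibrary:
    """Knows the runtime's layout but reaches the target only via callbacks."""

    def __init__(self):
        self.callbacks = None

    def register_callbacks(self, lookup_symbol, read_memory):
        self.callbacks = (lookup_symbol, read_memory)

    def request_num_threads(self):
        lookup_symbol, read_memory = self.callbacks
        address = lookup_symbol("toy_num_threads")  # callback into the debugger
        return read_memory(address)                 # callback into the debugger


def attach_and_query():
    debugger = ToyDebugger(ToyTarget())   # "attach" to the target
    library = ToyOmpdLibrary()            # "load" the paired library
    library.register_callbacks(debugger.lookup_symbol, debugger.read_memory)
    return library.request_num_threads()  # request; the result comes back
```

The point of the design is that the library never touches the target directly, so the same library can serve a live process or a core file as long as the debugger supplies suitable callbacks.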
OMPD “In Action”
• Application address space: the application process and the OpenMP Runtime Library (RTL)
• Debugger address space: the debugger and the OMPD DLL
– The debugger attaches to the application process
– The OMPD DLL is loaded into the debugger and callbacks are registered
• 1: The debugger requests OpenMP state (request types)
– Handles for threads, parallel regions, tasks
– Parent / child relationships
– State of handles (wait, work, idle)
• 2: The OMPD DLL requests symbols and address information (callback functions)
– Lookup symbols in the target process
– Read/write target address spaces
– Support for GPUs
• 3: The result is returned to the debugger
OMPD Status
• Collaboration between LLNL and Rogue Wave Software (TotalView)
– LLNL: Dong Ahn, Ignacio Laguna, Joachim Protze, Martin Schulz
– RWS: Ariel Burton, John DelSignore
• Resurrect OMPD with the ultimate goal of having it accepted by the ARB
– Fix the current OMPD spec
– Implement OMPD DLL in the Intel OpenMP runtime
– Implement OMPD-based features in TotalView
• IWOMP Paper (International Workshop on OpenMP)
Current Work and Future Plans
TotalView 8.15 New Features
• Scalable Infrastructure
– Faster start up on Linux
– Scales to O(100,000) processes & O(1,000,000) threads
• Updated CUDA support
– CUDA 7.0
• Support updates including:
– Clang 3.5
– Intel 15.0
– MPT 2.12
– SLES 12, Fedora 21
[Figure: TV Client at the top, MRNet CPs in the middle, TV Servers at the bottom]
TotalView’s Scalability Strategy
• TotalView uses an “MRNet tree” of servers
– Multicast and reduction over the tree
[Figure: TV Client at the top, two MRNet CPs in the middle, four TV Servers at the bottom]
• Remain lightweight in the backend!
• Push debugger “smarts” to the backend, not the whole debugger!
• Use classic optimization techniques too: caching, hoisting invariants, etc.
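The reduction half of a multicast/reduction tree can be sketched as a bottom-up merge. This is an illustration of the idea, not the MRNet API: the tree layout copies the slide figure (and assumes each MRNet CP fronts two servers), while the per-server state counts and all names are invented for the example.

```python
def reduce_up_tree(node, leaf_values, combine):
    """Combine leaf values bottom-up; a node is a leaf name or a list of children."""
    if isinstance(node, str):
        return leaf_values[node]
    partials = [reduce_up_tree(child, leaf_values, combine) for child in node]
    result = partials[0]
    for partial in partials[1:]:
        result = combine(result, partial)
    return result


def merge_state_counts(a, b):
    """Add up per-state thread counts from two subtrees."""
    merged = dict(a)
    for state, count in b.items():
        merged[state] = merged.get(state, 0) + count
    return merged


# Root = client; each inner list = one communication process and its servers.
TREE = [["server0", "server1"], ["server2", "server3"]]
```

Each intermediate node forwards one merged summary instead of every child's raw data, so the client sees a message count that grows with the fan-out of the tree rather than with the number of servers.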
Linux Start up Performance in TV 8.15.4
5x faster (600s / 120s) at 16k between 8.14.1 and 8.15.4.
Note that we switched to MRNet by default in 8.15.0
BG Start up Performance in TV 8.15.4
6.4x faster (180s / 28s) at 16k between 8.14.1 and 8.15.4.
Note that we switched to MRNet by default in 8.15.0
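The speedup figures quoted on the two start-up slides are simple ratios of the old and new start-up times. A quick check, using only the numbers given on the slides (the function name is ours):

```python
def speedup(old_seconds, new_seconds):
    """Ratio of start-up time before and after the change."""
    return old_seconds / new_seconds


linux_speedup = speedup(600, 120)    # Linux: 8.14.1 vs 8.15.4
bluegene_speedup = speedup(180, 28)  # Blue Gene: 8.14.1 vs 8.15.4
linux_seconds_saved = 600 - 120
bluegene_seconds_saved = 180 - 28
```

The Linux ratio is exactly 5, and the Blue Gene ratio rounds to the 6.4 quoted on the slide.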
TotalView debugs 786,432 cores.
Climb with Rogue Wave towards exascale.
Some more details on the 786,432 core test
• The test was performed on 48 racks of Sequoia
• The test code
– Implements a Jacobi Linear Equation Solver
– The test code is a hybrid MPI + OpenMP code
– 16 threads per process, one process per node
• The test operations
– Start up
– Setting breakpoints / removing breakpoints
– Single stepping all threads
• Tests performed at a variety of scales to understand scalability
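The 786,432 figure follows from the test configuration. The slide gives the rack count, threads per process and processes per node; the 1,024 nodes per rack is the standard Blue Gene/Q rack size and is an assumption added here, not something stated on the slide.

```python
NODES_PER_RACK = 1024      # Blue Gene/Q rack size (assumption, not on the slide)
PROCESSES_PER_NODE = 1     # from the slide
THREADS_PER_PROCESS = 16   # from the slide


def debugged_threads(racks):
    """Threads under debugger control when each thread occupies one core."""
    nodes = racks * NODES_PER_RACK
    return nodes * PROCESSES_PER_NODE * THREADS_PER_PROCESS


sequoia_test_threads = debugged_threads(48)
```

With 48 racks this gives 49,152 processes and 786,432 threads, one per core.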
TotalView’s Memory Efficiency
• TotalView is lightweight in the back-end (server)
• Servers don’t “steal” memory from the application
• Each server is a multi-process debugger agent
– One server can debug thousands of processes
– Not a conglomeration of single process debuggers
– TotalView’s architecture provides flexibility (e.g., P/SVR)
– No artificial limits to accommodate the debugger (e.g., BG/Q 1 P/CN)
• Symbols are read, stored, and shared in the front-end (client)
• Example: LLNL APP ADB, 920 shlibs, Linux, 64 P, 4 CN, 16 P/CN, 1 SVR/CN
Process     VSZ (largest, MB)   RSS (largest, MB)   Where
TV Client   4,469               3,998               Front End ONLY
MRNet CP    497                 4                   Compute Nodes
TV Server   304                 53                  Compute Nodes
Future plans
• Contact sales@roguewave.com with any inquiries about our future plans
for the TotalView product.
Questions
Thanks!
• Visit the website
– http://www.roguewave.com/products/totalview.aspx
– Documentation
– Sign up for an evaluation
– Contact customer support & post on the user forum

 

Debugging Numerical Simulations on Accelerated Architectures - TotalView for OpenPOWER, CUDA and OpenMP

  • 1. TotalView for OpenPOWER, CUDA and OpenMP
      Chris Gottbrath, Dean Stewart
      May 18, 2015, ScicomP 2015
  • 2. Agenda
      • Introduction
      • TotalView Overview
      • OpenMP, OMPT and OMPD
      • Current work and future plans
      • Questions and wrap-up
  • 4. About Rogue Wave Software
      Rogue Wave proven technical solutions simplify the growing complexity of building and testing quality software code. Rogue Wave customers improve software quality and ensure code integrity, while shortening development cycle times.
      Founded: 1989
      Headquarters: Boulder, CO
      Employees: 250
      Offices Worldwide: 9
  • 5. Company timeline (1989 to 2015)
      • Technology Timeline
          – 1989: Cross-platform commercial math & statistics libraries for C++
          – 1994: First commercial database library for C++
          – 2008: First graphical reverse debugger for C, C++ and Fortran on Linux
          – 2010: First GPU enabled commercial analytics in FORTRAN
          – 2011: First Infiniband Cluster-capable reverse debugger; first cache optimization product to market
      • Corporate Timeline
          – 1989: Rogue Wave established; Tools.h++
          – 1996: Rogue Wave publicly listed on NASDAQ
          – 2003: Rogue Wave acquired by Quovadx
          – 2007: Rogue Wave spun out of Quovadx by Battery Ventures
          – 2009: Rogue Wave acquires TotalView and Visual Numerics
          – 2010: Rogue Wave acquires Acumem
          – 2012: Rogue Wave acquires Visualizations for C++; Audax Group acquires Rogue Wave
          – 2013: Rogue Wave acquires OpenLogic & Klocwork
  • 6. Rogue Wave Solution Portfolio
  • 7. HPC Trends • What do we see – NVIDIA Tesla GP-GPU computational accelerators – Intel Xeon Phi Coprocessors – Complex memory hierarchies (NUMA, device vs. host, etc.) – Custom languages such as CUDA and OpenCL – Directive based programming such as OpenACC and OpenMP – Core and thread counts going up • A lot of complexity to deal with if you want performance – C or Fortran with MPI starts to look “simple” – Everything is Multiple Languages / Parallel Paradigms – Up to 4 “kinds” of parallelism (cluster, thread, heterogeneous, vector) – Data movement and load balancing
  • 8. How does Rogue Wave help? • Troubleshooting and analysis tool – Visibility into applications – Control over applications • Scalability • Usability • Support for HPC platforms and languages TotalView debugger
  • 10. What is TotalView®? Application Analysis and Debugging Tool: Code Confidently • Debug and Analyse C/C++ and Fortran on Linux™, Unix or Mac OS X • Laptops to supercomputers • Makes developing, maintaining, and supporting critical apps easier and less risky Major Features • Easy to learn graphical user interface with data visualization • Parallel Debugging – MPI, Pthreads, OpenMP™, Fortran Coarrays – CUDA™, OpenACC®, and Intel® Xeon Phi™ coprocessor • Low tool overhead resource usage • Includes a Remote Display Client which frees you to work from anywhere • Memory Debugging with MemoryScape™ • Deterministic Replay Capability Included on Linux/x86-64 • Non-interactive Batch Debugging with TVScript and the CLI • TTF & C++View to transform user defined objects
  • 11. What Is MemoryScape®? • Runtime Memory Analysis : Eliminate Memory Errors – Detects memory leaks before they are a problem – Explore heap memory usage with powerful analytical tools – Use for validation as part of a quality software development process • Major Features – Included in TotalView, or Standalone – Detects • Malloc API misuse • Memory leaks • Buffer overflows – Supports • C, C++, Fortran • Linux, Unix, and Mac OS X • Intel® Xeon Phi™ • MPI, pthreads, OMP, and remote apps – Low runtime overhead – Easy to use • Works with vendor libraries • No recompilation or instrumentation
  • 12. Deterministic Replay Debugging • Reverse Debugging: Radically simplify your debugging – Captures and Deterministically Replays Execution • Not just “checkpoint and restart” – Eliminate the Restart Cycle and Hard-to-Reproduce Bugs – Step Back and Forward by Function, Line, or Instruction • Specifications – A feature included in TotalView on Linux x86 and x86-64 • No recompilation or instrumentation • Explore data and state in the past just like in a live process, including C++View transformations – Replay on Demand: enable it when you want it – Supports MPI on Ethernet, Infiniband, Cray XE Gemini – Supports Pthreads and OpenMP – New: Save / Load Replay Information
  • 13. Supported Platforms
      Platform | C/C++ Compilers | Fortran Compilers
      Linux x86 | Gcc, Intel, PGI | Absoft, GNU, Intel, PGI
      Linux x86-64 | Gcc, Intel, PGI | Absoft, GNU, Intel, PGI
      Power Linux | Gcc, XL C++ | GNU, XL Fortran
      RS6000 Power AIX | Gcc, XL C++ | XL Fortran
      BlueGene | Gcc, XL C++ | XL Fortran
  • 14. TotalView for the NVIDIA® GPU Accelerator • NVIDIA CUDA 6.0, 6.5 and 7.0 • Features and capabilities include – Support for dynamic parallelism – Support for MPI based clusters and multi-card configurations – Flexible Display and Navigation on the CUDA device • Physical (device, SM, Warp, Lane) • Logical (Grid, Block) tuples – CUDA device window reveals what is running where – Support for types and separate memory address spaces – Leverages CUDA memcheck
  • 15. • The following 6 slides are from an SC14 tutorial by: • Damian Alvarez – d.alvarez.mallon@fz-juelich.de • Dr. Mike Ashworth – mike.ashworth@stfc.ac.uk • Vincent Betro, Ph. D. – vbetro@utk.edu • Chris Gottbrath – Chris.Gottbrath@roguewave.com • Nikolay Piskun, Ph.D. – Nikolay.Piskun@roguewave.com • Sandra Wienke – Wienke@itc.rwth-aachen.de 11.17.2014 SC ‘14
  • 16. • Setting breakpoints in CUDA kernels – Start debugging (e.g. “Go”) – A message box appears when the kernel is loaded – Set kernel breakpoints as in host code
  • 17. • Debugger thread IDs in Linux CUDA process – Host thread: positive number – CUDA thread: negative number • GPU thread navigation – Logical coordinates: blocks (3 dimensions), threads (3 dimensions) – Physical coordinates: device, SM, warp, lane – Only valid selections are permitted
  • 18. • Single Stepping – Advances all GPU hardware threads within the same warp – Stepping over a __syncthreads() call advances all threads within the block • Advancing more than just one warp – “Run To” a selected line number in the source pane – Set a breakpoint and “Continue” the process • Halt – Stops all the host and device threads • Figure: a warp is a group of 32 threads (t0 … t31, t32 … t63, …) that share the same program counter (PC)
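The relationship between logical thread coordinates and warps can be sketched with a little arithmetic. The snippet below is a minimal illustration, not TotalView code: it assumes the usual CUDA convention that threads in a block are numbered with x varying fastest and grouped into warps of 32 consecutive threads. The function names are hypothetical, and the result is the warp's position within its block, not the physical warp slot on an SM that the debugger displays.

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide


def linear_thread_index(thread_idx, block_dim):
    """Flatten a 3-D (x, y, z) thread coordinate; x varies fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, dim_z = block_dim
    if not (0 <= x < dim_x and 0 <= y < dim_y and 0 <= z < dim_z):
        raise ValueError("thread index lies outside the block dimensions")
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp within block, lane within warp) for one thread."""
    return divmod(linear_thread_index(thread_idx, block_dim), WARP_SIZE)


def step_together(thread_a, thread_b, block_dim):
    """True if a single step of thread_a also advances thread_b (same warp)."""
    return warp_and_lane(thread_a, block_dim)[0] == warp_and_lane(thread_b, block_dim)[0]
```

For a block of 64 x 1 x 1 threads, threads t0 and t31 fall in the same warp and advance together on a single step, while t32 belongs to the next warp and only moves with a “Run To” or a breakpoint.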
  • 19. • Displaying CUDA device properties – “Tools” > “CUDA Devices” – Helps mapping between logical & physical coordinates • PCs across SMs, warps, lanes – valid, active, divergent • Figure: program counter (PC) within each warp
  • 20. • Displaying GPU data – “Dive” into variable or watch “Type” in “Expression List” – Device memory spaces: “@” notation
      Storage Qualifier | Meaning of address
      @global | Offset within global storage
      @shared | Offset within shared storage
      @local | Offset within local storage
      @register | PTX register name
      @generic | Offset within generic address space (e.g. pointer to global, local or shared memory)
      @constant | Offset within constant storage
      @parameter | Offset within parameter storage (TV built-in type)
  • 21. • Checking GPU memory – Enable “CUDA Memory checking” during startup or in the “Debug” menu – Detects global memory addressing violations and misaligned memory accesses • Further features – Multi-device support – Host-pinned memory support – MPI-CUDA applications • Note: Recent cuda-memcheck versions are also able to detect race conditions: cuda-memcheck --tool racecheck <prog>
  • 23. The Importance of OpenMP • Programming models are changing to accommodate changes in system architectures • Higher degree of on-node parallelism: many-core CPUs and/or GPUs • Hybrid programming models: MPI+X, where OpenMP is an important X – MPI across the nodes – OpenMP shared memory parallelism across the cores in a node • Why use OpenMP? – The most widely used standard for SMP systems, implemented by many vendors – Supports the Fortran, C, and C++ languages – Relatively small and simple specification, and supports incremental parallelism – OpenMP research keeps it up to date with the latest hardware developments – OpenMP 4 allows targeting GPUs • We see momentum building around OpenMP
  • 24. OpenMP Debugging Challenges • Programmers will attempt to exploit MPI+OpenMP hybrid parallelism • Porting existing large applications from MPI to a hybrid model is nontrivial and arduous, and having GPUs in the mix makes it even more challenging • Programming errors such as memory corruption, logic errors and concurrency bugs are inevitable • Bottom line – MPI+OpenMP+GPUs will present programmers with unprecedented debugging challenges – They need good debugging tools for MPI+OpenMP+GPUs
  • 25. Debugging OpenMP Applications The following are some features that TotalView supports: • Source-level debugging of the original OpenMP code. • The ability to plant breakpoints throughout the OpenMP code, including lines that are executed in parallel. • Visibility of OpenMP worker threads. • Access to SHARED and PRIVATE variables in OpenMP PARALLEL code. • Access to OMP THREADPRIVATE data in code compiled by supported compilers.
  • 26. Sample OpenMP Debugging Session (screenshot with callouts: OpenMP worker threads, local variables)
  • 27. OpenMP code high and low level • Intention is expressed in the OpenMP code – Serial-correct code with OMP directives expressing parallelism – Higher level expression of the ideas • Compiler can create either serial or parallel executable programs from this source • A parallel executable includes both the program logic and a runtime – Teams of threads on the device, the host or both – Outlined routines – Runtime calls to dispatch work to worker threads – Work created on thread A may be executed on threads M-N
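The gap between the high-level source and the low-level parallel executable can be illustrated with a toy model. The sketch below is not how any OpenMP compiler or runtime is implemented; it only mimics the idea under simple assumptions: a serial loop is "outlined" into a routine that computes one thread's share, and a stand-in "runtime" forks a team, dispatches the routine to each worker, joins, and combines the partial results. All names (`outlined_region`, `toy_runtime_parallel`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    """The serial-correct source the programmer reasons about."""
    total = 0
    for v in values:
        total += v * v
    return total


def outlined_region(values, thread_id, num_threads):
    """Toy outlined routine: the loop body for one worker's share of iterations."""
    return sum(v * v for v in values[thread_id::num_threads])


def toy_runtime_parallel(outlined, values, num_threads=4):
    """Toy runtime: fork a team, dispatch the outlined routine, join, reduce."""
    with ThreadPoolExecutor(max_workers=num_threads) as team:
        partials = list(
            team.map(lambda tid: outlined(values, tid, num_threads), range(num_threads))
        )
    return sum(partials)
```

A debugger stopped inside `outlined_region` sees a worker thread running a routine that does not appear in the original source, which is why a logical view that maps back to the parallel region is useful.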
  • 28. What’s Needed? • Debugging and performance analysis support from OpenMP implementations • “OMPT and OMPD: OpenMP Tools Application Programming Interfaces for Performance Analysis and Debugging” technical report (TR) – First TR combined OMPT and OMPD in one document – OMPD was redacted to allow OMPT to progress • “TR2: OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis” – Accepted by the OpenMP ARB (March 2014) – OMPT is an API for first-party performance tools • It is now time to circle back and finish OMPD! – OMPD is similar to OMPT in its functionality, but… – OMPD is an API for third-party debugging tools
  • 29. What Does OMPT/OMPD Do? • OMPT – Enable performance tools to gather execution program/runtime costs – Allow construction of low-overhead performance tools – Allow logical stack unwinding (to handle outlined parallel regions) – Provide the state of a thread at any point in time (e.g., idle, work, wait) – Asynchronous signal safe • OMPD – Enable debugging tools to inspect the state of a live process or core file – Third-party versions of the OpenMP runtime inquiry functions – Third-party versions of the OMPT inquiry functions – Intercept the beginning/end of parallel/task regions (e.g., stepping in/out) – Enable the debugger to construct a “global view” of the process
  • 30. How is OMPD Structured? • Based on a commonly used idiom – pthread thread_db, MPI MQD, MPI Handles, and others – The OMPD DLL is “paired” with the OpenMP runtime library • The debugger – Attaches to the target OpenMP application – Loads the OMPD DLL (e.g., via dlopen()) – Registers callbacks in the OMPD DLL – Makes “requests” into the OMPD DLL to query runtime state • The OMPD DLL – Makes callbacks into the debugger (lookup symbols, read/write memory, etc.) – Returns the result to the debugger
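The callback idiom described on this slide can be sketched as a toy model. The code below does not use the OMPD API or any of its real function names; every class, method, and symbol name is hypothetical. It assumes a pretend target process represented by two dictionaries, and shows the division of labor: the debugger owns access to the target (symbol lookup, memory reads), while the library paired with the runtime knows the runtime's data layout and reaches the target only through the callbacks it was given.

```python
class ToyDebugger:
    """Owns access to the target; here the 'process' is just two dictionaries."""

    def __init__(self, symbols, memory):
        self._symbols = symbols  # symbol name -> address
        self._memory = memory    # address -> stored value

    def lookup_symbol(self, name):
        return self._symbols[name]

    def read_memory(self, address):
        return self._memory[address]


class ToyRuntimeInspector:
    """Stand-in for the library paired with the runtime.

    It knows the (made-up) runtime layout but never touches the target
    directly; every access goes through the registered callbacks.
    """

    def __init__(self, lookup_symbol, read_memory):
        self._lookup = lookup_symbol
        self._read = read_memory

    def thread_count(self):
        return self._read(self._lookup("toy_rt_num_threads"))

    def thread_states(self):
        base = self._lookup("toy_rt_thread_states")
        return [self._read(base + i) for i in range(self.thread_count())]
```

In use, the debugger would construct the inspector with its own bound methods, for example `ToyRuntimeInspector(dbg.lookup_symbol, dbg.read_memory)`, and then issue requests such as `thread_states()`. Because only the callbacks touch the target, the same inspector could in principle serve a live process or a core file.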
  • 31. OMPD “In Action” (diagram) • The debugger attaches to the application process; the OpenMP Runtime Library (RTL) lives in the application address space • The OMPD DLL is loaded into the debugger address space and callbacks are registered • (1) Request OpenMP state; request types: – Handles for threads, parallel regions, tasks – Parent / child relationships – State of handles (wait, work, idle) • (2) Request symbols and address information; callback functions: – Lookup symbols in the target process – Read/write target address spaces – Support for GPUs • (3) Callback ops return the result
  • 32. OMPD Status • Collaboration between LLNL and Rogue Wave Software (TotalView) – LLNL: Dong Ahn, Ignacio Laguna, Joachim Protze, Martin Schulz – RWS: Ariel Burton, John DelSignore • Resurrect OMPD with the ultimate goal of having it accepted by the ARB – Fix the current OMPD spec – Implement OMPD DLL in the Intel OpenMP runtime – Implement OMPD-based features in TotalView • IWOMP Paper (International Workshop on OpenMP)
  • 34. TotalView 8.15 New Features • Scalable Infrastructure – Faster start up on Linux – Scales to O(100,000) processes & O(1,000,000) threads • Updated CUDA support – CUDA 7.0 • Support updates including: – Clang 3.5 – Intel 15.0 – MPT 2.12 – SLES 12, Fedora 21 • Diagram: TV Client, two MRNet CPs, four TV Servers
  • 35. TotalView’s Scalability Strategy • Multicast / Reduction: TotalView uses an “MRNet tree” of servers • Remain lightweight in the backend! • Push debugger “smarts” to the backend, not the whole debugger! • Use classic optimization techniques too: caching, hoisting invariants, etc. • Diagram: TV Client, two MRNet CPs, four TV Servers
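The reduction half of the tree strategy can be sketched in a few lines. This is a toy model and does not use the MRNet API; the function names and the nested tuple/list tree encoding are assumptions made for illustration. Leaves play the role of TV Servers and summarize their local thread states, interior nodes play the role of MRNet communication processes and merge their children's summaries, so the client receives one small aggregate instead of one record per thread.

```python
from collections import Counter


def server_summary(thread_states):
    """Leaf (server role): summarize local threads instead of shipping each one."""
    return Counter(thread_states)


def reduce_at_node(child_summaries):
    """Interior node (communication-process role): merge the children's summaries."""
    merged = Counter()
    for summary in child_summaries:
        merged.update(summary)
    return merged


def reduce_tree(node):
    """Reduce a tree bottom-up: a list is a leaf, a tuple is an interior node."""
    if isinstance(node, list):
        return server_summary(node)
    return reduce_at_node(reduce_tree(child) for child in node)
```

With two interior nodes over four servers, as in the slide's diagram, the client only ever handles two merged summaries, whatever the number of threads behind them.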
  • 36. Linux Start up Performance in TV 8.15.4 • 5x faster (600s / 120s) at 16k between 8.14.1 and 8.15.4 • Note that we switched to MRNet by default in 8.15.0
  • 37. BG Start up Performance in TV 8.15.4 • 6.4x faster (180s / 28s) at 16k between 8.14.1 and 8.15.4 • Note that we switched to MRNet by default in 8.15.0
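The speedup figures quoted on the two start-up slides follow directly from the timings given there. As a quick check, using only the numbers on the slides (the function name is hypothetical):

```python
def speedup(old_seconds, new_seconds):
    """Ratio of the old start-up time to the new one."""
    return old_seconds / new_seconds


linux_speedup = speedup(600, 120)  # Linux, 8.14.1 vs 8.15.4
bg_speedup = speedup(180, 28)      # Blue Gene, 8.14.1 vs 8.15.4
```

The Linux ratio is exactly 5, and the Blue Gene ratio rounds to the 6.4 quoted on the slide.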
  • 38. TotalView debugs 786,432 cores. Climb with Rogue Wave towards exascale.
  • 39. Some more details on the 786,432 core test • The test was performed on 48 racks of Sequoia • The test code – Implements a Jacobi Linear Equation Solver – The test code is a hybrid MPI + OpenMP code – 16 threads per process, one process per node • The test operations – Start up – Setting breakpoints / removing breakpoints – Single stepping all threads • Tests performed at a variety of scales to understand scalability
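The 786,432 figure can be reproduced from the numbers on this slide plus one outside assumption. The rack count, threads per process, and processes per node come from the slide; the value of 1,024 nodes per rack is an assumption based on the Blue Gene/Q design (Sequoia is a Blue Gene/Q system) and is not stated in the deck. The function name is hypothetical.

```python
RACKS = 48                # from the slide
NODES_PER_RACK = 1024     # assumption: Blue Gene/Q rack size, not stated in the deck
PROCESSES_PER_NODE = 1    # from the slide
THREADS_PER_PROCESS = 16  # from the slide


def job_scale(racks, nodes_per_rack, processes_per_node, threads_per_process):
    """Return (processes, threads) the debugger must manage for the job."""
    processes = racks * nodes_per_rack * processes_per_node
    return processes, processes * threads_per_process


processes, threads = job_scale(
    RACKS, NODES_PER_RACK, PROCESSES_PER_NODE, THREADS_PER_PROCESS
)
```

Under that assumption the job has 49,152 processes and 786,432 threads, which matches the core count in the slide title if each thread runs on its own core.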
  • 40. TotalView’s Memory Efficiency • TotalView is lightweight in the back-end (server) • Servers don’t “steal” memory from the application • Each server is a multi-process debugger agent – One server can debug thousands of processes – Not a conglomeration of single process debuggers – TotalView’s architecture provides flexibility (e.g., P/SVR) – No artificial limits to accommodate the debugger (e.g., BG/Q 1 P/CN) • Symbols are read, stored, and shared in the front-end (client) • Example: LLNL APP ADB, 920 shlibs, Linux, 64 P, 4 CN, 16 P/CN, 1 SVR/CN
      Process | VSZ (largest, MB) | RSS (largest, MB) | Where
      TV Client | 4,469 | 3,998 | Front End ONLY
      MRNet CP | 497 | 4 | Compute Nodes
      TV Server | 304 | 53 | Compute Nodes
  • 41. Future plans • Contact sales@roguewave.com with any inquiries about our future plans for the TotalView product.
  • 43. Thanks! • Visit the website – http://www.roguewave.com/products/totalview.aspx – Documentation – Sign up for an evaluation – Contact customer support & post on the user forum

Editor's notes

  1. ARB = Architecture Review Board