Profiling Multicore Systems to Maximize Core Utilization – Colin Walls
Underutilization of cores in a multicore system can be considered a bug. As your system incorporates more cores, you need to make sure that all the cores are being utilized fully. Unexpected interactions between processes, the operating system, and resources can prevent cores from delivering peak performance. In this session, explore how to profile what each core is doing and which processes are running on each core, and understand where core utilization falls below optimum values.
1. Profiling Multicore Systems to Maximize Core Utilization
Colin Walls
colin_walls@mentor.com
mentor.com/embedded
2. Multicore Drives Complexity
[Charts: "Multicore & Multiprocessor Use" (Previous Project, Current Project, Next Two Years) and "Open Source OS Use" (Current Project, Next Two Years); vertical axis 0% to 70%; plotted values 62.7%, 65.1%, 37.2%, 32.2%, 23.9%]
Almost two-thirds of all future projects plan to use multi-core or multi-processor devices!
*Source: VDC Research Group, STRATEGIC INSIGHTS 2012: EMBEDDED SOFTWARE & TOOLS MARKET, TRACK 2:
Embedded Software Engineering Technologies, VOLUME 3: Software Development & Multicore Tool.
3. Complexity Stresses Timing
[Diagram labels: Complex Debugging, Different Defects; Manual Debugging; Single Application, Multiple Applications; Bare Metal, RTOS, Complex OS; Single Processor / Single Core, Multi-Processor / Multi-Core]
4. Debuggers: Stop and Stare
Debuggers are indispensable, but they only show a snapshot.
From this photo, can you tell if this building will be completed on schedule?
– How long does it usually take this worker?
– Would better tools help?
– Are other workers sitting idle?
Construction Worker by Rubber Dragon
5. Tracing, Instrumenting, Logging
Historically, tracing involved a hardware instrument
– Or on-chip logic
– Buffer size limited
– Completely non-intrusive
– Ideal in ISS
Instrumenting application code
– Adding custom code
– Maybe conditionally compiled
– Debugging with printf()
Logging option with many RTOSes
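The "adding custom code" and conditional-compilation points above can be sketched in C roughly as follows. This is a minimal illustration, not code from the talk: the APP_TRACE switch, the TRACE macro, and the sum_samples function are all hypothetical names chosen for the example.

```c
#include <stdio.h>

/* Hypothetical build switch: compile with -DAPP_TRACE to include the trace calls. */
#ifdef APP_TRACE
#define TRACE(...) fprintf(stderr, __VA_ARGS__)
#else
#define TRACE(...) ((void)0)   /* tracing compiled out: no code is generated */
#endif

/* Hypothetical workload: sums a buffer, reporting entry and exit when tracing is on. */
int sum_samples(const int *samples, int count)
{
    int total = 0;

    TRACE("sum_samples: enter, count=%d\n", count);
    for (int i = 0; i < count; i++) {
        total += samples[i];
    }
    TRACE("sum_samples: exit, total=%d\n", total);

    return total;
}
```

Built without -DAPP_TRACE, the instrumentation disappears entirely, so the shipped binary pays nothing for it. Built with it, every call prints two lines, which is intrusive and can change timing.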
6. Beyond Debuggers
Answering the higher-level questions requires information that traditional interactive debuggers lack:
– Tracing historical state
– Application awareness
Tracing can help find:
– race conditions
– latencies
– bugs that don't cause traps
– systems where stopping the world isn't feasible
... in both application and platform code
Photo by woodleywonderworks
7. Trace Data Sources – Linux Trace Toolkit
Sourcery Analyzer focuses on LTTng (Linux Trace Toolkit - next generation) to record and collect trace data on Linux.
– Mature, high-performance tracing system for Linux
– Can record both kernel and userspace events
– Low overhead
HRB, Analyzer, Sep 2012 7
8. Sourcery Analyzer with LTTng Architecture
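[Architecture diagram: Sourcery Analyzer on the host connects over the network to the Linux target. On the target, LTTng traces the Linux kernel and the C/C++ application, and a consumer daemon writes trace data to storage (memory, flash, disk, or network FS).]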
9. LTTng 2.0 Attributes
Tracepoints
• Low overhead
• No trap or system call required
• Suitable for use in realtime systems
• Inactive tracepoints have negligible overhead
Common Trace Format
• New compact binary format
• Flexible data layout
• Network streamable
• Size and seek optimized for very large trace files
Deployment
• Loadable kernel module (2.6.38+)
• Companion target side daemons and libraries
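To make the tracepoint properties above concrete, here is a toy sketch in C of a tracepoint that costs a single flag test when inactive and writes a fixed-size binary record into a ring buffer when active. It is not the LTTng API or the Common Trace Format. The trace_buffer and trace_record types, the tracepoint and trace_count functions, and the TRACE_CAPACITY value are all assumptions made for illustration, and the caller supplies the timestamp so the sketch stays platform-neutral.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 8u   /* hypothetical ring size; oldest records are overwritten */

/* Fixed-size binary record: no string formatting on the hot path. */
struct trace_record {
    uint64_t timestamp;
    uint32_t event_id;
    uint32_t payload;
};

struct trace_buffer {
    struct trace_record records[TRACE_CAPACITY];
    size_t written;   /* total records ever written */
    bool enabled;     /* inactive tracepoints only pay for testing this flag */
};

/* Records one event if tracing is enabled; otherwise returns immediately. */
void tracepoint(struct trace_buffer *tb, uint64_t timestamp,
                uint32_t event_id, uint32_t payload)
{
    if (!tb->enabled) {
        return;
    }
    struct trace_record *slot = &tb->records[tb->written % TRACE_CAPACITY];
    slot->timestamp = timestamp;
    slot->event_id = event_id;
    slot->payload = payload;
    tb->written++;
}

/* Number of records currently held in the ring. */
size_t trace_count(const struct trace_buffer *tb)
{
    return tb->written < TRACE_CAPACITY ? tb->written : TRACE_CAPACITY;
}
```

A real tracer adds per-core buffers and lock-free writes so that tracing does not serialize the cores being measured. This single-buffer version is only safe from one thread.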
11. Sourcery Analyzer - Not Just A Trace Viewer
Trace viewing tools depend on users to find the patterns.
Sourcery Analyzer focuses on analysis.
Task-centric Analysis Agents calculate and display the higher-level patterns.
[Screenshot labels: Event List, Analysis Agents]
12. Viewing Trace Data
Sourcery Analyzer inherited its engine from Mentor's high-end hardware design tools.
– high-performance event database
– sophisticated measurement tools
– variety of visualization types
Visualize event payloads, not just events.
Lamborghini Engine by Dr. Warner
13. Customizability is Important
Most developers are working on the application, but most debugging tools provide only platform awareness.
To compensate, developers often cobble together in-house debugging tools.
Mentor Embedded Sourcery Analyzer provides out-of-the-box visibility and a rich platform for user-developed analysis tools.
[Diagram labels: application (where most work occurs), operating system, hardware platform; Sourcery Analyzer with customized Analysis Agents and stock Analysis Agents; In-house Tools; 3rd-party Tools]
14. Analysis Agents
• Out-of-box access to powerful analysis routines
• Ships with library of 15 popular agents
• One-click flow to automatically generate pre-processed analysis views
• Ability to also create and add customized agents to the library
Agents: Software thread state, Scheduling, CPU utilization, IRQ rate, Page fault rate, Function call flow, CPU state, Filesystem activity, Network activity, Thread migration rate, or add your own
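As an illustration of the kind of calculation a CPU utilization agent might perform, the following C sketch derives per-core busy percentages from a list of scheduler switch events. It is not Sourcery Analyzer's implementation. The sched_event record layout, the MAX_CORES limit, and the convention that task id 0 (IDLE_TASK) means "idle" are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CORES 4   /* hypothetical core count */
#define IDLE_TASK 0   /* assumption: task id 0 means the core is idle */

/* One scheduler switch: at timestamp_us, core starts running task_id. */
struct sched_event {
    uint64_t timestamp_us;
    int core;
    int task_id;
};

/*
 * Fills busy_percent[c] with the share of [start_us, end_us] that core c spent
 * running a non-idle task. Events must be sorted by timestamp and lie inside
 * the window; each core is assumed idle until its first event.
 */
void core_utilization(const struct sched_event *events, size_t count,
                      uint64_t start_us, uint64_t end_us,
                      double busy_percent[MAX_CORES])
{
    uint64_t busy_us[MAX_CORES] = {0};
    uint64_t last_us[MAX_CORES];
    int running[MAX_CORES];

    for (int c = 0; c < MAX_CORES; c++) {
        last_us[c] = start_us;
        running[c] = IDLE_TASK;
    }

    for (size_t i = 0; i < count; i++) {
        const struct sched_event *e = &events[i];
        if (e->core < 0 || e->core >= MAX_CORES) {
            continue;   /* ignore events for cores outside the table */
        }
        if (running[e->core] != IDLE_TASK) {
            busy_us[e->core] += e->timestamp_us - last_us[e->core];
        }
        last_us[e->core] = e->timestamp_us;
        running[e->core] = e->task_id;
    }

    for (int c = 0; c < MAX_CORES; c++) {
        if (running[c] != IDLE_TASK) {
            busy_us[c] += end_us - last_us[c];   /* task still running at window end */
        }
        busy_percent[c] = (end_us > start_us)
            ? 100.0 * (double)busy_us[c] / (double)(end_us - start_us)
            : 0.0;
    }
}
```

A core whose percentage stays near zero while others are saturated is the underutilization this talk treats as a bug. The next step would be to look at which tasks are pinned or blocked.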
16. Real World Example
           Old Design            New Design
           (RTOS, single-core)   (Linux, multicore)
maximum    ~200 ms               7000+ ms
average    ~150 ms               ~150 ms
minimum    ~40 ms                ~40 ms
17. Diagnosing Problems: Real-time Response
Common problem: a real-time deadline is occasionally, but rarely, missed.
Approach:
– Instrument the start/stop measurement points (e.g. IRQ and application's “read” function).
– Run the test workload.
– Use Sourcery Analyzer to highlight only the missed deadlines.
– Correlate those occurrences with other system activities on the timeline.
– If more detailed data is needed, add instrumentation and repeat.
[Timeline labels: User-specified budget; OK; Not OK]
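The "highlight only the missed deadlines" step can be sketched in C as a pass over start/stop pairs that compares each latency with a user-specified budget. This is a hedged illustration rather than how Sourcery Analyzer does it. The latency_sample and latency_report types and the check_deadlines function are hypothetical, and the timestamps are assumed to be in milliseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* One measured response: start (e.g. IRQ) and stop (e.g. application read). */
struct latency_sample {
    uint64_t start_ms;
    uint64_t stop_ms;
};

struct latency_report {
    uint64_t min_ms;
    uint64_t max_ms;
    double avg_ms;
    size_t missed;   /* number of samples over budget */
};

/*
 * Computes min/avg/max latency and counts samples exceeding budget_ms.
 * Indexes of the first missed_cap offenders are stored in missed_idx so they
 * can be correlated with other activity on the timeline.
 */
struct latency_report check_deadlines(const struct latency_sample *samples,
                                      size_t count, uint64_t budget_ms,
                                      size_t *missed_idx, size_t missed_cap)
{
    struct latency_report r = {0, 0, 0.0, 0};
    uint64_t total_ms = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t latency = samples[i].stop_ms - samples[i].start_ms;

        if (i == 0 || latency < r.min_ms) {
            r.min_ms = latency;
        }
        if (latency > r.max_ms) {
            r.max_ms = latency;
        }
        total_ms += latency;

        if (latency > budget_ms) {
            if (r.missed < missed_cap) {
                missed_idx[r.missed] = i;
            }
            r.missed++;
        }
    }
    if (count > 0) {
        r.avg_ms = (double)total_ms / (double)count;
    }
    return r;
}
```

The average alone can hide the problem: a single 7000 ms outlier among many fast responses barely moves it, which is why the maximum and the list of offending samples matter more for a rarely missed deadline.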
18. Thank you
Colin Walls
colin_walls@mentor.com
http://blogs.mentor.com/colinwalls
mentor.com/embedded