PERFORMANCE TOOLS DEVELOPMENTS
Roberto A. Vitillo
presented by Paolo Calafiura & Wim Lavrijsen
Lawrence Berkeley National Laboratory
Future computing in particle physics, 16 June 2011
LINUX PERFORMANCE EVENTS SUBSYSTEM
• The perf events subsystem was merged into the Linux kernel in version 2.6.31 and introduced the sys_perf_event_open system call
• It uses special-purpose registers on the CPU to count “events”
• A hardware (HW) event can be, for example, the number of cache misses or mispredicted branches
• Software (SW) events, such as page faults, are also supported
• Performance counters are accessed via file descriptors obtained from the above-mentioned system call
LINUX PERFORMANCE EVENTS SUBSYSTEM (2)
• perf is a user-space utility that is part of the kernel repository
• Available in Scientific Linux 6
• Basic usage: data is collected with the perf-record tool and displayed with perf-report
THE PERF TOOL: EXAMPLE USAGE
WHY DO WE CARE?
• The Linux Performance Events Subsystem provides a low-overhead way to measure the workload of a single application or of the full system
• It’s at least an order of magnitude faster than an instrumenting profiler
• It provides far more information than a statistical profiler
WHAT IS MISSING
• Annotating the objdump output one event at a time is not enough to find bottlenecks efficiently
• A real GUI that can display multiple events and their relations is missing
• New CPUs have a buffer that records the most recently taken branches, but support for exploiting it is missing
PERF EVENTS CONVERTER
• As a first step, a converter tool for the perf-tools data format has been introduced
• The tool can convert a perf data file into a callgrind file that can be displayed with KCachegrind:
• multiple events are supported
• annotated source code, assembly and function list view
• complete inline chain
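To illustrate the target format, the sketch below writes a tiny callgrind-style profile with two event columns from hand-made sample data. It is not the converter described above: the `to_callgrind` helper, the sample tuples and the event names `Cycles` and `CacheMisses` are assumptions made for illustration. It relies only on the basic callgrind layout, an `events:` header followed by `fl=`/`fn=` records and cost lines of the form `line cost1 cost2`.

```python
from collections import defaultdict

def to_callgrind(samples, events):
    """Render samples as callgrind-format text.

    samples: iterable of (file, function, line, event_name) tuples,
             one tuple per recorded sample (hypothetical input layout).
    events:  ordered list of event names to emit as cost columns.
    """
    column = {name: i for i, name in enumerate(events)}
    # costs[(file, function)][line] -> one counter per event column
    costs = defaultdict(lambda: defaultdict(lambda: [0] * len(events)))
    for file, function, line, event in samples:
        costs[(file, function)][line][column[event]] += 1

    out = ["# callgrind format", "events: " + " ".join(events), ""]
    for (file, function), lines in sorted(costs.items()):
        out.append(f"fl={file}")
        out.append(f"fn={function}")
        for line, counts in sorted(lines.items()):
            out.append(f"{line} " + " ".join(str(c) for c in counts))
        out.append("")
    return "\n".join(out)

# Hypothetical samples: two event types attributed to source lines.
SAMPLES = [
    ("track.cxx", "fit", 10, "Cycles"),
    ("track.cxx", "fit", 10, "Cycles"),
    ("track.cxx", "fit", 10, "CacheMisses"),
    ("track.cxx", "fit", 12, "Cycles"),
    ("hit.cxx", "load", 5, "CacheMisses"),
]

PROFILE = to_callgrind(SAMPLES, ["Cycles", "CacheMisses"])
print(PROFILE)
```

Each cost line carries one column per event, which is how a single callgrind file can hold several events side by side.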
PERF EVENTS CONVERTER (2)
PERF EVENTS VISUALIZER
• KCachegrind cannot display an arbitrary number of events at the same time
• A new converter and a web-based GUI are under development
• The converter reads a raw perf data file and produces spreadsheets, cycle accounting trees and call graphs
• The GUI will be able to:
• present the available data in spreadsheets, cycle accounting trees and call graphs
• offer insights on the call graph, e.g. mark virtual methods with high call counts as hot
• correlate different HW/SW events to gain a deeper understanding of the performance bottlenecks
LAST BRANCH RECORD SUPPORT
• New Intel processors have a cyclic buffer that can record taken branches
• Each recorded branch is stored in a pair of registers holding its source and destination
• Last Branch Record (LBR) sampling can be used to, e.g.
• evaluate the frequency of function calls and make inlining decisions
• yield the partial path leading to an event
• build a partial call graph
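To make the first use case concrete, the sketch below counts how often each (source, destination) pair appears across sampled LBR snapshots and ranks the resulting edges. The record layout, the symbol names and the `edge_frequencies` helper are hypothetical; real LBR entries hold raw addresses that would first have to be resolved to symbols.

```python
from collections import Counter

def edge_frequencies(snapshots):
    """Count (source, destination) pairs over all sampled LBR snapshots.

    snapshots: list of snapshots; each snapshot is a list of
               (source_symbol, destination_symbol) pairs, as if the
               raw branch addresses had already been symbolized.
    """
    edges = Counter()
    for snapshot in snapshots:
        edges.update(snapshot)
    return edges

# Hypothetical snapshots, each holding the last few taken branches.
SNAPSHOTS = [
    [("main", "fit"), ("fit", "get_hit"), ("fit", "get_hit")],
    [("fit", "get_hit"), ("fit", "update")],
    [("main", "fit"), ("fit", "get_hit")],
]

EDGES = edge_frequencies(SNAPSHOTS)
# Frequently taken call edges are candidates for inlining.
for (source, destination), count in EDGES.most_common():
    print(f"{source} -> {destination}: {count}")
```

The counts are sample-based estimates rather than exact call counts, so they are best used to rank edges against each other.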
IMPORTANCE OF LBR
• ATLAS software issues:
• low instructions retired / calls retired ratio
• high calls retired / branches retired ratio
• Inlining functions called millions of times per event can bring considerable benefits
• David Levinthal’s proposal:
‣ “Use LBR and static analysis to evaluate frequency and cost of function calls”
‣ “Use social network analysis / network theory to identify clusters of active, costly function call activity”
‣ “Order cluster by total cost and inline”
LBR DEVELOPMENTS
• The kernel patch for filtering and dumping the LBR is complete; after validation, the patch will be integrated into the kernel trunk
• The perf report user-space utility has a new feature to display statistics about the taken branches
EXPLOITING THE LBR IN PERF
• DSO-to-DSO and symbol-to-symbol branch statistics are supported
• Optionally distinguish between predicted and mispredicted branches
• Filtering support
TODO
• Use a recursive disassembler instead of a linear one?
• Disassemble a module/function on the fly?
• Improve basic block counts by:
• using the LBR to generate a software “instructions retired” event
• adhering to flow conservation rules while keeping changes to the sample counts to a minimum
[Figure: basic blocks B1 and B2 both flow into B3; sample counts shown: 3, 4 and 1]
In general, with sampling, #B1 + #B2 != #B3
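One possible reading of the flow conservation item, sketched below: treat the rule #B1 + #B2 = #B3 as a linear constraint and move the sampled counts onto it with the smallest squared change. The least-squares projection and the `conserve_join` helper are assumptions chosen for illustration, not a method stated in the slides. The counts 3, 4 and 1 are the ones shown on the slide, assuming they belong to B1, B2 and B3 respectively.

```python
from fractions import Fraction

def conserve_join(b1, b2, b3):
    """Adjust counts so that b1 + b2 == b3 with minimal squared change.

    The constraint is a . x = 0 with a = (1, 1, -1). Projecting x onto
    that plane subtracts (a . x / |a|^2) * a, and |a|^2 = 3 here.
    """
    residual = Fraction(b1 + b2 - b3)
    step = residual / 3
    return b1 - step, b2 - step, b3 + step

# Assumed mapping of the slide's counts: B1 = 3, B2 = 4, B3 = 1.
SAMPLED = (3, 4, 1)
ADJUSTED = conserve_join(*SAMPLED)

print("sampled: ", SAMPLED)
print("violation:", SAMPLED[0] + SAMPLED[1] - SAMPLED[2])
print("adjusted:", tuple(str(v) for v in ADJUSTED))
```

A real control-flow graph has one such constraint per basic block, so the same idea would become a constrained least-squares problem over all blocks.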
CONCLUSIONS
• The callgrind converter and the new GUI under development will offer non-experts an easy way to navigate and understand the profiled application
• The LBR support adds important profiling possibilities, vital for object-oriented (OO) software, to the Linux Performance Events Subsystem
