Motivation and Challenges
Investigating Tail Latency is About Probing for the
Atypical
■ Execution samples that land in the tail have been slowed down for some reason
that differs from the average case. One or more of the following applies:
● They differ in the amount (or type) of work performed.
● They were affected by some uncommon event or interference.
● They encountered more waiting (experienced resource starvation for longer than average
cases).
■ In particular, it is not generally true that tail latency samples and remaining
execution samples exhibit comparable software, hardware, or scheduling
histories.
■ Standard approaches for exposing throughput limiters do not generally expose
causes of peak latency
Illustrative Scenario: 1
[Figure: plot with the SLA violation points marked]
Illustrative Scenario: 1 -- Why?
[Figure: SLA violation points compared for the default sleep state (C9) and the min sleep state (C1)]
■ The default power setting results in a latency SLA violation at half the throughput.
■ Ironically, low utilization results in an earlier onset of tail latency because of
power-management interference (which is not in the application's control).
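One way to approximate the "min sleep state" configuration from software on Linux is the PM QoS interface exposed as /dev/cpu_dma_latency: a process writes a 32-bit latency bound in microseconds, and the bound holds for as long as the file descriptor stays open. The sketch below is an assumption about how such a request might be made, not the method used for the measurements in this scenario. The function name hold_cpu_latency and its path parameter are hypothetical, and opening the device normally requires elevated privileges.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to keep CPU wake-up latency at or
 * below max_latency_us by writing a 32-bit value to a PM QoS device file
 * (normally /dev/cpu_dma_latency). The request is active only while the
 * returned descriptor stays open; close() drops it.
 * Returns the open descriptor, or -1 on failure. */
int hold_cpu_latency(const char *path, int32_t max_latency_us)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (write(fd, &max_latency_us, sizeof max_latency_us)
            != (ssize_t)sizeof max_latency_us) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A service might call hold_cpu_latency("/dev/cpu_dma_latency", 0) at startup and close the descriptor at shutdown. Which idle states a given bound still permits depends on the platform's idle driver.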
Illustrative Scenario: 2
■ Reducing latency to a minimum frequently entails choosing between
conflicting options for scheduling resources such as CPU time:
● Reduce overheads vs.
● Preempt sooner vs.
● Prioritize critical sections vs.
● Do not thrash the cache vs.
● Be fair at a fine-grained scale: do not starve tasks (or activities like garbage
collection) . . . etc.
■ Online video gaming at high scale:
● High FPS (frames per second).
● Hotness in L1, L2.
● Streamlined execution: do not preempt too soon!
● Deadline priorities: Schedule quickly upon waking up!
■ For a first-order reduction in frame drops: make task migration immediate.
Tunable                               CFS default   New value
kernel.sched_migration_cost_ns        500000        0
kernel.sched_min_granularity_ns       3000000       1000000
kernel.sched_wakeup_granularity_ns    4000000       0
Scheduler tuning for dramatically reducing frame drops
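As a minimal sketch of how such values might be applied and checked, the helpers below write and read one integer tunable as a file. This assumes the tunables are exposed as files such as sched_migration_cost_ns under /proc/sys/kernel; the location varies by kernel version, so the directory is a parameter. The names write_tunable and read_tunable are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write one integer tunable to <dir>/<name>.
 * For the tunables listed here, dir would typically be "/proc/sys/kernel"
 * and name e.g. "sched_migration_cost_ns"; both are assumptions about the
 * target system. Returns 0 on success, -1 on failure. */
int write_tunable(const char *dir, const char *name, long value)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the tunable back so that a change can be verified. */
int read_tunable(const char *dir, const char *name, long *value)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%ld", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Reading the old value before writing the new one makes it straightforward to restore the default after an experiment.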
kernel/sched_fair.c
. . .
gran = sysctl_sched_wakeup_granularity;
. . .
/*
 * If the virtual runtime of the currently executing task exceeds that of the
 * wakeup target plus a safety margin (gran), the current task is forced to
 * yield to the wakeup target.
 */
if (pse->vruntime + gran < se->vruntime)    /* se is the currently executing task */
        resched_task(curr);
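The effect of setting the wakeup granularity to 0 can be seen in a simplified, self-contained model of the comparison quoted above. This is a sketch rather than kernel code: the function and parameter names are illustrative, and it ignores any scaling the kernel may apply to the granularity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the quoted check: the waking task preempts the
 * running task only if its vruntime trails the running task's vruntime
 * by more than gran_ns. Names are illustrative, not kernel symbols. */
bool wakeup_preempts(uint64_t wakee_vruntime_ns,
                     uint64_t curr_vruntime_ns,
                     uint64_t gran_ns)
{
    return wakee_vruntime_ns + gran_ns < curr_vruntime_ns;
}
```

With a granularity of 4000000 ns, a waking task that is 3 ms behind the running task does not preempt it; with a granularity of 0, any waking task with a strictly smaller vruntime does.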
Vital to Understand Transitions at Sub-ms Grain
■ Small time ranges contain the vital signals for analyzing tail latency growth.
● Sampling and averaging of CPU and platform software telemetry helps only a little; it is
too blunt to tie short-range causes to effects that persist.
● For example, scheduling a large I/O may take only a few microseconds yet cause
lingering effects in shared caches.
■ Tuning of schedulers (as in Scenario 2) requires the ability to observe intra- and
inter-process effects more or less continuously.
● Interestingly, scheduler tuning can also free up CPU capacity, creating an
opportunity to handle load spikes within a latency budget.
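The point about averaging being too blunt can be illustrated with a small worked example. The sketch below summarizes one window of latency samples by its mean and its maximum; the struct and function names are hypothetical and the sample values are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct window_stats {
    uint64_t mean_us;  /* integer mean over the window */
    uint64_t max_us;   /* worst single sample in the window */
};

/* Summarize n latency samples (in microseconds) from one window. */
struct window_stats summarize(const uint64_t *samples_us, size_t n)
{
    struct window_stats s = {0, 0};
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += samples_us[i];
        if (samples_us[i] > s.max_us)
            s.max_us = samples_us[i];
    }
    if (n > 0)
        s.mean_us = sum / n;
    return s;
}
```

For nine samples of 100 us and one of 5000 us, the mean is 590 us while the maximum is 5000 us: the averaged view suggests a mild slowdown and hides the single excursion that lands in the tail.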
Solution Space and Role of Hardware
perf-sched: a Dump and Post-process Approach for
Capturing and Analyzing Scheduler Events
■ perf-sched record
■ perf-sched timehist
■ perf-sched map
■ …
https://www.brendangregg.com/blog/2017-03-16/perf-sched.html
From: perf sched for Linux CPU scheduler analysis,
by Brendan Gregg, 2017
Aim to understand:
● How far is the maximum scheduling delay from the average?
● When does wait time blow up, and what contemporaneous activities are in play
when the maximum delays occur?
● Do those activities show similar effects, or is their execution itself
something unusual?
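Answering these questions usually starts with pulling the per-event delays out of the recorded data. The sketch below parses one data row of perf sched timehist text output, assuming the default column layout (time, [cpu], task name, wait time, sch delay, run time, with the times in msec) and a task name without spaces. The struct and function names are hypothetical.

```c
#include <stdio.h>

struct timehist_row {
    double time_s;        /* event timestamp, seconds */
    int cpu;
    char task[64];        /* "comm[tid/pid]" as printed */
    double wait_ms;
    double sch_delay_ms;
    double run_ms;
};

/* Returns 1 if the line parsed as a data row, 0 otherwise
 * (for example header lines and separator rules). */
int parse_timehist_row(const char *line, struct timehist_row *row)
{
    return sscanf(line, "%lf [%d] %63s %lf %lf %lf",
                  &row->time_s, &row->cpu, row->task,
                  &row->wait_ms, &row->sch_delay_ms, &row->run_ms) == 6;
}
```

Feeding every parsed row into a running maximum and mean of sch_delay_ms gives the gap between worst-case and average scheduling delay, and the timestamp of the worst row tells where to look for contemporaneous activity.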
perf-sched timehist to Understand and Tune Various
Scheduling Parameters
https://www.brendangregg.com/blog/2017-03-16/perf-sched.html
From: perf sched for Linux CPU scheduler analysis, by Brendan Gregg, 2017
KUtrace by Richard Sites
■ All kernel-user and user-kernel transitions are
collected at very low space and time overhead.
■ It does one thing, and does that one thing well, for
production coverage.
■ Further, it captures the instruction counter from
the PMU at transition points, to understand
IPC, all at ~40ns overhead.
From: KUTrace: Where have all the nanoseconds gone?, Richard
Sites, Tracing Summit, 2017.
Hardware Role in Monitoring
■ Modern CPUs have a substantial built-in ability to monitor many events at
very low overhead.
■ PMU registers get multiplexed over event space (under software control) at a
moderate granularity.
■ A very large subset of these events can provide for event-based sampling, so that
correlated ratios can be collected and time-aligned -- for dissecting into likely
causes of IPC shifts.
https://perfmon-events.intel.com/
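Because counters are multiplexed, an event is typically counted for only part of the interval, and its raw count has to be scaled before ratios such as IPC are formed. The sketch below applies the usual enabled-time over running-time scaling; the function names are hypothetical and the inputs are assumed to come from whatever counter-reading interface is in use.

```c
#include <stdint.h>

/* Estimate the full-interval count for an event that was scheduled on a
 * hardware counter for only time_running_ns out of time_enabled_ns. */
double scale_multiplexed(uint64_t raw_count,
                         uint64_t time_enabled_ns,
                         uint64_t time_running_ns)
{
    if (time_running_ns == 0)
        return 0.0;   /* event never got a counter: no estimate */
    return (double)raw_count * (double)time_enabled_ns
           / (double)time_running_ns;
}

/* Instructions per cycle from two (already scaled) counts. */
double ipc(double instructions, double cycles)
{
    return cycles > 0.0 ? instructions / cycles : 0.0;
}
```

The scaled value is an estimate that assumes the event rate was uniform across the interval, which is exactly the assumption that short tail-latency excursions tend to violate.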
PMU Based Extraction of Long-latency Paths
■ Timed LBRs (last branch records)
■ A more detailed view (different example case)
Identifying Cache Miss Hotspots (Data, Code)
■ Similarly, a data heatmap can be generated without requiring software instrumentation and
tracing (e.g., with Valgrind, Pin, etc.)
MEM_TRANS_RETIRED.LOAD_LATENCY_GT_<Binary Number>
From Sampling to Tracing
■ Profiling tools today (e.g., perf, Intel® VTune, etc.) are mostly based on the idea of statistical
hotspot collection (sampling and averaging), and therefore lose short-interval transitions.
■ Tracing today (e.g., with insertion of tracepoints) requires a software developer to anticipate
where to instrument code. This is not generally easy, unless a lot of engineering has already
gone into preselecting the instrumentation points (e.g., KUtrace).
● Instrumenting everything, or over-collecting results, incurs too much CPU penalty
as well as memory and cache pollution.
■ Challenges beyond trace collection:
● Much effort and data pruning before tail latencies can be linked to likely causes
● Usually pushed to offline analysis.
■ Understanding (and remediating) tail latencies itself needs to be a low latency endeavor.
eBPF
■ eBPF provides for programmable triggering and conditional collection
● at low overhead
● user or kernel
■ Thus, for example, one can do something like this:
● Use event-based sampling of the longest-latency accesses as a trigger.
● Snapshot the timed LBR buffer by using eBPF.
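The trigger-then-snapshot pattern can be sketched as a plain user-space model. This is not an eBPF program and does not use any hardware LBR interface: a small ring of branch records is overwritten continuously, and it is copied out only when a sampled load latency exceeds a threshold. All names, the ring depth, and the record layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4   /* real LBR stacks are deeper; 4 keeps the demo small */

struct branch_rec {
    uint64_t from;     /* branch source address */
    uint64_t to;       /* branch target address */
    uint32_t cycles;   /* cycles since the previous record ("timed") */
};

struct branch_ring {
    struct branch_rec slot[RING_SLOTS];
    size_t next;       /* index the next record will overwrite */
    size_t count;      /* number of valid records, at most RING_SLOTS */
};

/* Always-on, cheap path: overwrite the oldest record. */
void ring_push(struct branch_ring *r, struct branch_rec rec)
{
    r->slot[r->next] = rec;
    r->next = (r->next + 1) % RING_SLOTS;
    if (r->count < RING_SLOTS)
        r->count++;
}

/* Trigger path: copy the ring (oldest first) into out only when the sampled
 * load latency exceeds the threshold. Returns the number of records copied,
 * or 0 when the sample did not qualify. */
size_t snapshot_if_slow(const struct branch_ring *r,
                        uint32_t load_latency_cycles,
                        uint32_t threshold_cycles,
                        struct branch_rec *out)
{
    if (load_latency_cycles <= threshold_cycles)
        return 0;
    size_t start = (r->next + RING_SLOTS - r->count) % RING_SLOTS;
    for (size_t i = 0; i < r->count; i++)
        out[i] = r->slot[(start + i) % RING_SLOTS];
    return r->count;
}
```

The design choice being modeled is that the expensive step (copying and exporting the branch history) runs only for the rare samples that qualify, so the steady-state cost stays close to that of the ring update alone.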
eBPF and KUTrace/perf-sched and . . .
■ KUtrace is intentionally austere
● So it can be deployed in production, at scale, and be available and running continuously.
● (Adding bells and whistles to it is not a good idea and is discouraged!)
+ eBPF's in-kernel filtering
■ For deep insights: KUtrace will provide all
user<->kernel transitions
■ With eBPF-controlled tracing we could
invoke traces only in areas of interest, e.g.,
trace all gRPC requests with packet size >
64K (maybe that is the only time you see high
tail latencies).
■ (Or start first with eBPF and perf-sched latency co-monitoring and triggering)
● (While connecting eBPF-based probes for latency monitoring in select higher
stack layers)
Summing It Up
■ Tail latency control is crucial, given the penetration of real-time complex event processing into
virtually all sectors.
■ Low-overhead and agile monitoring of latency excursions is needed.
■ Equally, to unveil the causes, contributing factors need to be collected at low overhead and
in a timely manner, ideally through conditional collection and filtering.
■ Hardware performance monitoring capabilities are rich and can collect a wide variety of
events at very low overhead.
■ Linking eBPF with latency-focused hardware monitoring (e.g., timed LBRs and long-latency cache
misses) is one direction.
■ Another is triggering eBPF-based collection of hardware event rates, time-aligned with
scheduler events filtered for high scheduling delays (wait-signaling and post-wait
dispatching).
Thank You

More Related Content

What's hot

Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
P99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 LatencyP99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 LatencyScyllaDB
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
 
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...ScyllaDB
 
Low latency in java 8 v5
Low latency in java 8 v5Low latency in java 8 v5
Low latency in java 8 v5Peter Lawrey
 
Implementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetImplementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetJames Wernicke
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018javier ramirez
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernelVadim Nikitin
 
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Keeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssemblyKeeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssemblyScyllaDB
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
 
Tutorial ceph-2
Tutorial ceph-2Tutorial ceph-2
Tutorial ceph-2Tommy Lee
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentDataWorks Summit
 
Measuring P99 Latency in Event-Driven Architectures with OpenTelemetry
Measuring P99 Latency in Event-Driven Architectures with OpenTelemetryMeasuring P99 Latency in Event-Driven Architectures with OpenTelemetry
Measuring P99 Latency in Event-Driven Architectures with OpenTelemetryScyllaDB
 
Cassandra Performance Tuning Like You've Been Doing It for Ten Years
Cassandra Performance Tuning Like You've Been Doing It for Ten YearsCassandra Performance Tuning Like You've Been Doing It for Ten Years
Cassandra Performance Tuning Like You've Been Doing It for Ten YearsJon Haddad
 
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...HostedbyConfluent
 

What's hot (20)

Mastering Real-time Linux
Mastering Real-time LinuxMastering Real-time Linux
Mastering Real-time Linux
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
P99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 LatencyP99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 Latency
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
 
Low latency in java 8 v5
Low latency in java 8 v5Low latency in java 8 v5
Low latency in java 8 v5
 
Implementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetImplementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over Ethernet
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
 
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Keeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssemblyKeeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssembly
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Tutorial ceph-2
Tutorial ceph-2Tutorial ceph-2
Tutorial ceph-2
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
 
Measuring P99 Latency in Event-Driven Architectures with OpenTelemetry
Measuring P99 Latency in Event-Driven Architectures with OpenTelemetryMeasuring P99 Latency in Event-Driven Architectures with OpenTelemetry
Measuring P99 Latency in Event-Driven Architectures with OpenTelemetry
 
Cassandra Performance Tuning Like You've Been Doing It for Ten Years
Cassandra Performance Tuning Like You've Been Doing It for Ten YearsCassandra Performance Tuning Like You've Been Doing It for Ten Years
Cassandra Performance Tuning Like You've Been Doing It for Ten Years
 
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
 

Similar to Hardware Assisted Latency Investigations

Module 3-cpu-scheduling
Module 3-cpu-schedulingModule 3-cpu-scheduling
Module 3-cpu-schedulingHesham Elmasry
 
Automating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency SpreadsAutomating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency SpreadsScyllaDB
 
Galvin-operating System(Ch6)
Galvin-operating System(Ch6)Galvin-operating System(Ch6)
Galvin-operating System(Ch6)dsuyal1
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...PROIDEA
 
BUD17-309: IRQ prediction
BUD17-309: IRQ prediction BUD17-309: IRQ prediction
BUD17-309: IRQ prediction Linaro
 
Cpu scheduling final
Cpu scheduling finalCpu scheduling final
Cpu scheduling finalmarangburu42
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Amin Astaneh
 
Cpu scheduling pre final formatting
Cpu scheduling pre final formattingCpu scheduling pre final formatting
Cpu scheduling pre final formattingmarangburu42
 
ch5_EN_CPUSched_2022.pdf
ch5_EN_CPUSched_2022.pdfch5_EN_CPUSched_2022.pdf
ch5_EN_CPUSched_2022.pdfCuracaoJTR
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the wayMark Price
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...ScyllaDB
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsBrendan Gregg
 
Tommaso Cucinotta - Low-latency and power-efficient audio applications on Linux
Tommaso Cucinotta - Low-latency and power-efficient audio applications on LinuxTommaso Cucinotta - Low-latency and power-efficient audio applications on Linux
Tommaso Cucinotta - Low-latency and power-efficient audio applications on Linuxlinuxlab_conf
 

Similar to Hardware Assisted Latency Investigations (20)

Module 3-cpu-scheduling
Module 3-cpu-schedulingModule 3-cpu-scheduling
Module 3-cpu-scheduling
 
Optimizing Linux Servers
Optimizing Linux ServersOptimizing Linux Servers
Optimizing Linux Servers
 
Automating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency SpreadsAutomating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency Spreads
 
Galvin-operating System(Ch6)
Galvin-operating System(Ch6)Galvin-operating System(Ch6)
Galvin-operating System(Ch6)
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
ch5_EN_CPUSched.pdf
ch5_EN_CPUSched.pdfch5_EN_CPUSched.pdf
ch5_EN_CPUSched.pdf
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
 
BUD17-309: IRQ prediction
BUD17-309: IRQ prediction BUD17-309: IRQ prediction
BUD17-309: IRQ prediction
 
Cpu scheduling final
Cpu scheduling finalCpu scheduling final
Cpu scheduling final
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)
 
Cpu scheduling pre final formatting
Cpu scheduling pre final formattingCpu scheduling pre final formatting
Cpu scheduling pre final formatting
 
Cpu scheduling
Cpu schedulingCpu scheduling
Cpu scheduling
 
ch5_EN_CPUSched_2022.pdf
ch5_EN_CPUSched_2022.pdfch5_EN_CPUSched_2022.pdf
ch5_EN_CPUSched_2022.pdf
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
 
Cat @ scale
Cat @ scaleCat @ scale
Cat @ scale
 
Os2
Os2Os2
Os2
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
Tommaso Cucinotta - Low-latency and power-efficient audio applications on Linux
Tommaso Cucinotta - Low-latency and power-efficient audio applications on LinuxTommaso Cucinotta - Low-latency and power-efficient audio applications on Linux
Tommaso Cucinotta - Low-latency and power-efficient audio applications on Linux
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 
Top NoSQL Data Modeling Mistakes
Top NoSQL Data Modeling MistakesTop NoSQL Data Modeling Mistakes
Top NoSQL Data Modeling MistakesScyllaDB
 
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & PrinciplesNoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & PrinciplesScyllaDB
 
Optimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversOptimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Hardware Assisted Latency Investigations

  • 3. Investigating Tail Latency is About Probing for the Atypical
    ■ Execution samples that land in the tail have been slowed down for some reason that differs from the average case:
      ● They differ in the amount (or type) of work performed, or
      ● They were affected by some uncommon event or interference, or
      ● They encountered more waiting (experienced resource starvation longer than average cases).
    ■ In particular, it is not generally true that tail latency samples and the remaining execution samples exhibit comparable software, hardware, or scheduling histories.
    ■ Standard approaches for exposing throughput limiters do not generally expose the causes of peak latency.
  • 7. Illustrative Scenario: 1 -- Why?
    [Figure labels: Default sleep state (C9); Min sleep state (C1); SLA violation]
    ■ Default power setting results in latency SLA violation at ½ the throughput.
    ■ Ironically: low utilization results in earlier onset of tail latency because of power-management interference (not in application’s control).
  • 8. Illustrative Scenario: 2
    ■ Reducing latency to a minimum frequently entails choosing between conflicting options for scheduling resources like CPU time.
      ● Reduce overheads vs.
      ● Preempt sooner vs.
      ● Prioritize critical sections vs.
      ● Do not thrash the cache vs.
      ● Be fair at a fine-grained scale: do not starve tasks (or activities like garbage collection) . . . etc.
  • 10. Illustrative Scenario: 2
    ■ Online video gaming at high scale:
      ● High FPS (frames per second).
      ● Hotness in L1, L2.
      ● Streamlined execution: do not preempt too soon!
      ● Deadline priorities: Schedule quickly upon waking up!
    ■ For a first order impact on reducing frame drops: make task migration immediate.

    Scheduler tuning for dramatically reducing frame drops

      Tunable                               CFS default    New value
      kernel_sched_migration_cost_ns             500000            0
      kernel_sched_min_granularity_ns           3000000      1000000
      kernel_sched_wakeup_granularity_ns        4000000            0

    kernel/sched_fair.c

      . . .
      gran = sysctl_sched_wakeup_granularity;
      . . .
      // if virtual run time of current executing task exceeds that of wakeup target plus a margin
      // of safety provided by gran, then the current task is forced to yield to the wakeup target.
      if (pse->vruntime + gran < se->vruntime) // se is current executing task
          resched_task(curr);
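The effect of setting the wakeup granularity to 0 can be seen by isolating the comparison from the excerpt above. The following is a minimal sketch, not kernel code: the struct and function names are hypothetical, and it models only the single comparison shown on the slide, leaving out everything else the real scheduler does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a scheduling entity (hypothetical type):
 * only the field that the wakeup check reads. */
struct sched_entity_sketch {
    uint64_t vruntime_ns; /* virtual runtime accumulated so far */
};

/* Returns true when the waking task (pse) should preempt the currently
 * running task (se): the current task's vruntime must exceed the waking
 * task's vruntime by more than the margin gran_ns. */
bool should_preempt_on_wakeup(const struct sched_entity_sketch *se,
                              const struct sched_entity_sketch *pse,
                              uint64_t gran_ns)
{
    return pse->vruntime_ns + gran_ns < se->vruntime_ns;
}
```

With a waking task that is 2 ms behind the running task in virtual runtime, the 4000000 ns default margin leaves the running task in place, while a margin of 0 lets the waking task preempt immediately.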
  • 11. Vital to Understand Transitions at Sub-ms Grain
    ■ Small time ranges contain the vital signals for analyzing tail latency growth.
      ● Sampling and averaging of CPU and platform software telemetry helps only a little: it is too blunt to tie short-range causes to effects that persist.
      ● For example: the scheduling of a large I/O may take only a few microseconds but cause lingering effects in shared caches.
    ■ Tuning of schedulers (as in Scenario 2) requires the ability to observe intra- and inter-process effects more or less continuously.
      ● Interestingly, scheduler tuning can also free up CPU capacity, creating headroom to handle load spikes within a latency budget.
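A small numeric illustration of why averaging is too blunt: the sketch below summarizes one window of latency samples by both mean and maximum. The type and function names are hypothetical, and the microsecond unit is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of one observation window (hypothetical type). */
struct window_stats {
    double mean_us;  /* arithmetic mean of the samples */
    uint64_t max_us; /* largest single sample */
};

/* Compute mean and maximum over n latency samples given in microseconds.
 * An empty window yields a mean and maximum of zero. */
struct window_stats summarize_window(const uint64_t *lat_us, size_t n)
{
    struct window_stats s = { 0.0, 0 };
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += lat_us[i];
        if (lat_us[i] > s.max_us)
            s.max_us = lat_us[i];
    }
    if (n > 0)
        s.mean_us = (double)sum / (double)n;
    return s;
}
```

For a window of 999 samples at 100 µs plus a single 5000 µs excursion, the mean moves by under 5% while the maximum is 50 times the typical value, so a dashboard built on averages would barely register the event.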
  • 12. Solution Space and Role of Hardware
  • 13. perf-sched: a Dump and Post-process Approach for Capturing and Analyzing Scheduler Events
    ■ perf sched record
    ■ perf sched timehist
    ■ perf sched map
    ■ …
    Get to an understanding of:
      ● How far is max sched delay from average?
      ● When is wait time blowing up, and what contemporaneous activities are in play when max delays occur?
      ● Do they show similar effects, or is their execution itself something unusual?
    From: perf sched for Linux CPU scheduler analysis, by Brendan Gregg, 2017
    https://www.brendangregg.com/blog/2017-03-16/perf-sched.html
  • 14. perf-sched timehist to Understand and Tune Various Scheduling Parameters
    From: perf sched for Linux CPU scheduler analysis, by Brendan Gregg, 2017
    https://www.brendangregg.com/blog/2017-03-16/perf-sched.html
  • 15. KUtrace by Richard Sites
    ■ All kernel-user and user-kernel transitions are collected at very low space and time overhead.
    ■ Doing one thing, and that one thing well, for production coverage.
    ■ Further, it captures the instruction counter from the PMU at transition points, to understand IPC, all at ~40 ns overhead.
    From: KUTrace: Where have all the nanoseconds gone?, Richard Sites, Tracing Summit, 2017.
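Capturing an instruction count at each transition is what makes per-interval IPC recoverable afterwards. The sketch below shows the arithmetic only; the record layout is a hypothetical one chosen for illustration and is not KUtrace's trace format.

```c
#include <stdint.h>

/* Hypothetical per-transition record: a cycle timestamp and the
 * retired-instruction count read from the PMU at that moment. */
struct transition_sample {
    uint64_t cycles;
    uint64_t instructions;
};

/* IPC over the interval between two consecutive transitions.
 * Returns 0.0 when no cycles elapsed, to avoid dividing by zero. */
double interval_ipc(struct transition_sample begin, struct transition_sample end)
{
    uint64_t dc = end.cycles - begin.cycles;
    uint64_t di = end.instructions - begin.instructions;
    return dc == 0 ? 0.0 : (double)di / (double)dc;
}
```

An interval whose IPC drops well below that of its neighbors marks a stretch where the same code ran slower, which is the kind of atypical execution a tail latency investigation is looking for.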
  • 16. Hardware Role in Monitoring
    ■ Modern CPUs build in significant capability to monitor many events at very low overhead.
    ■ PMU registers get multiplexed over the event space (under software control) at a moderate granularity.
    ■ A very large subset of these events supports event-based sampling, so that correlated ratios can be collected and time-aligned for dissecting likely causes of IPC shifts.
    https://perfmon-events.intel.com/
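Because counters are multiplexed, an event is typically counted for only part of the measurement interval, and the raw count has to be scaled up. On Linux, `perf_event_open` reports the two times needed when `PERF_FORMAT_TOTAL_TIME_ENABLED` and `PERF_FORMAT_TOTAL_TIME_RUNNING` are requested. The helper below is a minimal sketch of the usual linear extrapolation; the function name is hypothetical.

```c
#include <stdint.h>

/* Estimate the full-interval count of a multiplexed event.
 *   raw          - count observed while the event was on a counter
 *   time_enabled - time the event was enabled
 *   time_running - time the event actually occupied a counter
 * Returns 0 when the event never ran, since nothing can be estimated. */
uint64_t scale_multiplexed_count(uint64_t raw, uint64_t time_enabled,
                                 uint64_t time_running)
{
    if (time_running == 0)
        return 0;
    return (uint64_t)((double)raw * (double)time_enabled / (double)time_running);
}
```

The extrapolation assumes the event rate was uniform across the interval, which is exactly the assumption that breaks down for short, bursty excursions; scaled counts are therefore better suited to steady-state ratios than to sub-millisecond transitions.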
  • 17. PMU Based Extraction of Long-latency Paths
    ■ Timed LBRs (last branch records)
  • 18. PMU Based Extraction of Long-latency Paths
    ■ A more detailed view (different example case)
  • 19. Identifying Cache Miss Hotspots (Data, Code)
    ■ Similarly, a data heatmap can be generated without requiring software instrumentation and tracing (e.g., with valgrind, Pin, etc.).
    PMU event: MEM_TRANS_RETIRED.LOAD_LATENCY_GT_<Binary Number>
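One way such a heatmap could be accumulated from load-latency samples is sketched below, assuming each sample carries a data address and a measured latency. The bucket type, table size, and 64-byte line size are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SHIFT 6      /* assumption: 64-byte cache lines */
#define HEAT_BUCKETS 1024 /* hypothetical table size */

/* Per-bucket accumulation (hypothetical type). */
struct heat_bucket {
    uint64_t samples;              /* sampled loads that hit this bucket */
    uint64_t total_latency_cycles; /* sum of their measured latencies */
};

/* Fold one sampled load into the map. Addresses are reduced to a cache
 * line number and then to a bucket index, so distinct lines whose numbers
 * are equal modulo HEAT_BUCKETS share a bucket. */
void heatmap_add(struct heat_bucket *map, uint64_t data_addr,
                 uint64_t latency_cycles)
{
    size_t idx = (size_t)((data_addr >> LINE_SHIFT) % HEAT_BUCKETS);
    map[idx].samples++;
    map[idx].total_latency_cycles += latency_cycles;
}
```

Buckets with a high total latency, rather than merely a high sample count, point at the data that costs the most time to reach.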
  • 20. From Sampling to Tracing
    ■ Profiling tools today (e.g., perf, Intel® VTune, etc.) are mostly based on the idea of statistical hotspot collection (sampling and averaging) and therefore lose short-interval transitions.
    ■ Tracing today (e.g., with insertion of tracepoints) requires a software developer to anticipate where to instrument code. This is not generally easy, unless a lot of engineering has already gone into preselecting (e.g., KUtrace).
      ● Instrumenting everything or over-collecting results incurs too much CPU penalty,
      ● as well as memory and cache pollution.
    ■ Challenges beyond trace collection:
      ● Much effort and data pruning are needed before tail latencies can be linked to likely causes.
      ● This is usually pushed to offline analysis.
    ■ Understanding (and remediating) tail latencies itself needs to be a low-latency endeavor.
  • 21. eBPF
    ■ eBPF provides for programmable triggering and conditional collection
      ● at low overhead
      ● user or kernel
    ■ Thus, for example, one can do something like this:
      ● Using the longest-latency access event based sampling as a trigger
      ● Snapshot the timed LBR buffer by using eBPF
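The trigger-then-snapshot pattern can be modeled in plain C to make the control flow concrete. This is a user-space simulation under stated assumptions, not an eBPF program: in a real deployment the hardware fills the LBR and a program attached to the sampling event would take the snapshot. All names, the ring depth, and the record layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8 /* hypothetical depth; real LBR depth is CPU-specific */

/* One branch record: source, target, and elapsed cycles (hypothetical layout). */
struct branch_rec {
    uint64_t from, to, cycles;
};

/* Continuously overwritten history, standing in for the LBR buffer. */
struct branch_ring {
    struct branch_rec rec[RING_ENTRIES];
    size_t next;  /* slot the next record overwrites */
    size_t count; /* valid records, at most RING_ENTRIES */
};

/* Record a branch, overwriting the oldest entry once the ring is full. */
void ring_push(struct branch_ring *r, struct branch_rec b)
{
    r->rec[r->next] = b;
    r->next = (r->next + 1) % RING_ENTRIES;
    if (r->count < RING_ENTRIES)
        r->count++;
}

/* Conditional collection: copy the history out only when the sampled
 * latency exceeds the threshold. Records are written oldest first.
 * Returns the number of records copied, or 0 when not triggered. */
size_t snapshot_if_slow(const struct branch_ring *r, uint64_t latency_cycles,
                        uint64_t threshold_cycles, struct branch_rec *out)
{
    if (latency_cycles <= threshold_cycles)
        return 0;
    size_t start = (r->next + RING_ENTRIES - r->count) % RING_ENTRIES;
    for (size_t i = 0; i < r->count; i++)
        out[i] = r->rec[(start + i) % RING_ENTRIES];
    return r->count;
}
```

The point of the design is that the cheap path (recording into the ring) runs all the time, while the comparatively expensive copy happens only for the rare samples that are worth explaining.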
  • 23. eBPF and KUtrace/perf-sched and . . .
    ■ KUtrace is intentionally austere
      ● So it can be deployed in production, at scale, and be available and running continuously.
      ● (Adding bells and whistles to it is not a good idea and is discouraged!)
    + eBPF’s in-kernel filtering
    ■ For deep insights: KUtrace will provide all user<->kernel transitions.
    ■ With eBPF-controlled tracing we could invoke traces only in areas of interest, e.g., trace all gRPC requests with packet size > 64K (maybe that is the only time you see high tail latencies).
    ■ (Or start first with eBPF and perf-sched latency co-monitoring and triggering)
      ● (While connecting eBPF-based probes for latency monitoring in select higher stack layers)
  • 24. Summing It Up
    ■ Tail latency control is crucial, with real-time complex event processing penetrating virtually all sectors.
    ■ Low-overhead, agile monitoring of latency excursions is needed.
    ■ Equally, to unveil the causes, contributing factors need to be collected at low overhead and in a timely manner, ideally through conditional collection and filtering.
    ■ Hardware performance monitoring capabilities are rich and can collect a wide variety of events at very low overhead.
    ■ Linking eBPF with latency-focused hardware monitoring (e.g., timed LBRs and long-latency cache misses) is one direction.
    ■ Another is triggering eBPF-based collection of hardware event rates, time-aligned with scheduler events filtered for high scheduling delays (wait-signaling and post-wait dispatching).
  • 25. Thank You