1
Research Update on
big.LITTLE MP Scheduling
Morten Rasmussen
Technology Researcher
2
Why is big.LITTLE different from SMP?
• SMP:
  - The scheduling goal is to distribute work evenly across all available CPUs to get maximum performance.
  - With DVFS support, we can even save power this way.
• big.LITTLE:
  - The scheduling goal is to maximize power efficiency with only a modest performance sacrifice.
  - Tasks should be distributed unevenly. Only critical tasks should execute on big CPUs, to minimize power consumption.
  - Contrary to SMP, it matters where a task is scheduled.
3
What is the (mainline) status?
• Example: Android UI render thread execution time.
[Figure: render thread execution time; bars: 4 core SMP, 2+2 big.LITTLE (emulated), big.LITTLE aware scheduling]
It matters where a task is scheduled.
5
Mainline Linux Scheduler
• Linux has two schedulers to handle the scheduling policies:
  - RT: Real-time scheduler for very high priority tasks.
  - CFS: Completely Fair Scheduler for everything else; it is used for almost all tasks.
• We need proper big.LITTLE/heterogeneous platform support in CFS.
• Load-balancing is currently based on an expression of CPU load which is basically:
    $\mathit{cpu\_load} = \mathit{cpu\_power} \cdot \sum_{\mathit{task}} \mathit{prio}_{\mathit{task}}$
  - The scheduler does not know how much CPU time is consumed by each task.
  - The current scheduler can distribute tasks fairly evenly based on cpu_power on a big.LITTLE system, but this is not what we want for power efficiency.
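A minimal Python sketch of the simplified load expression above. It assumes the CFS load weights for nice 0 (1024) and nice 10 (110) from the mainline prio_to_weight table; only those two nice levels are modelled, and `Task`, `utilization` and `cpu_load` are hypothetical names. The point it illustrates is the limitation on the slide: two tasks with very different CPU consumption contribute exactly the same load.

```python
from dataclasses import dataclass

# CFS load weights for the two nice levels used in this deck
# (values from the mainline prio_to_weight table; other levels omitted).
NICE_TO_WEIGHT = {0: 1024, 10: 110}


@dataclass
class Task:
    name: str
    nice: int
    utilization: float  # fraction of CPU time actually consumed, 0.0-1.0


def cpu_load(cpu_power: float, runnable: list) -> float:
    """cpu_load = cpu_power * sum of the runnable tasks' priority weights.

    Task.utilization is never read: the expression only sees priorities,
    not how much CPU time each task actually consumes.
    """
    return cpu_power * sum(NICE_TO_WEIGHT[t.nice] for t in runnable)


busy = Task("busy", nice=0, utilization=0.95)
light = Task("light", nice=0, utilization=0.05)
print(cpu_load(1.0, [busy]), cpu_load(1.0, [light]))  # 1024.0 1024.0
```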
6
Tracking task load
• The load contribution of each individual task is needed to make appropriate scheduling decisions.
• We have experimented internally with identifying task characteristics based on the tasks' time slice utilization.
• Recently, Paul Turner (Google) posted an RFC patch set on LKML with similar features.
  - LKML: https://lkml.org/lkml/2012/2/1/763
7
Entity load-tracking summary
• A patch set for improving fair group scheduling, but it adds some essential bits that are very useful for big.LITTLE.
• Tracks the time each task spends on the runqueue (executing or waiting) approximately every ms. Note that $t_{\mathrm{runqueue}} \geq t_{\mathrm{executing}}$.
• The contributed load is a geometric series over the history of time spent on the runqueue, scaled by the task priority.
[Figure: task load over time against task state (executing / sleep), showing load decay while the task sleeps]
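The geometric series can be sketched in Python as follows. The decay factor y is chosen so that y^32 = 1/2, meaning a period's contribution halves after 32 periods of roughly 1 ms. This matches the per-entity load-tracking work, but treat it as an assumption here. `tracked_load` and its normalization to the priority weight are illustrative only; this is not the patch set's code.

```python
# Decay per ~1 ms period, chosen so that Y**32 == 0.5 (assumption, see text).
Y = 0.5 ** (1 / 32)


def tracked_load(history: list, weight: int = 1024) -> float:
    """Decayed runqueue residency scaled by the task's priority weight.

    history[0] is the most recent period; each entry is the fraction of that
    period the task spent on the runqueue (executing or waiting), 0.0-1.0.
    """
    runnable = sum(r * Y ** i for i, r in enumerate(history))
    # Normalize by what an always-runnable task would reach, so the result
    # lies between 0 and weight.
    full = sum(Y ** i for i in range(len(history)))
    return weight * runnable / full


always_on = [1.0] * 100
just_slept = [0.0] * 32 + [1.0] * 68  # sleeping for the last 32 periods
just_woke = [1.0] * 32 + [0.0] * 68   # running for the last 32 periods
print(round(tracked_load(always_on)), round(tracked_load(just_slept)), round(tracked_load(just_woke)))
```

Recent periods dominate the sum, so a task that has just started sleeping sees its load decay, while a task that has just become busy quickly builds up load.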
8
big.LITTLE scheduling: First stab
• Policy: Keep all tasks on little cores unless:
  1. The task load (runqueue residency) is above a fixed threshold, and
  2. The task priority is default or higher (nice ≤ 0).
• Goal: Only use big cores when it is necessary.
  - Frequent but low-intensity tasks are assumed to suffer minimally from being stuck on a little core.
  - High-intensity, low-priority tasks will not be scheduled on big cores to finish earlier when that is not necessary.
• Tasks can migrate to match current requirements.
[Figure: states and loads of Task 1 and Task 2 over time, with "Migrate to big" and "Migrate to LITTLE" events]
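The two-condition policy can be written as a single predicate. This is a minimal sketch. The slide gives no threshold value, so `BIG_THRESHOLD` (expressed as a runqueue residency fraction) is a hypothetical placeholder, and `wants_big` is a hypothetical name.

```python
BIG_THRESHOLD = 0.5  # hypothetical: the slide only says "a fixed threshold"


def wants_big(residency: float, nice: int, threshold: float = BIG_THRESHOLD) -> bool:
    """First-stab policy: stay on a little core unless the task's tracked
    runqueue residency is above the threshold AND its priority is default
    or higher (nice <= 0)."""
    return residency > threshold and nice <= 0


for residency, nice in [(0.9, 0), (0.9, 10), (0.1, 0), (0.9, -5)]:
    print(residency, nice, "big" if wants_big(residency, nice) else "little")
```

In this sketch the predicate is re-evaluated as the tracked load changes. A task whose load decays below the threshold therefore becomes a candidate to move back to LITTLE, in the spirit of the "Migrate to LITTLE" event in the figure.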
9
Experimental Implementation
• Scheduler modifications:
  - Apply PJT's load-tracking patch set.
  - Set up big and little sched_domains with no load-balancing between them.
  - select_task_rq_fair() checks the task load history to select an appropriate target CPU for tasks waking up.
  - Add a forced migration mechanism to push the currently running task to a big core, similar to the existing active load balancing mechanism.
  - Periodically check (in run_rebalance_domains()) the current task on the little runqueues for tasks that need to be forced to migrate to a big core.
• Note: There are known issues related to global load-balancing.
[Figure: LITTLE cluster (L L) and big cluster (B B), each balanced by its own load_balance; tasks move between clusters only via select_task_rq_fair()/forced migration]
Forced migration latency: ~160 us on vexpress-a9 (migration->schedule)
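The two hooks can be modelled in a few lines of Python. This is a user-space sketch, not kernel code. Treating CPUs 1 and 3 as the little cores follows the perf emulation setup used later in the deck. The threshold, the `Task` fields and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

THRESHOLD = 0.5  # hypothetical residency threshold


@dataclass
class Task:
    name: str
    residency: float  # tracked runqueue residency, 0.0-1.0
    nice: int


def needs_big(t: Task) -> bool:
    return t.residency > THRESHOLD and t.nice <= 0


def select_cluster_on_wakeup(t: Task) -> str:
    """Models the check added to select_task_rq_fair(): pick the target
    cluster from the task's load history when it wakes up."""
    return "big" if needs_big(t) else "little"


def forced_migration_candidates(
    little_curr: Dict[int, Optional[Task]]
) -> List[Tuple[int, Task]]:
    """Models the periodic check from run_rebalance_domains(): inspect the
    task currently running on each little CPU and report those that should
    be pushed to a big core."""
    return [
        (cpu, t)
        for cpu, t in sorted(little_curr.items())
        if t is not None and needs_big(t)
    ]


render = Task("render", residency=0.8, nice=0)
bg = Task("bg", residency=0.9, nice=10)
print(select_cluster_on_wakeup(render), select_cluster_on_wakeup(bg))
print(forced_migration_candidates({1: render, 3: bg}))
```

The wake-up path only helps tasks that sleep and wake. A long-running task that becomes heavy while already on a little core is caught only by the periodic check, which is why the forced migration mechanism is needed.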
10
Evaluation Platforms
• ARM Cortex-A9x4 on Versatile Express platform (SMP)
  - 4x ARM Cortex-A9 @ 400 MHz, no GPU, no DVFS, no idle.
  - Base kernel: Linaro vexpress-a9 Android kernel
  - File system: Android 2.3
• LinSched for Linux 3.3-rc7
  - Scheduler wrapper/simulator
  - https://lkml.org/lkml/2012/3/14/590
  - Scheduler ftrace output extension.
  - Extended to support simple modelling of performance-heterogeneous systems.
11
Bbench on Android
• Browser benchmark
  - Renders a new webpage every ~50 s using JavaScript.
  - Scrolls each page after a fixed delay.
• Two main threads involved:
  - WebViewCoreThread: WebKit rendering thread.
  - SurfaceFlinger: Android UI rendering thread.
12
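vexpress: Vanilla Scheduler
[Figure: per-CPU task activity over time (~wakeups) with the vanilla scheduler; setup: four cores (B B B B) balanced by a single load_balance]
• Time spent in idle: roughly equivalent to idle states.
• Note: big and little CPUs have equal performance.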
13
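vexpress: big.LITTLE optimizations
[Figure: per-CPU task activity over time (~wakeups) with the big.LITTLE optimizations; setup: two pairs of cores (B B | B B), each balanced by its own load_balance, with tasks moving between them only via select_task_rq_fair()/forced migration]
• Idle switching minimized.
• Deep sleep most of the time.
• Key tasks mainly on big cores.
• Note: big and little CPUs have equal performance.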
14
big.LITTLE emulation
• Goal: Slow down selected cores on the Versatile Express SMP platform to emulate big.LITTLE performance heterogeneity.
• How: Abusing perf
  - perf is a tool for sampling performance counters.
  - It is set up to sample every 10000 instructions on the little cores.
  - The sampling overhead reduces the perceived performance.
• Details:
    perf record -a -e instructions -C 1,3 -c 10000 -o /dev/null sleep 7200
  - Determined by experiment: a sampling period of 10000 instructions slows the cores down by around 50%.
  - Very short tasks might not get hit by a perf sample, so they might not experience the performance reduction.
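A back-of-the-envelope model shows why short tasks may escape the slowdown. Assume the instruction counter's phase is uniformly random when a task starts running. That seems plausible for system-wide per-CPU sampling, but it is an assumption. Under it, a run of n instructions with sampling period P = 10000 incurs n/P samples on average, and is hit at least once with probability min(1, n/P). The function names are hypothetical.

```python
def expected_samples(instructions: int, period: int = 10_000) -> float:
    """Average number of perf samples taken while the task runs."""
    return instructions / period


def hit_probability(instructions: int, period: int = 10_000) -> float:
    """Probability that a run of `instructions` crosses at least one sample
    point, assuming a uniformly random counter phase at task start."""
    return min(1.0, instructions / period)


for n in (1_000, 5_000, 50_000):
    print(n, expected_samples(n), hit_probability(n))
```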
15
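vexpress+b.L-emu: Vanilla kernel
[Figure: per-CPU task activity over time (~wakeups) on the emulated big.LITTLE platform with the vanilla kernel; setup: B B and L L cores balanced by a single load_balance]
• High little residency.
• Note: Task affinity is more or less random. This is just one example run.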
16
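vexpress+b.L-emu: b.L optimizations
[Figure: per-CPU task activity over time (~wakeups) on the emulated big.LITTLE platform with the b.L optimizations; setup: big (B B) and LITTLE (L L) clusters, each balanced by its own load_balance, with tasks moving between them only via select_task_rq_fair()/forced migration]
• Shorter execution time.
• Key tasks have higher big residency.
• Frequent short tasks have higher little residency.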
17
vexpress+b.L-emu: SurfaceFlinger
• Android UI render task
• Total execution time for 20 runs:
  - SMP: 4xA9, no slow-down (upper bound for performance).
  - b.L: 2xA9 with perf slow-down + 2xA9 without.
• Execution time varies significantly on b.L vanilla.
  - Task affinity is more or less random.
• The b.L optimizations solve this issue.
[s]     SMP     b.L van.   b.L opt.
AVG     10.10   12.68      10.27
MIN      9.78   10.27       9.48
MAX     10.54   16.30      10.92
STDEV    0.12    1.24       0.23
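The AVG row can be read as an overhead relative to the SMP upper bound. The following Python check of that arithmetic uses only the numbers in the table; `overhead_pct` is a hypothetical helper name.

```python
avg_s = {"SMP": 10.10, "b.L van.": 12.68, "b.L opt.": 10.27}


def overhead_pct(config: str) -> float:
    """Average execution time relative to the SMP upper bound, in percent."""
    return (avg_s[config] / avg_s["SMP"] - 1.0) * 100.0


for config in ("b.L van.", "b.L opt."):
    print(f"{config}: +{overhead_pct(config):.1f}% vs SMP")
# b.L van.: +25.5% vs SMP
# b.L opt.: +1.7% vs SMP
```

The vanilla kernel costs about 25% on average, while the optimized scheduler stays within about 2% of SMP. It also cuts the standard deviation by roughly a factor of five (1.24 s to 0.23 s).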
18
vexpress+b.L-emu: Page render time
• Web page render times
  - WebViewCore start -> SurfaceFlinger done
  - Render #2: Page scroll
  - Render #6: Load new page
• b.L optimizations reduce render time variations.
• Note: No GPU and low CPU frequency (400 MHz).
[s]           SMP    b.L van.   b.L opt.
#2  AVG       1.45   1.58       1.45
#2  STDEV     0.01   0.11       0.01
#6  AVG       2.58   2.88       2.62
#6  STDEV     0.05   0.24       0.06
19
LinSched Test Case
• Synthetic workload inspired by Bbench processes on Android
  - Setup: 2 big + 2 LITTLE
  - big CPUs are 2x faster than LITTLE in this model.
• Task definitions:
Task   nice   busy [ms]   sleep [ms]   Description
1+2    0      3           40           Background noise, too short for big
3      0      200         100          CPU intensive, big candidate
4      0      200         120          CPU intensive, big candidate
5      10     200         400          Low priority, CPU intensive
6      10     100         300          Low priority, CPU intensive
7      10     100         250          Low priority, CPU intensive
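Applying the first-stab policy to this task table gives the expected split. This sketch treats busy/(busy + sleep) as the steady-state runqueue residency. That ignores waiting time and the 2x speed difference between big and LITTLE, which are simplifying assumptions. The threshold of 0.5 is hypothetical.

```python
# (task, nice, busy_ms, sleep_ms) from the LinSched task table
TASKS = [
    ("1+2", 0, 3, 40),
    ("3", 0, 200, 100),
    ("4", 0, 200, 120),
    ("5", 10, 200, 400),
    ("6", 10, 100, 300),
    ("7", 10, 100, 250),
]
THRESHOLD = 0.5  # hypothetical residency threshold


def residency(busy_ms: int, sleep_ms: int) -> float:
    """Steady-state fraction of time the task is busy."""
    return busy_ms / (busy_ms + sleep_ms)


def placement(nice: int, busy_ms: int, sleep_ms: int, threshold: float = THRESHOLD) -> str:
    if residency(busy_ms, sleep_ms) > threshold and nice <= 0:
        return "big"
    return "LITTLE"


for name, nice, busy, sleep in TASKS:
    print(f"task {name}: residency {residency(busy, sleep):.2f} -> {placement(nice, busy, sleep)}")
```

Only tasks 3 and 4 end up on big. With a lower threshold such as 0.2, tasks 5-7 would pass the load test, and only their nice value of 10 would keep them on LITTLE.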
20
LinSched: Vanilla Linux Scheduler
[Figure: per-CPU task activity over time (~wakeups) in LinSched with the vanilla scheduler; callouts: "Frequent wakeups on big", "Important tasks"]
Processes:
1-2: Background noise tasks
3-4: Important tasks
5-7: Low priority tasks
21
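LinSched: big.LITTLE optimized sched.
[Figure: per-CPU task activity over time (~wakeups) in LinSched with the big.LITTLE optimized scheduler; callouts: "Important tasks completed faster on big", "Idle switching minimized"]
Processes:
1-2: Background noise tasks
3-4: Important tasks
5-7: Low priority tasks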
22
Next: Improve big.LITTLE support
• big.LITTLE sched_domain balancing
  - Use all cores, including LITTLE, for heavy multi-threaded workloads.
  - Fixes the sysbench CPU benchmark use case.
  - Requires appropriate CPU_POWER to be set for each domain.
[Figure: active tasks T0-T3 spread over the big (B) and LITTLE (L) cores; load scale from 0% (idle) to 100%]
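A rough capacity calculation shows why letting heavy multi-threaded work spill onto LITTLE helps. Assume, as in the LinSched model, that a big CPU is 2x faster than a LITTLE one, and that each CPU-bound thread can use at most one CPU. Then four threads get 4 units of throughput on the two big cores alone, but 6 units when the LITTLE cores are used too. The relative capacities and `throughput` are illustrative, not CPU_POWER values from the patches.

```python
def throughput(n_threads: int, cpu_powers: list) -> float:
    """Aggregate throughput of n CPU-bound threads, each on at most one CPU.

    Threads fill the fastest CPUs first; threads beyond the number of CPUs
    only time-share and add nothing.
    """
    busiest = sorted(cpu_powers, reverse=True)[:n_threads]
    return float(sum(busiest))


BIG, LITTLE = 2.0, 1.0  # relative compute capacity (assumed 2:1)
print(throughput(4, [BIG, BIG]))                  # big cluster only: 4.0
print(throughput(4, [BIG, BIG, LITTLE, LITTLE]))  # all cores: 6.0
```

This is presumably why per-domain CPU_POWER matters: the balancer needs to know how much work each cluster can absorb before it spreads threads across both.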
23
Next: Improve big.LITTLE support
• Per sched domain scheduling policies
  - Support for different load-balancing policies for big and LITTLE domains. For example:
    - LITTLE: Spread tasks to minimize frequency.
    - Big: Consolidate tasks to as few cores as possible.
[Figure: active tasks T0-T5 distributed over big (B) and LITTLE (L) cores, with per-core loads of 0%, 50% and 100%]
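The two example policies can be contrasted with two small placement routines. This is a sketch with hypothetical names. `spread` puts each task on the least-loaded CPU, which keeps the per-CPU peak (and hence the required frequency) low. `consolidate` packs tasks first-fit decreasing up to 100% so the remaining big cores can stay idle. Loads are in percent of one CPU.

```python
def spread(loads, n_cpus):
    """LITTLE-style policy: place each task on the least-loaded CPU."""
    cpu_load = [0] * n_cpus
    for load in sorted(loads, reverse=True):
        target = cpu_load.index(min(cpu_load))
        cpu_load[target] += load
    return cpu_load


def consolidate(loads, n_cpus, capacity=100):
    """big-style policy: first-fit decreasing onto as few CPUs as possible."""
    cpu_load = [0] * n_cpus
    for load in sorted(loads, reverse=True):
        for cpu in range(n_cpus):
            if cpu_load[cpu] + load <= capacity:
                cpu_load[cpu] += load
                break
        else:
            # Nothing fits: fall back to the least-loaded CPU.
            target = cpu_load.index(min(cpu_load))
            cpu_load[target] += load
    return cpu_load


print(spread([50, 50], 2))       # [50, 50]
print(consolidate([50, 50], 2))  # [100, 0]
```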
24
Next: Improve big.LITTLE support
• CPUfreq -> scheduler feedback
  - Let the scheduler know the current OPP and max. OPP for each core to improve load-balancer power awareness.
  - This could improve SMP as well.
• Ongoing discussions on LKML about the topology/scheduler interface:
  - http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02641.html
  - Linaro Connect session: What inputs could the scheduler use?
[Figure: active tasks T1 and T2 on the LITTLE cores at 100% load while the big cores are idle (0% load), both clusters at 50% frequency; callout: "Increase LITTLE freq instead"]
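The "increase LITTLE freq instead" example can be expressed as a frequency-invariant load check. This sketch assumes the scheduler is told each core's current and maximum OPP, which is the interface under discussion, not an existing API. Load measured at the current OPP is scaled by cur_freq / max_freq to estimate how busy the core would be at its maximum OPP. All names and the MHz values are hypothetical.

```python
def load_at_max_opp(load_pct: float, cur_freq: float, max_freq: float) -> float:
    """Busy percentage the core would show if it ran at its maximum OPP."""
    return load_pct * cur_freq / max_freq


def little_overload_action(load_pct: float, cur_freq: float, max_freq: float) -> str:
    """Decide how to handle a heavily loaded LITTLE core."""
    if load_at_max_opp(load_pct, cur_freq, max_freq) < 100.0:
        return "raise LITTLE frequency"  # DVFS headroom can absorb the load
    return "consider migrating to big"


# The slide's scenario: LITTLE at 100% load while running at 50% of max frequency.
print(little_overload_action(100.0, cur_freq=500, max_freq=1000))
```

Without the OPP information, a core at 100% load looks saturated and the load balancer may move work to big. With it, the same core is seen as having headroom that can be used more cheaply by raising its frequency.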
25
Questions/Discussion
26
Backup slides
27
Forced Migration Latency
• Measured on vexpress-a9
  - Latency from migration -> schedule on target
  - ~160 us (immediate schedule)
  - Much longer if target is already busy (~10 ms)
[Figure: example traces labelled "Scheduled immediately" and "Scheduled later"]
28
sched_domain configurations
big.LITTLE optimizations:
[ 0.364272] CPU0 attaching sched-domain:
[ 0.364306] domain 0: span 0,2 level MC
[ 0.364336] groups: 0 2
[ 0.364380] domain 1: does not load-balance
[ 0.364474] CPU1 attaching sched-domain:
[ 0.364500] domain 0: span 1,3 level MC
[ 0.364526] groups: 1 3
[ 0.364567] domain 1: does not load-balance
[ 0.364611] CPU2 attaching sched-domain:
[ 0.364633] domain 0: span 0,2 level MC
[ 0.364658] groups: 2 0
[ 0.364700] domain 1: does not load-balance
[ 0.364742] CPU3 attaching sched-domain:
[ 0.364764] domain 0: span 1,3 level MC
[ 0.364788] groups: 3 1
[ 0.364829] domain 1: does not load-balance
Vanilla:
[ 0.372939] CPU0 attaching sched-domain:
[ 0.373014] domain 0: span 0-3 level MC
[ 0.373044] groups: 0 1 2 3
[ 0.373172] CPU1 attaching sched-domain:
[ 0.373196] domain 0: span 0-3 level MC
[ 0.373222] groups: 1 2 3 0
[ 0.373293] CPU2 attaching sched-domain:
[ 0.373313] domain 0: span 0-3 level MC
[ 0.373337] groups: 2 3 0 1
[ 0.373404] CPU3 attaching sched-domain:
[ 0.373423] domain 0: span 0-3 level MC
[ 0.373446] groups: 3 0 1 2

Parallel Batch Performance Considerations
 


Q2.12: Research Update on big.LITTLE MP Scheduling

7. Entity load-tracking summary
- A patch set for improving fair group scheduling, but it also adds some essential pieces that are very useful for big.LITTLE.
- Tracks the time each task spends on the runqueue (executing or waiting), updated approximately every ms. Note that $t_{\text{runqueue}} \ge t_{\text{executing}}$.
- The contributed load is a geometric series over the history of time spent on the runqueue, scaled by the task priority.
[Figure: task load over time plotted against task state (executing/sleep), showing the load decaying while the task sleeps]
8. big.LITTLE scheduling: First stab
- Policy: keep all tasks on LITTLE cores unless:
  1. the task load (runqueue residency) is above a fixed threshold, and
  2. the task priority is default or higher (nice ≤ 0).
- Goal: only use big cores when it is necessary.
  - Frequent but low-intensity tasks are assumed to suffer minimally from being kept on a LITTLE core.
  - High-intensity, low-priority tasks are not moved to big cores just to finish earlier, since that is not necessary.
- Tasks can migrate to match their current requirements.
[Figure: states and tracked loads of two tasks over time, marking where each task migrates to big and back to LITTLE]
9. Experimental implementation
- Scheduler modifications:
  - Apply PJT's (Paul Turner's) load-tracking patch set.
  - Set up big and LITTLE sched_domains with no load balancing between them.
  - select_task_rq_fair() checks the task load history to select an appropriate target CPU for tasks waking up.
  - Add a forced-migration mechanism that pushes the currently running task to a big core, similar to the existing active load-balancing mechanism.
  - Periodically check (in run_rebalance_domains()) the current task on each LITTLE runqueue for tasks that need to be force-migrated to a big core.
- Note: there are known issues related to global load balancing.
[Figure: LITTLE CPUs (L L) and big CPUs (B B) in separate sched_domains, each with its own load_balance; tasks cross between them only via select_task_rq_fair() or forced migration]
Forced migration latency: ~160 µs on vexpress-a9 (migration -> schedule).
10. Evaluation platforms
- ARM Cortex-A9x4 on the Versatile Express platform (SMP)
  - 4x ARM Cortex-A9 @ 400 MHz, no GPU, no DVFS, no idle.
  - Base kernel: Linaro vexpress-a9 Android kernel.
  - File system: Android 2.3.
- LinSched for Linux 3.3-rc7
  - Scheduler wrapper/simulator: https://lkml.org/lkml/2012/3/14/590
  - Scheduler ftrace output extension.
  - Extended to support simple modelling of performance-heterogeneous systems.
11. Bbench on Android
- Browser benchmark
  - Renders a new webpage every ~50 s using JavaScript.
  - Scrolls each page after a fixed delay.
- Two main threads involved:
  - WebViewCoreThread: WebKit rendering thread.
  - SurfaceFlinger: Android UI rendering thread.
12. vexpress: Vanilla scheduler
[Figure: per-CPU task timeline with approximate wakeup counts; time spent in idle is roughly equivalent to idle-state residency]
Setup: all four CPUs (B B B B) in a single domain with load_balance.
Note: big and LITTLE CPUs have equal performance.
13. vexpress: big.LITTLE optimizations
- Idle switching minimized.
- Deep sleep most of the time.
- Key tasks mainly on big cores.
[Figure: per-CPU task timeline with approximate wakeup counts]
Setup: two CPU pairs (B B and B B), each with its own load_balance; tasks move between the pairs via select_task_rq_fair() or forced migration.
Note: big and LITTLE CPUs have equal performance.
14. big.LITTLE emulation
- Goal: slow down selected cores on the Versatile Express SMP platform to emulate big.LITTLE performance heterogeneity.
- How: abusing perf
  - perf is a tool for sampling performance counters.
  - It is set up to sample every 10000 instructions on the LITTLE cores.
  - The sampling overhead reduces the perceived performance.
- Details:
      perf record -a -e instructions -C 1,3 -c 10000 -o /dev/null sleep 7200
  - Experiments showed that a sampling period of 10000 instructions slows the cores down by around 50%.
  - Very short tasks might not get hit by a perf sample, so they might not experience the performance reduction.
15. vexpress+b.L-emu: Vanilla kernel
- High LITTLE residency.
- Note: task affinity is more or less random; this is just one example run.
[Figure: per-CPU task timeline with approximate wakeup counts]
Setup: big (B B) and LITTLE (L L) CPUs in a single domain with load_balance.
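16. vexpress+b.L-emu: b.L optimizations
- Shorter execution time.
- Key tasks have higher big residency.
- Frequent short tasks have higher LITTLE residency.
[Figure: per-CPU task timeline with approximate wakeup counts]
Setup: big (B B) and LITTLE (L L) CPUs in separate domains, each with its own load_balance; tasks move between them via select_task_rq_fair() or forced migration.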
17. vexpress+b.L-emu: SurfaceFlinger
- Android UI render task.
- Total execution time for 20 runs:
  - SMP: 4xA9, no slow-down (upper bound for performance).
  - b.L: 2xA9 with perf slow-down + 2xA9 without.
- Execution time varies significantly on b.L vanilla.
  - Task affinity is more or less random.
- The b.L optimizations solve this issue.

    [s]     SMP     b.L van.   b.L opt.
    AVG     10.10   12.68      10.27
    MIN      9.78   10.27       9.48
    MAX     10.54   16.30      10.92
    STDEV    0.12    1.24       0.23
18. vexpress+b.L-emu: Page render time
- Web page render times
  - WebViewCore start -> SurfaceFlinger done.
  - Render #2: page scroll.
  - Render #6: load new page.
- The b.L optimizations reduce render time variations.
- Note: no GPU and low CPU frequency (400 MHz).

    [s]          SMP    b.L van.   b.L opt.
    #2 AVG       1.45   1.58       1.45
    #2 STDEV     0.01   0.11       0.01
    #6 AVG       2.58   2.88       2.62
    #6 STDEV     0.05   0.24       0.06
19. LinSched test case
- Synthetic workload inspired by Bbench processes on Android.
- Setup: 2 big + 2 LITTLE.
  - big CPUs are 2x faster than LITTLE in this model.
- Task definitions:

    Task   nice   busy [ms]   sleep [ms]   Description
    1+2    0      3           40           Background noise, too short for big
    3      0      200         100          CPU intensive, big candidate
    4      0      200         120          CPU intensive, big candidate
    5      10     200         400          Low priority, CPU intensive
    6      10     100         300          Low priority, CPU intensive
    7      10     100         250          Low priority, CPU intensive
20. LinSched: Vanilla Linux scheduler
Processes:
- 1-2: background noise tasks
- 3-4: important tasks
- 5-7: low-priority tasks
[Figure: per-CPU task timeline with approximate wakeup counts; annotations: "Frequent wakeups on big", "Important tasks"]
21. LinSched: big.LITTLE optimized scheduler
Processes:
- 1-2: background noise tasks
- 3-4: important tasks
- 5-7: low-priority tasks
[Figure: per-CPU task timeline with approximate wakeup counts; annotations: "Important tasks completed faster on big", "Idle switching minimized"]
22. Next: Improve big.LITTLE support
- big.LITTLE sched_domain balancing
  - Use all cores, including LITTLE, for heavy multi-threaded workloads.
  - Fixes the sysbench CPU benchmark use case.
  - Requires an appropriate CPU_POWER to be set for each domain.
[Figure: active tasks T0-T3 all on the big cores (100% load) while the LITTLE cores sit idle (0% load)]
23. Next: Improve big.LITTLE support
- Per-sched_domain scheduling policies
  - Support for different load-balancing policies for the big and LITTLE domains. For example:
    - LITTLE: spread tasks to minimize frequency.
    - big: consolidate tasks onto as few cores as possible.
[Figure: active tasks T0-T5; T0 and T1 spread across the LITTLE cores at 50% load each, T2-T5 consolidated on one big core at 100% load, the other big core idle at 0%]
24. Next: Improve big.LITTLE support
- CPUfreq -> scheduler feedback
  - Let the scheduler know the current OPP (operating performance point) and maximum OPP of each core to improve the load balancer's power awareness.
  - This could improve SMP as well.
- Ongoing discussions on LKML about the topology/scheduler interface:
  - http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02641.html
- Linaro Connect session: What inputs could the scheduler use?
[Figure: active tasks T1 and T2; LITTLE at 100% load and 50% frequency, big idle at 0% load and 50% frequency; annotation: "Increase LITTLE freq instead"]
27. Forced migration latency
- Measured on vexpress-a9.
- Latency from migration to schedule on the target CPU:
  - ~160 µs when the task is scheduled immediately.
  - Much longer (~10 ms) if the target is already busy.
[Figure: timelines labeled "Scheduled immediately" and "Scheduled later"]
28. sched_domain configurations

big.LITTLE optimizations:
    [ 0.364272] CPU0 attaching sched-domain:
    [ 0.364306] domain 0: span 0,2 level MC
    [ 0.364336] groups: 0 2
    [ 0.364380] domain 1: does not load-balance
    [ 0.364474] CPU1 attaching sched-domain:
    [ 0.364500] domain 0: span 1,3 level MC
    [ 0.364526] groups: 1 3
    [ 0.364567] domain 1: does not load-balance
    [ 0.364611] CPU2 attaching sched-domain:
    [ 0.364633] domain 0: span 0,2 level MC
    [ 0.364658] groups: 2 0
    [ 0.364700] domain 1: does not load-balance
    [ 0.364742] CPU3 attaching sched-domain:
    [ 0.364764] domain 0: span 1,3 level MC
    [ 0.364788] groups: 3 1
    [ 0.364829] domain 1: does not load-balance

Vanilla:
    [ 0.372939] CPU0 attaching sched-domain:
    [ 0.373014] domain 0: span 0-3 level MC
    [ 0.373044] groups: 0 1 2 3
    [ 0.373172] CPU1 attaching sched-domain:
    [ 0.373196] domain 0: span 0-3 level MC
    [ 0.373222] groups: 1 2 3 0
    [ 0.373293] CPU2 attaching sched-domain:
    [ 0.373313] domain 0: span 0-3 level MC
    [ 0.373337] groups: 2 3 0 1
    [ 0.373404] CPU3 attaching sched-domain:
    [ 0.373423] domain 0: span 0-3 level MC
    [ 0.373446] groups: 3 0 1 2