1
Power-efficient scheduling, and the latest
news from the kernel summit
Linaro Connect USA 2013
Morten Rasmussen, Dietmar Eggemann
2
Topics Overview
 Timeline
 Towards a unified scheduler driven power policy
 Task placement based on CPU suitability
 Kernel Summit Feedback
 Status
 Questions?
3
Timeline
 May – Ingo's response to the task packing patches from
VincentG reignited discussions on power-aware scheduling
 Early July – Posted proposed patches for a power aware
scheduler based on a power driver running in conjunction
with the current scheduler
 Avoid big changes to the already complex current scheduler
 Migrate functionality back into the scheduler once the kinks had been worked out
 Sept – At Plumbers there was relatively broad agreement on the approach
 October – Morten reposts patchset with refined APIs between
power driver and the scheduler
 LKS – Reopened the discussion. More on this later
4
Unified scheduler driven power policy … Why ?
 big.LITTLE MP patches are tested, stable and performant
 Take the principles learnt during the implementation and apply them to an upstream solution
 Existing power management frameworks are not coordinated
(cpufreq, cpuidle) with the scheduler
 E.g. the scheduler decides which cpu to wake up or idle without
having any knowledge about C-states. cpuidle is left to do its best
based on these uninformed choices.
 The scheduler is the most obvious place to coordinate power management as it has the best view of the overall system load.
 The scheduler knows when tasks are scheduled and decides the
load balance. cpufreq has to wait until it can see the result of the
scheduler decisions before it can react.
 Task packing in the scheduler needs P and C-state information to
make informed decisions.
5
Existing Power Policies
 Frequency scaling: cpufreq
 Generic governor + platform specific driver
 Decides target frequency based on overall cpu load.
 Idle state selection: cpuidle
 Generic governor + platform specific driver
 Attempts to predict idle time when cpus enter idle.
 Scheduler:
 Completely generic and unaware of cpufreq and cpuidle policies.
 Determines when and where a task runs, i.e. on which cpu.
 Task placement considering CPU suitability required.
6
Existing Power Policies
[Diagram: per-CPU run queues (cpu0, cpu1) with tasks, showing the scheduler's load balance policy, the cpufreq policy (frequency vs. load) and the cpuidle policy (idle) acting independently of each other; the load signal used pre-3.11 and the one used from 3.11 are both marked.]
 No coordination between power policies to avoid
conflicting/suboptimal decisions.
 Is it a problem?
7
Issues
 Scheduler->cpufreq->scheduler cpu load feedback loop
 From 3.11 the scheduler uses tracked load for load-balancing.
 Tracked load is impacted by frequency scaling: a lower frequency leads to higher tracked load for the same task (see the sketch after this list).
 Hindering new power-aware scheduling features
 Task packing: Needs feedback from cpufreq to determine when cpus
are full.
 Topology aware task placement: Needs topology information inside the scheduler to determine the best cpus to use when the system is partially loaded.
 Heterogeneous systems (big.LITTLE): Needs topology information
and accurate load tracking.
 Thermal also needs to be considered
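This feedback loop is what a scale-invariant load signal would remove: if each load contribution were scaled by the frequency the cpu actually ran at, a cpufreq decision could no longer inflate the load the scheduler sees. A minimal sketch of that idea, where arch_scale_freq() is a hypothetical helper returning the current/maximum frequency ratio in fixed point (not an API from the patches discussed here):

/*
 * Illustration only: scale a PELT-style load contribution by the
 * frequency the cpu was running at. arch_scale_freq() is assumed to
 * return (curr_freq << SCHED_FREQSCALE_SHIFT) / max_freq for @cpu.
 */
#define SCHED_FREQSCALE_SHIFT   10

static inline u32 scale_contrib_by_freq(u32 contrib, int cpu)
{
        return (u32)(((u64)contrib * arch_scale_freq(cpu)) >>
                     SCHED_FREQSCALE_SHIFT);
}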
8
Power scheduler proposal
[Block diagram with three components:]
 Scheduler (fair.c): sched_domain hierarchy (generic topology), load balance algorithms, load tracking, "important tasks" cgroup; extended with new generic topology info (packing, heterogeneous, ...) plus packing, P & C-state awareness, heterogeneous support and a scale-invariant load signal.
 Power framework (power.c): existing policy algorithms as a helper function library (drivers/power/?.c), performance state selection, sleep state selection, driver registration, and an abstract power driver/topology interface.
 Power driver (drivers/*/?.c): platform HW driver providing detailed platform topology and platform perf. and energy monitoring.
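As a rough illustration of the "abstract power driver/topology interface" box, a platform power driver could register a small set of callbacks with the power framework. None of these names are taken from the posted patches; they only sketch the kind of information the framework would need:

/* Hypothetical driver interface, for illustration only. */
struct power_driver_ops {
        /* highest performance level currently allowed on @cpu
         * (e.g. limited by thermal constraints) */
        int (*max_perf)(int cpu);
        /* relative energy cost of running @cpu at @perf_level */
        int (*energy_cost)(int cpu, int perf_level);
        /* wakeup latency of the idle state @cpu would enter if idled now */
        int (*idle_wakeup_latency)(int cpu);
};

int power_driver_register(struct power_driver_ops *ops);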
9
Task placement based on CPU suitability
Part of the power scheduler proposal
 sched_domain hierarchy
 Load balance algorithm (Heterogeneous)
Existing big.LITTLE MP Patches
 Definition: CFS scheduler optimization for heterogeneous platforms.
Attempts to select task affinity to optimize power and performance
based on task load and CPU type
 Hosted at
http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git
 Co-exists with existing (CFS) scheduler code
 Guarded by CONFIG_SCHED_HMP
 Sets up HMP domains as a dependency of the topology code
Implement big.LITTLE MP functionality inside mainline scheduler code
10
Task placement scheduler architectural bricks
1) Additional sched domain data structures
2) Specify sched domain level for task placement
3) Unweighted instantaneous load signal
4) Task placement hook in select task
5) Task placement hook in load balance
6) Task placement idle pull
11
Brick 1: Additional sched domain data structures
big.LITTLE MP:
 struct hmp_domain
                                                                            
struct hmp_domain {
        struct cpumask cpus;
        struct cpumask possible_cpus;
        struct list_head hmp_domains;
};
Task placement based on CPU suitability:
 Use the existing sched groups in CPU sched domain level
 Add task load ranges into CPU, sched domain and group
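A hypothetical shape for the "task load range" annotation mentioned above; the struct and field names are illustrative, not taken from the prepared patches:

/* Band of unweighted task load (load_avg_ratio) that a CPU, sched domain
 * or sched group is considered suitable for. */
struct load_range {
        unsigned long min_ratio;        /* below this, a smaller CPU would do  */
        unsigned long max_ratio;        /* above this, the task should move up */
};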
12
Brick 2: Specify sched domain level
big.LITTLE MP:
 No additional sched domain flag
 Deletes SD_LOAD_BALANCE flag in CPU level
Task placement based on CPU suitability:
 Adds SD_SUITABILITY flag to CPU level
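A minimal sketch of the difference between the two approaches at CPU sched domain level; SD_SUITABILITY comes from this proposal and is not a mainline flag, and the bit value below is chosen purely for illustration:

/* Proposed flag; bit value chosen here for illustration only. */
#define SD_SUITABILITY  0x8000  /* balance on CPU suitability at this level */

/*
 * big.LITTLE MP: clears SD_LOAD_BALANCE at CPU level, so load_balance()
 * is skipped there and hmp_force_up_migration() takes over.
 * Suitability placement: keeps SD_LOAD_BALANCE and additionally sets
 * SD_SUITABILITY, so the regular load_balance() path can apply it.
 */
static inline bool sd_suitability(struct sched_domain *sd)
{
        return sd->flags & SD_SUITABILITY;
}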
13
Brick 3: Unweighted instantaneous load signal
 big.LITTLE MP & Task placement based on CPU suitability:
 For sched entity and cfs_rq
    struct sched_avg {
            u32 runnable_avg_sum, runnable_avg_period;
            u64 last_runnable_update;
            s64 decay_count;
            unsigned long load_avg_contrib;
            unsigned long load_avg_ratio;
    };
 sched entity: runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1)
 cfs_rq: set in [update/enqueue/dequeue]_entity_load_avg()
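The sched entity formula above, written out as a helper; this is a sketch of the computation only, not the exact hunk from the patches:

/* Unweighted instantaneous load of a sched entity: scaled to NICE_0_LOAD
 * so that a task runnable 100% of the time reaches NICE_0_LOAD regardless
 * of its nice value. The +1 avoids dividing by zero right after the
 * signal has been initialised. */
static inline void __update_load_avg_ratio(struct sched_avg *sa)
{
        sa->load_avg_ratio = (sa->runnable_avg_sum * NICE_0_LOAD) /
                             (sa->runnable_avg_period + 1);
}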
14
Brick 4: Task placement hook in select task
big.LITTLE MP:
 Force new non-kernel tasks onto big CPUs until
load stabilises
 Least loaded CPU of big cluster is used
Task placement based on CPU suitability:
 Use task load ranges of previous CPU and (initialized) task load ratio to select the new CPU
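A hedged sketch of the big.LITTLE MP wake-up rule above; task_load_ratio(), hmp_up_threshold and least_loaded_big_cpu() are placeholders, not code from the patch set:

/* Sketch for the select_task_rq_fair() path: new user-space tasks
 * (p->mm != NULL) start with a high initial load, so they are sent to the
 * least loaded big CPU until their tracked load has stabilised. */
static int hmp_select_cpu(struct task_struct *p, int prev_cpu)
{
        if (p->mm && task_load_ratio(p) > hmp_up_threshold)
                return least_loaded_big_cpu();
        return prev_cpu;
}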
15
Brick 5: Task placement hook in load balance
big.LITTLE MP:
 Completely bypasses load_balance() in CPU level
 hmp_force_up_migration() in run_rebalance_domains()
 Calls hmp_up_migration() for migration to faster CPU
 Calls hmp_offload_down() for using little CPUs when idle
 Does not use env->imbalance or something equivalent
Task placement based on CPU suitability:
 Happens inside load_balance()
 Find most unsuitable queue (i.e. find source run-queue)
 Move unsuitable tasks (counterpart to load balance)
 Move one unsuitable task (counterpart to active load balance)
 Cannot use env->imbalance to control load balance
 Using grp_load_avg_ratio/(NICE_0_LOAD * sg->group_weight) <= THRESHOLD
 Falling back to 'mainline load balance' if the condition is not met (destination group is already overloaded)
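The destination-group condition from the last bullet, rearranged to avoid integer division; grp_load_avg_ratio and SUITABILITY_THRESHOLD stand in for the group's summed unweighted load and the tunable limit:

/* Suitability balancing is only attempted if the average unweighted load
 * per CPU in the destination group is at or below the threshold;
 * otherwise the regular load balance path is used. */
static bool group_has_suitability_headroom(struct sched_group *sg,
                                           unsigned long grp_load_avg_ratio)
{
        return grp_load_avg_ratio <=
               SUITABILITY_THRESHOLD * NICE_0_LOAD * sg->group_weight;
}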
16
Brick 6: Task placement idle pull
big.LITTLE MP:
 Big CPU pulls running task above the threshold from little CPU
Task placement based on CPU suitability:
 Not necessary because idle_balance()->load_balance() is not suppressed at CPU level (SD_LOAD_BALANCE is not removed)
 Idle pull happens inside load_balance
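For comparison, a sketch of the big.LITTLE MP idle pull that the suitability approach makes unnecessary; hmp_slower_domain_mask(), task_load_ratio(), hmp_up_threshold and migrate_running_task() are placeholders:

/* A newly idle big CPU scans the little CPUs for a currently running task
 * whose unweighted load exceeds the up-migration threshold and pulls it. */
static void hmp_idle_pull(int this_cpu)
{
        int cpu;

        for_each_cpu(cpu, hmp_slower_domain_mask()) {
                struct task_struct *p = cpu_curr(cpu);

                if (p != cpu_rq(cpu)->idle &&
                    task_load_ratio(p) > hmp_up_threshold) {
                        migrate_running_task(p, this_cpu);
                        break;
                }
        }
}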
17
Kernel Summit Feedback
 Good to get active discussion
 First time with everybody in the same room
 LWN article - “The power-aware scheduling mini-summit”
 Key points made
 Power benchmarks are needed for evaluation
 Use-case descriptions are needed to define common ground.
 The scheduler needs energy/power information to make power-aware
scheduling decisions.
 Power-awareness should be moved into the scheduler.
 cpufreq is not fit for its purpose and should go away.
 cpuidle will be integrated into the scheduler, possibly supported by new per-task properties such as latency constraints
 Are there ways to replay energy scenarios?
 Linsched or perf sched
18
Kernel Summit feedback observations
 All part of the open-source process
 Discussions have raised awareness of the issues
 Maintainers recognise the need for improved power management
 Iterative approach necessary but the steps are clear
 Maintainers have a clear server/desktop background
 ARM community can help educate this audience on embedded
requirements
 Benchmarking for power could be hard to do in a simple way
 cyclictest and sysbench-style tests are unlikely to yield realistic results on real systems
 However, full accuracy not required
 Power models necessarily complex and often closely guarded
secrets
 Collection and reporting of meaningful metrics is probably sufficient
19
Status
 Latest Power-aware scheduling patches on LKML
 https://lkml.org/lkml/2013/10/11/547
 Task placement based on CPU suitability patches prepared
 Proof of concept done
 Waiting for right time to post to lists
 Feedback from the Linux Kernel Summit needs to be discussed
20
Questions?
 Thanks for listening.
