1) The patches add support for managing CPU idle states using the generic PM domain framework and runtime PM. This provides a unified approach for idle management across all devices.
2) Key aspects include extending PM domains to support multiple idle levels, initializing CPU PM domains from device tree, and adding a governor to determine idle states based on wakeup times and QoS.
3) The changes allow the Linux kernel to directly control CPU and cluster idle states when firmware supports the OS-initiated suspend mode in the PSCI standard.
1. Runtime PM for CPU
Idling in Linux Kernel
“freedom” Koan-Sin Tan
freedom_at_computer.org
unconf, COSCUP 2016
Most of the materials are from Kevin, Ulf, and Lina’s past Linaro Connect
presentations
2. SoC Idling & CPU Cluster PM
● Idle management of devices via Runtime
PM and the Generic PM Domain (genpd)
○ A proven concept
● Idle management of CPUs and clusters
(cpuidle)
● Goal: One “idle” to rule them all!
● http://lwn.net/Articles/696712/
○ [PATCH v3 00/15] PM: SoC idle support using
PM domains, Thu, 4 Aug 2016 17:04:47 -0600
https://en.wikipedia.org/wiki/One_Ring
6. Highlight
● Genpd: maintained by Ulf, Kevin, and Rafael
● Various consolidation, fixing issues and regressions
○ Genpd: Removing intermediate states from the power off sequence (4.3)
■ http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/base?id=ba2bbfbf63075850bb523e
2adb815d45e3509995
○ Genpd: Enable the runtime PM centric approach
○ Genpd: Support multiple retention states (4.6)
7. Next steps
● Add genpd power statistics
● Avoid needless wakeup in System PM
8. CPU Cluster PM: background and motivation
More
● CPUs
● integrated devices
● power domains
● micro controllers
● firmware
Kernel needs to evolve
9. Idle management today
● The linux kernel has two distinct ways of managing idle.
● The CPUidle framework for CPUs and for all other devices: runtime PM
combined with generic power domains (genpd). In addition, CPUidle is not
scaling well for multi-cluster SMP systems and heterogeneous systems like
big.LITTLE
● To better manage idle for modern SoCs with a hierarchical structure, we are
exploring extending runtime PM and genpd to CPUs so there is a unified
framework for managing idle across all devices
10. Two separate worlds
CPU
● cpuidle framework
● cpu_(cluster_)pm_*()
● Not scaling well for SMP or multi-cluster
(c.f. Coupled idle states)
[1]
https://www.linuxplumbersconf.org/2012/wp-cont
ent/uploads/2012/09/cpuidle-cluster.pdf
IO devices
● Runtime PM
● Auto suspend
● PM domains
● Generic PM domains (genpd)
[1]
https://events.linuxfoundation.org/images/stories
/pdf/lcjp2012_wysocki.pdf
11. Runtime PM + genpd SoC Idle
Including
● User runtime PM for CPUS
● And CPU-connected “stuff”
○ Interrupt controllers (e.g., ARM GIC)
○ FPUs
○ CPU-local cache (L1 $)
● Model cluster with genpd
○ CPUs are just “devices” in the genpd
○ Genpd includes share resources (e.g., L2 $)
12. SoC/Cluster Idle - Solution
● Use genpd and Runtime PM
● Describe CPUS and domains in DT
○ CPU: #power-domains = <&CPU_PD0>;
○ power-controller: #power-domains-cells;
● Initialize genpd PM domains
● Attach CPU devices to genpd
● Add Runtime PM support for CPUidle and Hotplug
● Provide platform callbacks
14. CLUSTER &
COHERENCY
SoC Idle: With CPU PM
CPU
cpuidle
cpuidle
CPUCPU
CPUCPU
CPUCPU
CPUCPU
cpuidle
cpuidle
cpuidle
cpuidlecpuidle
cpuidle
Runtime
PM
GenPD CPU PM
Platform
Driver
15. Recipe
● Started with upstream kernel (latest kernel I tested is 4.7)
● Prep
○ Extend genpd domains to support multiple idle levels
○ Extend genpd to support IRQ-safe PM domains
● Add
○ Add CPU PM domain framework
○ Call Runtime PM from cpuidle
● Sample
○ Using CPU PM domains: OS Initiated
○ DT changes for SoC
https://github.com/freedomtan/linux/commits/v4.7-rc1-mt8173-soc-idle
16. CPU PM framework
● Init
○ Read domain topology from DT
○ Setup genpd PM domains
● Genpd
○ Last man determination
● Genpd gov:
○ Determine cluster idle state
GenPD
Platform
Driver
CPU PM
GenPD
GenPD
Governor
Init
17. CPU PM domains
● Add CPU PM domain framework
○ Define CPUs as IRQ-safe devices
■ https://patches.linaro.org/patch/63350/
○ Read CPU topology from DT
○ Register domains and set up sub-domains
○ Add online CPUs to the respective CPU PM domains
○ Register for hotplug notifications to understand online CPUs
■ https://patches.linaro.org/patch/63352/
○ Add CPU Runtime PM API for ease of use
○ Add a new genpd governor for CPU domains that takes PM QoS into consideration
● Call CPU Runtime PM from ARM cpuidle driver
○ Use CPU PM domain runtime suspend and resume API
○ Record the next wake time of this CPU in the device’s genpd data for use by governor
https://github.com/freedomtan/linux/blob/v4.7-rc1-mt8173-soc-idle/drivers/base/power/cpu_domains.c
18. CPU PM framework
● Better than couple cpuidle
○ CPUs are not woken up when ready to enter coupled state
○ Handle multi-level CPU-domain topology
● Is not MCPM
○ MCPM handles low level race between CPUs
○ Some v7 SoCs may still need MCPU with CPU PM framework
19. PSCI: PC vs. OSI
Platform Coordinated (PC)
● Default PSCI mode
● FW decides on CPU-Domain idle state
when all CPUs are idle
● FW does not know of LInux CPU QoS
requirements, latency and next CPU
wakeup
● FW decision to power off domain may be
detrimental to power and performance
OS Initiated (OSI)
● Options from PSCI v1.0 onward
● Linux decides the CPU-Domain idle state
● The last CPU in a domain provides the idle
state of the cluster, coherency
● Linux can make a wise choice of
CPU-Domain idle state knowing QoS,
latency, predicated CPU wakeup
20. OSI using CPU PM
● Query PSCI_FEATURES in F/W for OSI support
● Setup CPU PM Domains
○ Reads State-IDs for Cluster and Coherency idle levels from DT
● Callbacks for Domain ON/OFF
○ State-IDs passed from CPU PM
○ Aggregate State-IDs against the CPU
● CPU calls F/W with Composite State-ID
● NO SoC specific drivers needed for ARM v8
22. cpu_suspend(uint32 power_state, ....)
Bit field Description
31:26 Reserved. Must be zero
25:24 PowerLevel
23:17 Reserved. Must be zero.
16 State Type
15:0 StateID
Bit field Description
31 Reserved. Must be zero
30 StateType
29:28 Reserved. Must be zero.
27:0 StateID
Original Format Extended StateID Format
23. Original Format
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Last in level System Cluster CPU
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
Reserved: must be 0 Power
Level
Reserved: must be 0 State
type
CPU idle: 0x00010000
Cluster idle: 0x01010000
StateID: not used
24. Extended ID
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Last in level System Cluster CPU
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
0 Reserved:
must be 0
State
type
StateID
25. Changes for Vendors: ARM64
● DT
○ Domain hierarchy
○ Domain idle states
● Driver
○ None if F/W supports OS Initiated mode
26. Changes for Vendors: ARMv7
● DT
○ Domain hierarchy
○ Domain idle states
● Driver
○ Handle power ON/OFF callbacks
○ MCPM or any race avoiding last man logic
● Revisit SoC specific CPUidle hacks
27. Genpd: IRQ-Safe Restriction
● IRQ-safe domains can only have IRQ-safe sub-domains
● Other cases with domains and devices remains the same
28. Status of patches
● PM / Domains: Multiple genpd states: Merged
● PM / Doamins: IRQ safe domains: Under review
○ http://www.spinics.net/lists/linux-arm-msm/msg19666.html
● CPU PM domains: new framework for CPU cluster: Under review
○ http://www.spinics.net/lists/linux-arm-msm/msg19667.html
● PSCI 1.0 OS Initiated: Support for domain hierarchy: Under review
○ http://www.spinics.net/lists/linux-arm-msm/msg19673.html
○ http://www.spinics.net/lists/linux-arm-msm/msg19674.html
RFC submission on ML:
[1]. Patch v3: http://lwn.net/Articles/696712/
Sample ARM v7a implementation
[2]. https://git.linaro.org/people/lina.iyer/linux-next.git/shortlog/refs/heads/genpd-psci-8084
29. Results
● SoC idle saves power when CPUs are online
○ Critical power saving comes from powering off caches and peripheral h/w
○ ~20 mA @800 Mhz of power saving at idle
● ~5 μs addition to idle enter path
○ Exit latency depends on what happens at domain OFF
● Implemented and tested on DB410c / 96Board
● Implemented and tested on MT8173 EVB and Chromebook OAK rev-5
● Some result measured with Servo board
30.
31. INFO: CPU Node : MPID 0x0, parent_node 1, State 0x0
INFO: CPU Node : MPID 0x1, parent_node 1, State 0x0
INFO: CPU Node : MPID 0xffffffffffffffff, parent_node 1, State 0x2
INFO: CPU Node : MPID 0xffffffffffffffff, parent_node 1, State 0x2
INFO: CPU Node : MPID 0x100, parent_node 2, State 0x0
INFO: CPU Node : MPID 0x101, parent_node 2, State 0x0
INFO: mode = 0x1
32. What changes are needed
● Linux kernel
○ Runtime PM
○ Generic PM domain (genpd)
○ Some patches (likely to be merged into mainline kernel when good enough)
○ dts changes
● PSCI (ATF)
○ OS Initiated: not supported in ATF
○ Extended ID: enable it
○ By the time I started working on it, the code in plat/mediatek/ still uses backward compatible
API (ENABLE_PLAT_COMPAT)
■ Cannot enable Extended ID
33. Links
● Linux Kernel
○ genpd + runtime pm SoC idle
■ Patches V3, http://lwn.net/Articles/696712/
○ Ulfh’s linux-pm next branch
■ https://git.linaro.org/people/ulf.hansson/linux-pm.git
■ https://git.linaro.org/people/ulf.hansson/linux-pm.git/shortlog/refs/heads/next
○ My dts changes for MT8173 (based on Ulf’s next branch)
■ https://git.linaro.org/people/freedom.tan/linux-8173.git/shortlog/refs/heads/v4.6-rc6-mt817
3-soc-idle
● ARM Trusted Firmware (ATF)
○ https://github.com/freedomtan/arm-trusted-firmware/tree/hacks-for-8173evb-osi-0518
34.
35. 1. PM / Domains: Abstract genpd locking
2. PM / Domains: Support IRQ safe PM domains
3. PM / cpu_domains: Setup PM domains for CPUs/clusters
4. ARM: cpuidle: Add runtime PM support for CPUs
5. timer: Export next wake up of a CPU
6. PM / cpu_domains: Record CPUs that are part of the domain
7. PM / cpu_domains: Add PM Domain governor for CPUs
8. Documentation / cpu_domains: Describe CPU PM domains
setup and governor
9. drivers: firmware: psci: Allow OS Initiated suspend mode
10. ARM64: psci: Support cluster idle states for OS-Initiated
11. ARM64: dts: Add PSCI cpuidle support for MSM8916
12. ARM64: dts: Define CPU power domain for MSM8916
Documentation/power/cpu_domains.txt | 79 ++++++++
Documentation/power/devices.txt | 12 +-
arch/arm64/boot/dts/qcom/msm8916.dtsi | 49 +++++
arch/arm64/kernel/psci.c | 46 ++++-
drivers/base/power/Makefile | 1 +
drivers/base/power/cpu_domains.c | 365 ++++++++++++++++++++++++++++++++++
drivers/base/power/domain.c | 210 +++++++++++++++----
drivers/cpuidle/cpuidle-arm.c | 55 +++++
drivers/firmware/psci.c | 45 ++++-
include/linux/cpu_domains.h | 37 ++++
include/linux/pm_domain.h | 14 +-
include/linux/psci.h | 2 +
include/linux/tick.h | 10 +
include/uapi/linux/psci.h | 5 +
kernel/time/tick-sched.c | 13 ++
15 files changed, 896 insertions(+), 47 deletions(-)
create mode 100644 Documentation/power/cpu_domains.txt
create mode 100644 drivers/base/power/cpu_domains.c
create mode 100644 include/linux/cpu_domains.h
36. Description of Patches
● Patches [1, 2] - Genpd changes. Sets up Generic PM domains
to be called from cpuidle. Genpd uses mutexes for
synchronization. This has to be changed to spinlocks for
domains that may be called from IRQ safe contexts.
● Patch [3] - CPU PM domains. Parses DT and sets up PM
domains for CPUs and builds up the hierarchy of domains
and devices. These are bunch of helper functions.
● Patch [4] - ARM cpuidle driver. Enable ARM cpuidle driver to
call runtime PM. Even though this has been done for ARM,
there is nothing architecture specific about this. Currently, all
idle states other than ARM clock gating calls into runtime
PM. This could also be state specific i.e call into runtime PM
only after a certain state. The changes may be made part of
cpuidle framework, but needs discussion.
● Patches [5, 6, 7] - PM domain governor for CPUs.
Introduces a new genpd governor that looks into the
per-CPU tick device to identify the next CPU wakeup and
determine the available sleep time for the domain. This
along with QoS is used to determine the best idle state for
the domain. A domains' wake up is determined by the first
CPU in that domain to wake up. A coherency level domain's
(parent of a domain containing CPU devices)
wake up is determined by the first CPU amongst all the
CPUs to wake up. Identifying the CPUs and their wakeups
are the part of these patches.
● Patches [9, 10] - ARM64 PSCI platform driver
ARM64 PSCI v1.0 specific. PSCI OS initiated mode. supports
powering off CPU clusters (caches etc, by configuring
separate power controllers). These patches
enable Linux to determine if the f/w supports this mode and
if so, uses CPU PM domain helper functions to create PM
domains and handles the power_on/power_off
callbacks. The resulting cluster state is passed as an
argument to the f/w along with the CPU state.