Enabling Arm® DynamIQ™ support
Dan Handley (Arm)
Ionela Voinescu (Arm)
Vincent Guittot (Linaro)
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Agenda
● DynamIQ introduction
● DynamIQ and Arm Trusted Firmware
● OS Power Management with DynamIQ
● L3 partial power-down support
DynamIQ™ key features
From https://developer.arm.com/technologies/dynamiq
1. A new single-cluster design
2. Intelligent compute capabilities
3. Interfaces for closely coupled accelerators
4. Built-in power-saving features
5. DynamIQ big.LITTLE
6. Advanced RAS and safety features
DynamIQ Shared Unit (DSU)
TRM: http://infocenter.arm.com/help/topic/com.arm.doc.100453_0002_00_en
● Armv8.2+ Cortex-A CPU support
○ e.g. Cortex-A55, Cortex-A75
● 2 different CPU types in the same cluster
○ Maximum of 8 CPUs
● Per-CPU L1+L2 caches and shared L3
● Per-CPU DVFS control
● Partial L3 cache power down
● Hardware assisted power management
○ Simplifies power up/down software
DynamIQ Shared Unit (DSU) and Arm TF
● DSU enables simpler, faster and more robust software during power up/down
○ Simplified micro-architectural programming sequence
○ Automatic enabling and disabling of coherency with the interconnect
○ Automatic and faster cache flushing at all levels without software intervention
○ Reduced power controller communication via P-channel interface
● TF enables more performant PSCI operations via the HW_ASSISTED_COHERENCY option
○ CPU idle, hotplug, secondary CPU boot
○ Will still work without HW_ASSISTED_COHERENCY but won’t get the benefits
○ Allows more aggressive OSPM tuning
○ Warning: Some HW operations will be invisible to SW and may give misleading statistics
CPU idle to power down (Armv8.0 CPUs)
Power Down (OS calls SMC CPU_SUSPEND)
● Validate CPU_SUSPEND arguments
● Acquire locks for non-CPU levels
● PSCI state coordination
● CPU-specific power down handling
○ Disable data caches
○ Flush data cache(s)
○ Disable intra-cluster coherency (!SMP_BIT)
● Stack maintenance
● Platform suspend operations
● Release locks for non-CPU levels
● Wait For Interrupt (WFI)
Power Up (Reset)
● Minimal SCTLR initialization
● Platform reset handling
● CPU-specific reset handling
○ Errata handling
○ Enable intra-cluster coherency (SMP_BIT)
● CPU architectural register initialization
● Enable MMU
● Acquire locks for non-CPU levels
● Platform suspend-finish operations
● Stack maintenance
● Enable data caches
● Restore OS context
● PSCI bookkeeping
● Release locks for non-CPU levels
● ERET to OS
CPU idle to power down (Armv8.2 CPUs)
Power Down (OS calls SMC CPU_SUSPEND)
● Validate CPU_SUSPEND arguments
● Acquire locks for non-CPU levels
● PSCI state coordination
● CPU-specific power down handling
○ Request CPU power down (CORE_PWRDN_EN)
● Platform suspend operations
● Release locks for non-CPU levels
● Wait For Interrupt (WFI)
Power Up (Reset)
● Minimal SCTLR initialization
● Platform reset handling
● CPU-specific reset handling
○ Errata handling (none yet)
● CPU architectural register initialization
● Enable MMU and data caches
● Acquire locks for non-CPU levels
● Platform suspend-finish operations
● Restore OS context
● PSCI bookkeeping
● Release locks for non-CPU levels
● ERET to OS
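To make the comparison with the Armv8.0 flow concrete, here is a minimal sketch that encodes the two power-down step lists from the slides as plain Python lists and reports which steps no longer appear on Armv8.2 CPUs. The names `ARMV8_0_POWER_DOWN`, `ARMV8_2_POWER_DOWN` and `dropped_steps` are illustrative only; this is not Arm Trusted Firmware code.

```python
# Power-down steps copied from the two slides (illustrative data only).
ARMV8_0_POWER_DOWN = [
    "Validate CPU_SUSPEND arguments",
    "Acquire locks for non-CPU levels",
    "PSCI state coordination",
    "CPU-specific power down handling",
    "Disable data caches",
    "Flush data cache(s)",
    "Disable intra-cluster coherency (!SMP_BIT)",
    "Stack maintenance",
    "Platform suspend operations",
    "Release locks for non-CPU levels",
    "Wait For Interrupt (WFI)",
]

ARMV8_2_POWER_DOWN = [
    "Validate CPU_SUSPEND arguments",
    "Acquire locks for non-CPU levels",
    "PSCI state coordination",
    "CPU-specific power down handling",
    "Request CPU power down (CORE_PWRDN_EN)",
    "Platform suspend operations",
    "Release locks for non-CPU levels",
    "Wait For Interrupt (WFI)",
]


def dropped_steps(old_steps, new_steps):
    """Return the steps of old_steps that are absent from new_steps, in order."""
    return [step for step in old_steps if step not in new_steps]


for step in dropped_steps(ARMV8_0_POWER_DOWN, ARMV8_2_POWER_DOWN):
    print("no longer done in software:", step)
```

The four steps it prints are the cache disable, cache flush, coherency exit and stack maintenance steps, which the slides attribute to hardware on DynamIQ systems.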
● D$ enabled much earlier (power up)
● D$ remains enabled throughout (power down)
● No need for explicit cache flushes or stack maintenance
● Much more efficient spin locks instead of bakery locks (using the Armv8.1 CAS instruction)
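The efficiency gap between the two lock types can be illustrated with a toy model. The sketch below counts shared-memory accesses for one uncontended acquire of a textbook Lamport bakery lock and of a spin lock built on a single compare-and-swap. It is a single-threaded model written for this summary: `CountingMemory`, `bakery_acquire` and `spin_acquire` are hypothetical names, and the real Trusted Firmware lock implementations differ in detail.

```python
class CountingMemory:
    """Shared memory model that counts every access."""

    def __init__(self, size):
        self.cells = [0] * size
        self.accesses = 0

    def load(self, index):
        self.accesses += 1
        return self.cells[index]

    def store(self, index, value):
        self.accesses += 1
        self.cells[index] = value

    def compare_and_swap(self, index, expected, new):
        """Model of an atomic CAS: one access, True on success."""
        self.accesses += 1
        if self.cells[index] == expected:
            self.cells[index] = new
            return True
        return False


def bakery_acquire(mem, n_cpus, me):
    """Textbook bakery lock. Layout: choosing[0..n) followed by ticket[0..n)."""
    def choosing(cpu):
        return cpu

    def ticket(cpu):
        return n_cpus + cpu

    mem.store(choosing(me), 1)
    highest = max(mem.load(ticket(cpu)) for cpu in range(n_cpus))
    my_ticket = highest + 1
    mem.store(ticket(me), my_ticket)
    mem.store(choosing(me), 0)

    for other in range(n_cpus):
        if other == me:
            continue
        while mem.load(choosing(other)):
            pass  # wait until the other CPU has picked its ticket
        while True:
            theirs = mem.load(ticket(other))
            if theirs == 0 or (theirs, other) > (my_ticket, me):
                break  # the other CPU is not ahead of us


def spin_acquire(mem, lock_index):
    """Spin until the lock word changes from 0 (free) to 1 (held)."""
    while not mem.compare_and_swap(lock_index, 0, 1):
        pass


def uncontended_cost(n_cpus):
    """Return (bakery accesses, spin lock accesses) for one uncontended acquire."""
    bakery_mem = CountingMemory(2 * n_cpus)
    bakery_acquire(bakery_mem, n_cpus, me=0)
    spin_mem = CountingMemory(1)
    spin_acquire(spin_mem, 0)
    return bakery_mem.accesses, spin_mem.accesses


print(uncontended_cost(8))
```

In this model the bakery lock cost grows linearly with the number of CPUs, while the CAS-based lock needs one atomic access when uncontended.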
● No need for explicit interconnect programming for masters to enter/exit coherency
● (Potentially) reduced power controller communication
Future TF enhancements
● Use per-thread cluster power voting register
○ CLUSTERPWRDN_EL1
○ Automatic cluster power down or memory retention, if the power controller hardware and firmware support it
● Remove cluster level locks
○ or at least reduce the time they are held
● Analyze performance on DynamIQ hardware platforms
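The voting idea can be sketched as a toy model. It assumes, purely for illustration, that each thread writes only its own vote and that the cluster may power down only when every thread has voted for it; the exact combining rule of CLUSTERPWRDN_EL1 is defined by the DSU TRM, not by this sketch. `ClusterPowerVotes` is a hypothetical name.

```python
class ClusterPowerVotes:
    """Toy model of per-thread cluster power-down voting (assumed semantics)."""

    def __init__(self, n_threads):
        # One private vote slot per thread: no slot is shared between writers.
        self.votes = [False] * n_threads

    def vote(self, thread, allow_power_down):
        """A thread records whether it allows the cluster to power down."""
        self.votes[thread] = allow_power_down

    def cluster_may_power_down(self):
        """Assumption: power down needs a unanimous vote."""
        return all(self.votes)


votes = ClusterPowerVotes(n_threads=4)
for thread in range(3):
    votes.vote(thread, True)
print(votes.cluster_may_power_down())  # thread 3 has not voted yet
votes.vote(3, True)
print(votes.cluster_may_power_down())
```

Because each thread only ever writes its own slot, this model needs no cluster-level lock around the vote, which is the motivation given in the bullet about removing cluster level locks.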
OS Power Management with DynamIQ
● Finer grained power capabilities
○ Already handled by PM frameworks
● Per-core Frequency/Voltage domain
● DSU Frequency/Voltage domain
Scheduler domains
● Current big.LITTLE system
○ Energy model layout matches scheduler domain
● Example of 4 big cores + 4 LITTLE cores:
Scheduler domains
● DynamIQ changes domain boundaries
○ Not necessarily congruent
○ Physical / Voltage / Frequency / Architecture
● Change the scheduler topology
○ And energy model layout
● Example of 4 big cores + 4 LITTLE cores:
Phantom domains
● Add intermediate domain
○ Voltage/Frequency boundary
● Example of 4 big cores + 4 LITTLE cores:
○ Per core DVFS
Phantom domains
● Example of 4 big cores + 4 LITTLE cores:
○ One frequency domain for big cores and one for LITTLE cores
○ Frequency domain close to current big.LITTLE system
● Enable similar scheduler topology
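The two examples above can be sketched as a grouping of CPUs by frequency domain. The CPU numbering (0-3 big, 4-7 LITTLE), the domain identifiers and the helper `frequency_groups` are assumptions made for illustration; this is not how the Linux scheduler builds its topology tables.

```python
from collections import defaultdict


def frequency_groups(cpu_to_domain):
    """Group CPU ids by frequency domain id, ordered by domain id."""
    groups = defaultdict(list)
    for cpu, domain in sorted(cpu_to_domain.items()):
        groups[domain].append(cpu)
    return [groups[domain] for domain in sorted(groups)]


# One DynamIQ cluster with 4 big cores (0-3) and 4 LITTLE cores (4-7).
# Case 1: per-core DVFS, every CPU is its own frequency domain.
per_core_dvfs = {cpu: cpu for cpu in range(8)}

# Case 2: one frequency domain for big cores (0) and one for LITTLE cores (1).
per_type_dvfs = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}

print(frequency_groups(per_core_dvfs))
print(frequency_groups(per_type_dvfs))
```

In the second case the intermediate level splits the single physical cluster into two groups of four, which resembles the cluster level of a current big.LITTLE system.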
OSPM next steps
● Shared frequency domains
● Shared voltage domains
● Impact on energy model
● Impact on compute capacity
● Getting notified of power domain OPP change
● Multiple DynamIQ clusters
Reference:
https://developer.arm.com/-/media/developer/developers/open-source/energy-aware-scheduling/DynamIQ_design_specification_v1.0.pdf
L3 partial power-down
● Arm DynamIQ Shared Unit (DSU) L3 cache
○ Implementation specific number of portions controlled through a power control register
○ Counters for cache misses and cache hits to help drive decisions
● Support in software
○ DevFreq driver
○ Control of active portions based on:
■ Cache hit/miss rates
■ Computed power benefit
■ Bias for performance
○ Out of tree reference implementation:
https://git.linaro.org/landing-teams/working/arm/kernel-release.git/log/?h=dsu_partial_powerdown_support_v1.0
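L3 partial power-down: architecture
[Diagram: in the Linux kernel, a 10 ms timer triggers a DevFreq update. The DevFreq governor reads the hit counter and miss counter of the DSU L3 cache through the DSU register interface and produces the target portions; the DevFreq device sets the target portions through the control register.]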
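A minimal sketch of the sampling step in this architecture: turning two snapshots of the hit and miss counters, taken one timer period apart, into hit and miss bandwidths in MiB/sec. The 64-byte line size, the idea that each counter event corresponds to one cache line, and the names `bandwidth_mib_per_sec` and `sample` are assumptions for illustration; counter wrap-around is ignored.

```python
MIB = 1024 * 1024


def bandwidth_mib_per_sec(count_delta, line_bytes=64, interval_ms=10):
    """Convert a counter delta over one interval into MiB/sec.

    Assumes each counted event moves one cache line of line_bytes bytes.
    """
    bytes_moved = count_delta * line_bytes
    return bytes_moved / MIB * 1000 / interval_ms


def sample(previous, current, interval_ms=10):
    """Return (hit bandwidth, miss bandwidth) from two (hits, misses) snapshots."""
    hit_delta = current[0] - previous[0]
    miss_delta = current[1] - previous[1]
    return (
        bandwidth_mib_per_sec(hit_delta, interval_ms=interval_ms),
        bandwidth_mib_per_sec(miss_delta, interval_ms=interval_ms),
    )


# Hypothetical snapshots 10 ms apart: 327680 hits and 163840 misses.
hbw, mbw = sample((0, 0), (327680, 163840))
print(hbw, mbw)
```

The resulting bandwidths are the inputs a governor would compare against its thresholds when choosing the target number of portions.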
L3 partial power-down: algorithm
Upsize: Weigh the additional energy cost of enabling another portion against the potential savings from reducing the dynamic cost of accessing DRAM.
● Condition for upsize:
MBW > (1.0 - Tu) * CB
● MBW – miss bandwidth: MiB/sec
● CB – cost bandwidth: MiB/sec
○ CB = L / ED
● L – static leakage of single portion: uJ/sec
● ED – dynamic energy of DRAM: uJ/MiB
● Tu – upsizing threshold: fraction 0.00 to 1.00
○ Bias for performance
● Compare energy consumption: L3 cache static vs. DRAM dynamic energy
○ Bias for performance
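The upsize condition can be written out as a short sketch. The function names and all numeric values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from any platform or code from the reference implementation.

```python
def cost_bandwidth(leakage_uj_per_sec, dram_energy_uj_per_mib):
    """CB = L / ED, in MiB/sec."""
    return leakage_uj_per_sec / dram_energy_uj_per_mib


def should_upsize(miss_bw_mib_per_sec, cb_mib_per_sec, upsize_threshold):
    """Upsize condition from the slide: MBW > (1.0 - Tu) * CB."""
    return miss_bw_mib_per_sec > (1.0 - upsize_threshold) * cb_mib_per_sec


# Hypothetical numbers: L = 2000 uJ/sec, ED = 100 uJ/MiB, so CB = 20 MiB/sec.
cb = cost_bandwidth(2000.0, 100.0)

# With Tu = 0.25 the miss bandwidth must exceed 0.75 * 20 = 15 MiB/sec.
print(should_upsize(16.0, cb, 0.25))
print(should_upsize(14.0, cb, 0.25))
```

A larger Tu lowers the miss bandwidth needed to enable another portion, which is the bias for performance mentioned on the slide.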
L3 partial power-down: algorithm
Downsize: From an energy trade-off perspective, keeping a portion powered on is justified only if its hit bandwidth pays for its leakage. If that requirement is not met, the portion can be powered off.
● Condition for downsize:
HBW < (N - Td) * CB
● HBW – hit bandwidth: MiB/sec
● N – current number of portions enabled
● CB – cost bandwidth: MiB/sec
○ CB = L / ED
● L – static leakage of single portion: uJ/sec
● ED – dynamic energy of DRAM: uJ/MiB
● Td – downsize threshold: fraction 0.00 to 1.00
○ Bias for performance
● Compare energy consumption: L3 cache static vs. DRAM dynamic energy
○ Bias for performance
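The downsize condition can be sketched the same way. As before, the function names and numbers are hypothetical and serve only to show the arithmetic.

```python
def cost_bandwidth(leakage_uj_per_sec, dram_energy_uj_per_mib):
    """CB = L / ED, in MiB/sec."""
    return leakage_uj_per_sec / dram_energy_uj_per_mib


def should_downsize(hit_bw_mib_per_sec, portions_enabled, cb_mib_per_sec,
                    downsize_threshold):
    """Downsize condition from the slide: HBW < (N - Td) * CB."""
    limit = (portions_enabled - downsize_threshold) * cb_mib_per_sec
    return hit_bw_mib_per_sec < limit


# Hypothetical numbers: CB = 20 MiB/sec, N = 4 portions enabled, Td = 0.5.
cb = cost_bandwidth(2000.0, 100.0)

# The limit is (4 - 0.5) * 20 = 70 MiB/sec of hit bandwidth.
print(should_downsize(60.0, 4, cb, 0.5))
print(should_downsize(80.0, 4, cb, 0.5))
```

A larger Td lowers the limit, so portions stay powered at lower hit bandwidths, which again biases the decision towards performance.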
L3 partial power-down: behaviour
Example:
● 2MB L3 cache
● Memcpy workload with buffer size of 4MB
L3 partial power-down: behaviour
Expected behaviour:
● CPU intensive workloads should not have an effect on the number of active portions
● I/O intensive loads should raise the number of active portions when the cache is well used
L3 partial power-down
● Limitations of current reference implementation
○ A portion is the smallest unit of the cache that can be powered up/down
○ Supports only a single DynamIQ Shared Unit
○ Not suitable for use with the simple on-demand governor
● L3 partial power-down in Arm Trusted Firmware?
Reference:
https://developer.arm.com/-/media/developer/developers/open-source/energy-aware-scheduling/DynamIQ_design_specification_v1.0.pdf
Thank You
#SFO17
BUD17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

Getting the most out of DynamIQ & Enabling support of DynamIQ - SFO17-104

  • 1. Enabling Arm® DynamIQ™ support: Dan Handley (Arm), Ionela Voinescu (Arm), Vincent Guittot (Linaro)
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER
      Agenda
      ● DynamIQ introduction
      ● DynamIQ and Arm Trusted Firmware
      ● OS Power Management with DynamIQ
      ● L3 partial power-down support
  • 3. DynamIQ™ key features
      From https://developer.arm.com/technologies/dynamiq
      1. A new single-cluster design
      2. Intelligent compute capabilities
      3. Interfaces for closely coupled accelerators
      4. Built-in power-saving features
      5. DynamIQ big.LITTLE
      6. Advanced RAS and safety features
  • 5. DynamIQ Shared Unit (DSU)
      TRM: http://infocenter.arm.com/help/topic/com.arm.doc.100453_0002_00_en
      ● Armv8.2+ Cortex-A CPU support
        ○ e.g. Cortex-A55, Cortex-A75
      ● 2 different CPU types in same cluster
        ○ Maximum 8 CPUs
      ● Per-CPU L1+L2 caches and shared L3
      ● Per-CPU DVFS control
      ● Partial L3 cache power down
      ● Hardware-assisted power management
        ○ Simplifies power up/down software
  • 6. Agenda
      ● DynamIQ introduction
      ● DynamIQ and Arm Trusted Firmware
      ● OS Power Management with DynamIQ
      ● L3 partial power-down support
  • 7. DynamIQ Shared Unit (DSU) and Arm TF
      ● DSU enables simpler, faster and more robust software during power up/down
        ○ Simplified micro-architectural programming sequence
        ○ Automatic enabling and disabling of coherency with the interconnect
        ○ Automatic and faster cache flushing at all levels without software intervention
        ○ Reduced power controller communication via P-channel interface
      ● TF enables more performant PSCI operations via HW_ASSISTED_COHERENCY option
        ○ CPU idle, hotplug, secondary CPU boot
        ○ Will still work without HW_ASSISTED_COHERENCY but won’t get the benefits
        ○ Allows more aggressive OSPM tuning
        ○ Warning: Some HW operations will be invisible to SW and may give misleading statistics
  • 8. CPU idle to power down (Armv8.0 CPUs)
      Power Down (OS calls SMC CPU_SUSPEND)
      ● Validate CPU_SUSPEND arguments
      ● Acquire locks for non-CPU levels
      ● PSCI state coordination
      ● CPU-specific power down handling
        ○ Disable data caches
        ○ Flush data cache(s)
        ○ Disable intra-cluster coherency (!SMP_BIT)
      ● Stack maintenance
      ● Platform suspend operations
      ● Release locks for non-CPU levels
      ● Wait For Interrupt (WFI)
      Power Up (Reset)
      ● Minimal SCTLR initialization
      ● Platform reset handling
      ● CPU-specific reset handling
        ○ Errata handling
        ○ Enable intra-cluster coherency (SMP_BIT)
      ● CPU architectural register initialization
      ● Enable MMU
      ● Acquire locks for non-CPU levels
      ● Platform suspend-finish operations
      ● Stack maintenance
      ● Enable data caches
      ● Restore OS context
      ● PSCI bookkeeping
      ● Release locks for non-CPU levels
      ● ERET to OS
  • 9. CPU idle to power down (Armv8.2 CPUs)
      Power Down (OS calls SMC CPU_SUSPEND)
      ● Validate CPU_SUSPEND arguments
      ● Acquire locks for non-CPU levels
      ● PSCI state coordination
      ● CPU-specific power down handling
        ○ Request CPU power down (CORE_PWRDN_EN)
      ● Platform suspend operations
      ● Release locks for non-CPU levels
      ● Wait For Interrupt (WFI)
      Power Up (Reset)
      ● Minimal SCTLR initialization
      ● Platform reset handling
      ● CPU-specific reset handling
        ○ Errata handling (none yet)
      ● CPU architectural register initialization
      ● Enable MMU and data caches
      ● Acquire locks for non-CPU levels
      ● Platform suspend-finish operations
      ● Restore OS context
      ● PSCI bookkeeping
      ● Release locks for non-CPU levels
      ● ERET to OS
  • 10. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, with callouts:
      ● D$ enabled much earlier
      ● D$ remains enabled throughout
  • 11. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, with callout:
      ● No need for explicit cache flushes or stack maintenance
  • 12. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, with callout:
      ● Much more efficient spin locks instead of bakery locks (using v8.1 CAS instruction)
  • 13. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, with callouts:
      ● No need for explicit interconnect programming for masters to enter/exit coherency
      ● (Potentially) reduced power controller communication
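To make the difference between the Armv8.0 and Armv8.2 power-down paths concrete, the sketch below lists the power-down steps (with their sub-steps) as transcribed from slides 8 and 9 and computes which ones disappear and which are new. The list names and the `only_in` helper are illustrative only; this is not Trusted Firmware code.

```python
# Power-down steps (including sub-steps) transcribed from the slides.
ARMV80_POWER_DOWN = [
    "Validate CPU_SUSPEND arguments",
    "Acquire locks for non-CPU levels",
    "PSCI state coordination",
    "Disable data caches",
    "Flush data cache(s)",
    "Disable intra-cluster coherency (!SMP_BIT)",
    "Stack maintenance",
    "Platform suspend operations",
    "Release locks for non-CPU levels",
    "Wait For Interrupt (WFI)",
]

ARMV82_POWER_DOWN = [
    "Validate CPU_SUSPEND arguments",
    "Acquire locks for non-CPU levels",
    "PSCI state coordination",
    "Request CPU power down (CORE_PWRDN_EN)",
    "Platform suspend operations",
    "Release locks for non-CPU levels",
    "Wait For Interrupt (WFI)",
]


def only_in(first, second):
    """Return the steps of `first` that do not appear in `second`, in order."""
    return [step for step in first if step not in second]


removed = only_in(ARMV80_POWER_DOWN, ARMV82_POWER_DOWN)
added = only_in(ARMV82_POWER_DOWN, ARMV80_POWER_DOWN)

print("Dropped on Armv8.2:", removed)
print("New on Armv8.2:", added)
```

The dropped steps are the explicit cache, coherency and stack maintenance operations that the callouts on slides 10 to 13 describe as no longer needed.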
  • 14. Future TF enhancements
      ● Use per-thread cluster power voting register
        ○ CLUSTERPWRDN_EL1
        ○ Automatic cluster power down or memory retention, if the power controller hardware and firmware support it
      ● Remove cluster level locks
        ○ or at least reduce the time they are held
      ● Analyze performance on DynamIQ hardware platforms
  • 15. Agenda
      ● DynamIQ introduction
      ● DynamIQ and Arm Trusted Firmware
      ● OS Power Management with DynamIQ
      ● L3 partial power-down support
  • 16. OS Power Management with DynamIQ
      ● Finer grained power capabilities
        ○ Already handled by PM frameworks
      ● Per-core Frequency/Voltage domain
      ● DSU Frequency/Voltage domain
  • 17. Scheduler domains
      ● Current big.LITTLE system
        ○ Energy model layout matches scheduler domain
      ● Example of 4 big cores + 4 LITTLE cores (figure not reproduced in this transcript)
  • 18. Scheduler domains
      ● DynamIQ changes domain boundaries
        ○ Not necessarily congruent
        ○ Physical / Voltage / Frequency / Architecture
      ● Change the scheduler topology
        ○ And energy model layout
      ● Example of 4 big cores + 4 LITTLE cores (figure not reproduced in this transcript)
  • 19. Phantom domains
      ● Add intermediate domain
        ○ Voltage/Frequency boundary
      ● Example of 4 big cores + 4 LITTLE cores (figure not reproduced in this transcript):
        ○ Per core DVFS
  • 20. Phantom domains
      ● Example of 4 big cores + 4 LITTLE cores (figure not reproduced in this transcript):
        ○ One frequency domain for big cores and one for LITTLE cores
        ○ Frequency domain close to current big.LITTLE system
      ● Enable similar scheduler topology
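One way to picture the intermediate (phantom) domain level is as a grouping of the CPUs inside a single DSU cluster by the frequency domain they belong to. The sketch below is only an illustration under stated assumptions: the CPU numbering (0 to 3 LITTLE, 4 to 7 big), the frequency-domain ids and the `group_by_frequency_domain` helper are hypothetical and do not come from the scheduler or from the slides.

```python
from collections import defaultdict


def group_by_frequency_domain(cpu_to_domain):
    """Group CPU ids by frequency-domain id.

    Returns one list of CPU ids per frequency domain, ordered by domain id.
    Each list would correspond to one intermediate (phantom) domain.
    """
    groups = defaultdict(list)
    for cpu in sorted(cpu_to_domain):
        groups[cpu_to_domain[cpu]].append(cpu)
    return [groups[domain] for domain in sorted(groups)]


# Slide 20 layout: one frequency domain for LITTLE cores (id 0, assumed)
# and one for big cores (id 1, assumed).
shared_dvfs = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}

# Slide 19 layout: per-core DVFS, so every CPU is its own frequency domain.
per_core_dvfs = {cpu: cpu for cpu in range(8)}

print(group_by_frequency_domain(shared_dvfs))
print(group_by_frequency_domain(per_core_dvfs))
```

With the shared layout the grouping yields two sets of four CPUs, which resembles the two clusters of a current big.LITTLE system; with per-core DVFS every group holds a single CPU.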
  • 21. OSPM next steps
      ● Shared frequency domains
      ● Shared voltage domains
      ● Impact on energy model
      ● Impact on compute capacity
      ● Getting notified of power domain OPP change
      ● Multiple DynamIQ clusters
      Reference: https://developer.arm.com/-/media/developer/developers/open-source/energy-aware-scheduling/DynamIQ_design_specification_v1.0.pdf
  • 22. Agenda
      ● DynamIQ introduction
      ● DynamIQ and Arm Trusted Firmware
      ● OS Power Management with DynamIQ
      ● L3 partial power-down support
  • 23. L3 partial power-down
      ● Arm DynamIQ Shared Unit (DSU) L3 cache
        ○ Implementation-specific number of portions controlled through a power control register
        ○ Counters for cache misses and cache hits to help drive decisions
      ● Support in software
        ○ DevFreq driver
        ○ Control of active portions based on:
          ■ Cache hit/miss rates
          ■ Computed power benefit
          ■ Bias for performance
        ○ Out of tree reference implementation: https://git.linaro.org/landing-teams/working/arm/kernel-release.git/log/?h=dsu_partial_powerdown_support_v1.0
  • 24. L3 partial power-down: architecture
      [Diagram. Linux kernel: DevFreq governor, DevFreq device, 10 ms timer ("Update DevFreq"), target portions ("Set target portions"). DSU register interface: hit counter, miss counter, control register. DSU L3 cache.]
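The architecture slide shows a 10 ms timer driving the DevFreq update from the hit and miss counters. A minimal sketch of turning two successive counter samples into a bandwidth in MiB/sec follows. It assumes each count is one cache-line access of 64 bytes; the line size, the function name and the lack of counter wrap-around handling are assumptions made for illustration, not details of the reference implementation.

```python
MIB = 1024 * 1024
ASSUMED_LINE_BYTES = 64  # assumption: one counter tick = one 64-byte line


def bandwidth_mib_per_s(prev_count, curr_count, interval_s,
                        line_bytes=ASSUMED_LINE_BYTES):
    """Convert a counter delta over `interval_s` seconds into MiB/sec.

    Counter wrap-around is deliberately not handled in this sketch.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_count - prev_count
    return delta * line_bytes / MIB / interval_s


# 16384 line accesses (1 MiB) observed during one 10 ms polling period.
sample = bandwidth_mib_per_s(prev_count=0, curr_count=16384, interval_s=0.010)
print(sample)
```

The same conversion would apply to both counters, giving a hit bandwidth and a miss bandwidth per polling period.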
  • 25. L3 partial power-down: algorithm
      Upsize: Weigh the additional energy cost of enabling another portion against the potential savings from reducing the dynamic energy cost of DRAM accesses.
      ● Condition for upsize: MBW > (1.0 - Tu) * CB
      ● MBW – miss bandwidth: MiB/sec
      ● CB – cost bandwidth: MiB/sec
        ○ CB = L / ED
      ● L – static leakage of single portion: uJ/sec
      ● ED – dynamic energy of DRAM: uJ/MiB
      ● Tu – upsizing threshold: fraction 0.00 to 1.00
        ○ Bias for performance
      ● Compare energy consumption
        ○ Bias for performance
      [Figure labels: L3 cache static, DRAM dynamic energy]
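The upsize condition can be worked through with a few numbers. The sketch below implements CB = L / ED and MBW > (1.0 - Tu) * CB directly from the slide; the function names and all numeric values (leakage, DRAM energy, thresholds, bandwidths) are made up for illustration.

```python
def cost_bandwidth(leakage_uj_per_s, dram_energy_uj_per_mib):
    """CB = L / ED, in MiB/sec."""
    return leakage_uj_per_s / dram_energy_uj_per_mib


def should_upsize(miss_bw, cost_bw, upsize_threshold):
    """Upsize when MBW > (1.0 - Tu) * CB, with Tu in [0.0, 1.0]."""
    if not 0.0 <= upsize_threshold <= 1.0:
        raise ValueError("upsize_threshold must be within 0.0..1.0")
    return miss_bw > (1.0 - upsize_threshold) * cost_bw


# Illustrative values only: L = 2000 uJ/sec, ED = 100 uJ/MiB.
cb = cost_bandwidth(2000.0, 100.0)   # 20 MiB/sec
print(should_upsize(18.0, cb, 0.25))  # limit is 15 MiB/sec
print(should_upsize(12.0, cb, 0.25))  # below the limit
print(should_upsize(18.0, cb, 0.0))   # no bias: limit is the full 20 MiB/sec
```

A larger Tu lowers the miss bandwidth needed to enable another portion, which is how the threshold biases the decision toward performance.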
  • 26. L3 partial power-down: algorithm - 1
      Downsize: From an energy trade-off perspective, keeping a portion powered on is justified only if the hit bandwidth pays for its leakage. If that requirement is not met, the portion can be powered off.
      ● Condition for downsize: HBW < (N - Td) * CB
      ● HBW – hit bandwidth: MiB/sec
      ● N – current number of portions enabled
      ● CB – cost bandwidth: MiB/sec
        ○ CB = L / ED
      ● L – static leakage of single portion: uJ/sec
      ● ED – dynamic energy of DRAM: uJ/MiB
      ● Td – downsize threshold: fraction 0.00 to 1.00
        ○ Bias for performance
      ● Compare energy consumption
        ○ Bias for performance
      [Figure labels: L3 cache static, DRAM dynamic energy]
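The downsize condition HBW < (N - Td) * CB can be checked the same way. In this sketch the function name and the numeric values are illustrative assumptions; only the inequality itself is taken from the slide.

```python
def should_downsize(hit_bw, portions_enabled, cost_bw, downsize_threshold):
    """Downsize when HBW < (N - Td) * CB, with Td in [0.0, 1.0]."""
    if not 0.0 <= downsize_threshold <= 1.0:
        raise ValueError("downsize_threshold must be within 0.0..1.0")
    return hit_bw < (portions_enabled - downsize_threshold) * cost_bw


# Illustrative values only: CB = 20 MiB/sec, N = 4 portions enabled.
COST_BW = 20.0
print(should_downsize(60.0, 4, COST_BW, 0.5))  # limit is 70 MiB/sec
print(should_downsize(75.0, 4, COST_BW, 0.5))  # hits pay for the leakage
print(should_downsize(75.0, 4, COST_BW, 0.0))  # no bias: limit is 80 MiB/sec
```

A larger Td lowers the hit bandwidth required to keep the current number of portions powered, so portions are switched off less readily.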
  • 27. L3 partial power-down: behaviour
      Example:
      ● 2MB L3 cache
      ● Memcpy workload with buffer size of 4MB
  • 28. L3 partial power-down: behaviour - 1
      Expected behaviour:
      ● CPU intensive workloads should not have an effect on the number of active portions
      ● I/O intensive loads should raise the number of active portions when the cache is well used
  • 29. L3 partial power-down
      ● Limitations of current reference implementation
        ○ Portion is the smallest single unit of the cache that can be powered up/down
        ○ Only support for a single DynamIQ Shared Unit
        ○ Not suitable for use with the simple on-demand governor
      ● L3 partial power-down in Arm Trusted Firmware?
      Reference: https://developer.arm.com/-/media/developer/developers/open-source/energy-aware-scheduling/DynamIQ_design_specification_v1.0.pdf
  • 30. Thank You
      #SFO17
      BUD17 keynotes and videos on: connect.linaro.org
      For further information: www.linaro.org