Towards a performance-aware power capping
orchestrator for the Xen hypervisor
Marco Arnaboldi, Matteo Ferroni, Marco D. Santambrogio
EWiLi’16, 10/06/2016, Pittsburgh PA, USA.
Co-located with the Embedded Systems Week.
Outline
• Introduction and system requirements
• Problem definition and proposed solution
• Related work
• XeMPUPiL Goals
• System design and implementation
• Experimental evaluation and results
• Conclusion and future work
Introduction
• Computing systems changed considerably in the
last few decades
– multi-core processors entered into the domain
of embedded systems
• Wide range of application fields
– automotive, Internet TV, mobile, …
– other embedded use cases like low-power
microservers for lightweight scale-out
workloads
• Fog Computing takes the computation “at the edge
of the Cloud” by exploiting fog nodes
– use case: latency-sensitive and security-critical
applications
[Figure: three tiers: Servers, Fog, IoT]
Requirement 1: Portability
• Applications need to be PORTABLE
between the Cloud and the Fog
– Hardware-assisted and software
virtualization enter the context of
embedded systems
• Features:
– applications do not need to be changed
– physical resources shared between
applications
– strong security and isolation guarantees
Requirement 2: Power consumption
• Nodes may be POWER CONSTRAINED
– Power management techniques to
control power consumption
• Limit power consumption of a machine to a
fixed “cap”, with the following features:
– timeliness: the ability of the system to
enforce a new cap rapidly
– efficiency: maximize the performance
delivered by the applications under a
fixed power cap
Problem definition and Proposed solution
• One problem, two points of view:
– minimize power consumption given a minimum
performance requirement
– maximize performance given a limit on the maximum
power consumption
• Proposed solution:
– XeMPUPiL, a performance-aware power capping
orchestrator for the Xen hypervisor
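The two points of view above can be written as two constrained optimization problems. This is only a sketch: the symbols $C$ (the set of feasible system configurations), $P(c)$ (power drawn under configuration $c$), $\mathit{Perf}(c)$, $\mathit{Perf}_{\min}$ and $P_{\text{cap}}$ are notation assumed here, not taken from the slides.

```latex
\begin{align*}
\text{(a)}\quad & \min_{c \in C} \; P(c)
  && \text{s.t. } \mathit{Perf}(c) \ge \mathit{Perf}_{\min} \\
\text{(b)}\quad & \max_{c \in C} \; \mathit{Perf}(c)
  && \text{s.t. } P(c) \le P_{\text{cap}}
\end{align*}
```

Power capping as described in Requirement 2 (maximize delivered performance under a fixed cap) corresponds to formulation (b).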
Power capping approaches
              Hardware Power Capping         Software-level resource
              (i.e., Intel RAPL [1])         management
Description   Exploits DVFS to control       Resource management to achieve
              power consumption              the desired power consumption
PRO           Very fast (~350 ms [1])        It’s possible to tune the
                                             performance of applications
CONS          No control over the            Slow compared to RAPL
              performance of applications    (double-digit degradation)
Power capping approaches 8
SOFTWARE APPROACH
✓ efficiency
✖ timeliness
MODEL BASED

MONITORING [3]
THREAD

MIGRATION [2]
RESOURCE
MANAGMENT DVFS [4] RAPL [1]
CPU
QUOTA
HARDWARE APPROACH
✖ efficiency
✓ timeliness
[1] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. Rapl: Memory power estimation and capping. In International Symposium on Low Power Electronics and Design (ISPLED), 2010.
[2] R. Cochran, C. Hankendi, A. K. Coskun, and S. Reda. Pack & cap: adaptive dvfs and thread packing under power caps. In International Symposium on Microarchitecture (MICRO), 2011.
[3]M. Ferroni, A. Cazzola, D. Matteo, A. A. Nacci, D. Sciuto, and M. D. Santambrogio. Mpower: gain back your android battery life! In Proceedings of the 2013 ACM conference on Pervasive and
ubiquitous computing adjunct publication, pages 171–174. ACM, 2013.
[4] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier web servers with end-to-end delay control. In Computers, IEEE Transactions. IEEE, 2007.
Power capping approaches
• SOFTWARE APPROACH (✓ efficiency, ✖ timeliness)
– model-based monitoring [3]
– thread migration [2]
– resource management
– CPU quota
• HARDWARE APPROACH (✖ efficiency, ✓ timeliness)
– DVFS [4]
– RAPL [1]
• HYBRID APPROACH (✓ efficiency, ✓ timeliness)
[1] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. RAPL: Memory power estimation and capping. In International Symposium on Low Power Electronics and Design (ISLPED), 2010.
[2] R. Cochran, C. Hankendi, A. K. Coskun, and S. Reda. Pack & Cap: adaptive DVFS and thread packing under power caps. In International Symposium on Microarchitecture (MICRO), 2011.
[3] M. Ferroni, A. Cazzola, D. Matteo, A. A. Nacci, D. Sciuto, and M. D. Santambrogio. MPower: gain back your Android battery life! In Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication, pages 171–174. ACM, 2013.
[4] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier web servers with end-to-end delay control. In Computers, IEEE Transactions. IEEE, 2007.
Related work: PUPiL [5]
[5] H. Zhang and H. Hoffmann. Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques. In International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS), 2016.
• PUPiL, a framework that aims to achieve both timeliness (enforcing a new
cap rapidly) and efficiency (maximizing performance under the cap)
• Proposed approach:
– both hardware (i.e., the Intel RAPL interface [1]) and software (i.e.,
resource partitioning and allocation) techniques
– exploits a canonical Observe-Decide-Act (ODA) control loop, one of the
main building blocks of self-aware computing
• Limitations
– the applications running on the system need to be instrumented with the
Heartbeat framework, to provide a uniform metric of throughput
– applications run bare-metal on Linux
• These conditions might not hold in the context of a multi-tenant
virtualized environment
Goals
• We want to extend this approach to:
– work in a virtualized environment, based on the Xen
hypervisor
– avoid instrumentation of the guest workloads, as each
tenant is seen as a “black box”
• We then need to:
1. identify a performance metric for all the hosted tenants
2. improve the decision phase, to deal with the requirements
of a virtualized environment
3. extend the hypervisor to provide the right knobs to work
with our orchestrating logic
The Xen Hypervisor
Slides from: http://www.slideshare.net/xen_com_mgr/xpds16-porting-xen-on-arm-to-a-new-soc-julien-grall-arm
1. Performance metric identification
• Hardware event counters as low-level metrics of
performance
• We exploit the Intel Performance Monitoring Unit (PMU)
to monitor the number of Instructions Retired (IR)
accounted to each domain in a certain time window
– an insight into how many instructions were completely
executed (i.e., that successfully reached the end of the
pipeline)
– it represents a reasonable indicator of performance, as the
manufacturer itself suggests [6]
[6] Clockticks per instructions retired (cpi). https://software.intel.com/en-us/node/544403. Accessed: 2016-06-01.
1. Performance monitoring
[Figure: Tracing the Domains’ behavior. Timeline of context switches between domains on Core 0 … Core N; the Xen kernel exposes hardware events per core and energy per socket to XeMPowerDaemon and XeMPowerCLI in Dom0]
XEMPOWER
Collect and account hardware events
to virtual tenants in two steps:
1. In the Xen scheduler (kernel-level)
• At every context switch, trace the
interesting hardware events
• e.g., INST_RET
2. In Domain 0 (privileged tenant)
• Periodically acquire the event
traces and aggregate them on a
domain basis
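The two-step accounting above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout `(core, outgoing_domain, counter)` and the name `account_instructions` are hypothetical, not XeMPower's actual trace format, and counter wrap-around is ignored.

```python
from collections import defaultdict


def account_instructions(trace):
    """Aggregate per-core context-switch records into per-domain totals.

    `trace` is an iterable of (core, outgoing_domain, counter) tuples, in
    time order for each core. `counter` is the cumulative instructions-retired
    value read on that core when `outgoing_domain` is switched out.
    (Hypothetical record layout, assumed for this sketch.)
    """
    last_counter = {}              # last counter value seen on each core
    per_domain = defaultdict(int)  # instructions accounted to each domain

    for core, domain, counter in trace:
        if core in last_counter:
            # Instructions retired since the previous switch on this core
            # are charged to the domain that is being switched out.
            per_domain[domain] += counter - last_counter[core]
        # The first record on a core only establishes a baseline.
        last_counter[core] = counter

    return dict(per_domain)
```

The first record seen on each core is used only as a baseline, because nothing is known about who ran before it.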
2. Decision phase and virtualization
• Evaluation criterion: the average IR rate over a certain time
window
– the time window allows the workload to adapt to the current
configuration
– comparing the IR rates of different configurations highlights
which one makes the workload perform better
• Resource allocation granularity: core-level
– each domain owns a set of virtual CPUs (vCPUs)
– the machine provides a set of physical CPUs (pCPUs)
– each vCPU can be mapped on a pCPU for a certain amount of
time, while multiple vCPUs can be mapped on the same pCPU
• We want our allocation to cover the whole set of pCPUs, if
possible
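The evaluation criterion can be illustrated with a minimal sketch that averages IR samples over a window and picks the configuration with the highest rate. The function names, the one-sample-per-period layout, and the use of a core count as the configuration key are assumptions made for this example; the search strategy of the actual decide phase is not described on the slide.

```python
from statistics import mean


def average_ir_rate(ir_samples, sample_period_s=1.0):
    """Average instructions retired per second over one observation window.

    `ir_samples` holds one IR count per sampling period (assumed layout).
    """
    if not ir_samples:
        raise ValueError("empty observation window")
    return mean(ir_samples) / sample_period_s


def best_configuration(observations, sample_period_s=1.0):
    """Return (configuration, rate) for the highest average IR rate.

    `observations` maps a configuration key (here: number of pCPUs assigned,
    a hypothetical choice) to the IR samples collected while it was active.
    """
    rates = {
        cfg: average_ir_rate(samples, sample_period_s)
        for cfg, samples in observations.items()
    }
    best = max(rates, key=rates.get)
    return best, rates[best]
```

Averaging over the whole window, rather than comparing single samples, gives the workload time to settle after each reconfiguration, as the slide points out.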
3. Extending the hypervisor - RAPL
• Working with the Intel RAPL interface:
– harshly cutting the frequency and the voltage of the whole CPU socket
• On a bare-metal operating system:
– reading and writing data into the right Model Specific Registers (MSRs)
• MSR_RAPL_POWER_UNIT: read processor-specific time, energy and power
units, used to scale each value read or written
• MSR_PKG_RAPL_POWER_LIMIT: write to set a limit on the power
consumption of the whole socket
• In a virtualized environment:
– the Xen hypervisor does not natively support the RAPL interface
– we developed custom hypercalls, with kernel callback functions and
memory buffers
– we developed a CLI tool that checks the input parameters, then
instantiates and invokes the Xen command interface to launch the
hypercalls
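The scaling role of the units register can be shown with a small sketch. It assumes the bit layout given in the Intel Software Developer's Manual (power units in bits 3:0, energy units in bits 12:8, time units in bits 19:16 of the units register; power limit #1 in bits 14:0 and its enable flag in bit 15 of the package power-limit register). The function names and the raw value `0xA1003` are illustrative, not measured on the authors' machine, and the custom hypercalls that perform the register access under Xen are not reproduced here.

```python
def decode_rapl_units(raw):
    """Decode a raw MSR_RAPL_POWER_UNIT value.

    Returns (watts, joules, seconds) represented by one least-significant
    bit of the power, energy and time fields. Each unit is 1 / 2**field.
    """
    power_unit = 1.0 / (1 << (raw & 0xF))            # bits 3:0
    energy_unit = 1.0 / (1 << ((raw >> 8) & 0x1F))   # bits 12:8
    time_unit = 1.0 / (1 << ((raw >> 16) & 0xF))     # bits 19:16
    return power_unit, energy_unit, time_unit


def encode_power_limit(current, watts, power_unit):
    """Return a new package power-limit value with limit #1 set and enabled.

    Only bits 15:0 are rewritten; the remaining bits of `current`
    (clamping flag, time window, limit #2) are preserved.
    """
    field = int(round(watts / power_unit))
    if not 0 <= field <= 0x7FFF:
        raise ValueError("power limit does not fit in 15 bits")
    return (current & ~0xFFFF) | 0x8000 | field
```

With a power unit of 1/8 W, a 30 W cap is encoded as 240 in the limit field, so every value written must first be scaled by the units read from the same socket.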
3. Extending the hypervisor - Resources
• cpupool tool:
– allows clustering the physical CPUs into different pools
– the pool scheduler schedules the domain’s vCPUs only
on the pCPUs that are part of that pool
– as a new resource allocation is chosen by the decide phase,
we increase or decrease the number of pCPUs in the pool
– we pin the domain’s vCPUs to these pCPUs, to increase
workload stability
• xenpm is NOT used:
– it sets a maximum and minimum frequency for each pCPU
– it may interfere with the actuation made by RAPL
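The pool resizing and pinning steps can be sketched as a generator of `xl` command lines. This assumes the standard `xl` toolstack subcommands (`cpupool-cpu-add`, `cpupool-cpu-remove`, `vcpu-pin`); the helper names, the pool and domain names, and the choice to move CPUs to and from `Pool-0` are assumptions for illustration, not the authors' tooling. The commands are only built, never executed.

```python
def resize_pool_commands(pool, current_cpus, target_cpus, source_pool="Pool-0"):
    """Build the xl commands that turn `current_cpus` into `target_cpus`.

    A pCPU must be free before it can join a pool, so each move is a
    remove from one pool followed by an add to the other.
    """
    current, target = set(current_cpus), set(target_cpus)
    commands = []
    for cpu in sorted(current - target):   # shrink: give pCPUs back
        commands.append(["xl", "cpupool-cpu-remove", pool, str(cpu)])
        commands.append(["xl", "cpupool-cpu-add", source_pool, str(cpu)])
    for cpu in sorted(target - current):   # grow: take pCPUs from the source
        commands.append(["xl", "cpupool-cpu-remove", source_pool, str(cpu)])
        commands.append(["xl", "cpupool-cpu-add", pool, str(cpu)])
    return commands


def pin_commands(domain, target_cpus):
    """Build the xl command pinning all vCPUs of `domain` to `target_cpus`."""
    cpu_list = ",".join(str(cpu) for cpu in sorted(target_cpus))
    return [["xl", "vcpu-pin", domain, "all", cpu_list]]
```

Each returned list could be passed to `subprocess.run` from Dom0; keeping command construction separate from execution makes the plan easy to inspect before it touches the running system.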
System Design
• The workloads run in paravirtualized domains
• XeMPUPiL spans over all the layers
• Instructions Retired (IR) metric gathered and accounted to each domain,
thanks to XeMPower
• The aggregation is done over a time window of 1 second
• Observation of both hardware events (i.e., IR) and power
consumption (whole CPU socket)
– given a workload with M virtual resources
and an assignment of N physical resources,
to each pCPU_i we assign:
[formula shown as an image on the slide]
• Hybrid actuation:
– enforce power cap via RAPL
– define a CPU pool for the workload and pin workload’s vCPUs over pCPUs
Experimental Setup
• Server setup (aka Sandy)
– 2.8-GHz quad-core Intel Xeon E5-1410 processor, no HT enabled
(4 cores)
– 32GB of RAM
– Xen hypervisor version 4.4
– paravirtualized instance of Ubuntu 14.04 as Dom0, pinned on the
first 4 and with 4GB of RAM
• Benchmarking
– Embarrassingly Parallel (EP) [1]
– IOzone [3]
– cachebench [2]
– Block Tri-diagonal solver (BT) [1]
               EP    IOzone   cachebench   BT
CPU-bound      YES   NO       NO           YES
IO-bound       NO    YES      NO           YES
memory-bound   NO    NO       YES          YES
[1] NAS parallel benchmarks. http://www.nas.nasa.gov/publications/npb.html#url. Accessed: 2016-06-01.
[2] Openbenchmarking.org. https://openbenchmarking.org/test/pts/cachebench. Accessed: 2016-06-01.
[3] IOzone filesystem benchmark. http://www.iozone.org. Accessed: 2016-06-01.
Experimental evaluation
• Experimental evaluation:
1. how do different workloads perform under a power cap?
2. can we achieve higher efficiency w.r.t. RAPL power cap?
• Three power caps explored: 40W, 30W and 20W
– in idle state, the entire socket consumes around 17W
– the maximum power consumption we measured was
around 43W
• Results are normalized with respect to the performance
obtained with no power caps
Experimental Results
[Figure: normalized performance of EP, cachebench, IOzone and BT; series: NO RAPL, RAPL 40, RAPL 30, RAPL 20]
• Preliminary evaluation: how do the workloads perform under a power cap?
• For CPU-bound benchmarks (i.e., EP and BT), the differences between
power caps are significant compared to benchmarks where the bottleneck
is on the IO and/or on memory accesses
• With IO- and/or memory-bound workloads, the performance
degradation is less significant between different power caps
Experimental Results
[Figure: normalized performance of EP, cachebench, IOzone and BT in three panels: PUPiL 40 vs. RAPL 40, PUPiL 30 vs. RAPL 30, PUPiL 20 vs. RAPL 20]
• Performance of the workloads with XeMPUPiL, for different
power caps:
– higher performance than RAPL, in general
– not always true on a pure CPU-bound benchmark (i.e., EP)
Experimental Results
• XeMPUPiL improves the performance of the IO-bound,
the memory-bound and the mixed benchmark w.r.t. the
system with no constraints:
– just one core assigned for IOzone and cachebench
– two cores for the BT benchmark
• These allocations are more power efficient, as they
reduce memory and IO contention for non-strictly
CPU-bound workloads
Conclusion and Future Work
• Conclusions
– Performance tuning through an ODA controller under a power cap
improves performance
• Future work
– Improving the decide phase
• A better algorithm, to reduce convergence time
• A more general approach, to improve portability
– Improving the act phase
• Implementation of a custom fine-grained tool for
resource management in Xen
Questions?
Contenu connexe

Tendances

Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...eSAT Journals
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stackinside-BigData.com
 
[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...
[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...
[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...Matteo Ferroni
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloudinside-BigData.com
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERinside-BigData.com
 
Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISAPatrick Bellasi
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnectinside-BigData.com
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3mustafa sarac
 
AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhereinside-BigData.com
 
HPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance OptimizationHPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance Optimizationinside-BigData.com
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Pradeep Singh
 
Cse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionShobha Kumar
 
Improving Quality of Service via Intel RDT
Improving Quality of Service via Intel RDTImproving Quality of Service via Intel RDT
Improving Quality of Service via Intel RDTLiz Warner
 

Tendances (20)

Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...Run time dynamic partial reconfiguration using microblaze soft core processor...
Run time dynamic partial reconfiguration using microblaze soft core processor...
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
 
[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...
[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...
[EUC2016] DockerCap: a software-level power capping orchestrator for Docker c...
 
SmartBalance-DAC-v2
SmartBalance-DAC-v2SmartBalance-DAC-v2
SmartBalance-DAC-v2
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloud
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISA
 
openCL Paper
openCL PaperopenCL Paper
openCL Paper
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
Pipeline parallelism
Pipeline parallelismPipeline parallelism
Pipeline parallelism
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhere
 
HPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance OptimizationHPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance Optimization
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
ARM HPC Ecosystem
ARM HPC EcosystemARM HPC Ecosystem
ARM HPC Ecosystem
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
 
Cse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solution
 
Improving Quality of Service via Intel RDT
Improving Quality of Service via Intel RDTImproving Quality of Service via Intel RDT
Improving Quality of Service via Intel RDT
 

Similaire à [EWiLi2016] Towards a performance-aware power capping orchestrator for the Xen hypervisor

XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...NECST Lab @ Politecnico di Milano
 
A performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisorA performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisorNECST Lab @ Politecnico di Milano
 
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...CSCJournals
 
Understanding the characteristics of android wear os
Understanding the characteristics of android wear osUnderstanding the characteristics of android wear os
Understanding the characteristics of android wear osPratik Jain
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
37248136-Nano-Technology.pdf
37248136-Nano-Technology.pdf37248136-Nano-Technology.pdf
37248136-Nano-Technology.pdfTB107thippeswamyM
 
Adaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog PotdarAdaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog PotdarSuyog Potdar
 
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsNational Cheng Kung University
 
asap2013-khoa-presentation
asap2013-khoa-presentationasap2013-khoa-presentation
asap2013-khoa-presentationAbhishek Jain
 
Four Ways to Improve Linux Performance IEEE Webinar, R2.0
Four Ways to Improve Linux Performance IEEE Webinar, R2.0Four Ways to Improve Linux Performance IEEE Webinar, R2.0
Four Ways to Improve Linux Performance IEEE Webinar, R2.0Michael Christofferson
 
Parallex - The Supercomputer
Parallex - The SupercomputerParallex - The Supercomputer
Parallex - The SupercomputerAnkit Singh
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ijdpsjournal
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ijdpsjournal
 

Similaire à [EWiLi2016] Towards a performance-aware power capping orchestrator for the Xen hypervisor (20)

XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
 
A performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisorA performance-aware power capping orchestrator for the Xen hypervisor
A performance-aware power capping orchestrator for the Xen hypervisor
 
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
 
Understanding the characteristics of android wear os
Understanding the characteristics of android wear osUnderstanding the characteristics of android wear os
Understanding the characteristics of android wear os
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
Hardware-Software Codesign
Hardware-Software CodesignHardware-Software Codesign
Hardware-Software Codesign
 
chameleon chip
chameleon chipchameleon chip
chameleon chip
 
37248136-Nano-Technology.pdf
37248136-Nano-Technology.pdf37248136-Nano-Technology.pdf
37248136-Nano-Technology.pdf
 
HYPPO - NECSTTechTalk 23/04/2020
HYPPO - NECSTTechTalk 23/04/2020HYPPO - NECSTTechTalk 23/04/2020
HYPPO - NECSTTechTalk 23/04/2020
 
Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 
ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014
 
Adaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog PotdarAdaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog Potdar
 
50120140505008
5012014050500850120140505008
50120140505008
 
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
 
CaseStudies
CaseStudiesCaseStudies
CaseStudies
 
asap2013-khoa-presentation
asap2013-khoa-presentationasap2013-khoa-presentation
asap2013-khoa-presentation
 
Four Ways to Improve Linux Performance IEEE Webinar, R2.0
Four Ways to Improve Linux Performance IEEE Webinar, R2.0Four Ways to Improve Linux Performance IEEE Webinar, R2.0
Four Ways to Improve Linux Performance IEEE Webinar, R2.0
 
Parallex - The Supercomputer
Parallex - The SupercomputerParallex - The Supercomputer
Parallex - The Supercomputer
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 

Plus de Matteo Ferroni

Fight data gravity with event-driven architectures
Fight data gravity with event-driven architecturesFight data gravity with event-driven architectures
Fight data gravity with event-driven architecturesMatteo Ferroni
 
[Droidcon Italy 2017] Client and server, 3 meters above the cloud
[Droidcon Italy 2017] Client and server, 3 meters above the cloud[Droidcon Italy 2017] Client and server, 3 meters above the cloud
[Droidcon Italy 2017] Client and server, 3 meters above the cloudMatteo Ferroni
 
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xen hypervisor

  • 1. Towards a performance-aware power capping orchestrator for the Xen hypervisor Marco Arnaboldi, Matteo Ferroni, Marco D. Santambrogio EWiLi’16, 10/06/2016, Pittsburgh PA, USA. Co-located with the Embedded Systems Week.
  • 2. Outline • Introduction and system requirements • Problem definition and proposed solution • Related work • XeMPUPiL Goals • System design and implementation • Experimental evaluation and results • Conclusion and future work
  • 3. Introduction • Computing systems have changed considerably in the last few decades – multi-core processors have entered the domain of embedded systems • Wide range of application fields – automotive, Internet TV, mobile, … – other embedded use cases, such as low-power microservers for lightweight scale-out workloads • Fog Computing brings computation “at the edge of the Cloud” by exploiting fog nodes – use case: latency-sensitive and security-critical applications [Figure: three tiers labeled Servers, Fog, IoT]
  • 4. Requirement 1: Portability • Applications need to be PORTABLE between the Cloud and the Fog – Hardware-assisted and software virtualization enter the context of embedded systems • Features: – applications do not need to be changed – physical resources are shared between applications – strong security and isolation guarantees [Figure: three tiers labeled Servers, Fog, IoT]
  • 5. Requirement 2: Power consumption • Nodes may be POWER CONSTRAINED – Power management techniques are needed to control power consumption • Limit the power consumption of a machine to a fixed “cap”, with the following features: – timeliness: the ability of the system to enforce a new cap rapidly – efficiency: maximize the performance delivered by the applications under a fixed power cap
  • 6. Problem definition and Proposed solution • One problem, two points of view: – minimize power consumption given a minimum performance requirement – maximize performance given a limit on the maximum power consumption • Proposed solution: – XeMPUPiL, a performance-aware power capping orchestrator for the Xen hypervisor
  • 7. Power capping approaches • Hardware power capping (e.g., Intel RAPL [1]) – Description: exploits DVFS to control power consumption – PRO: very fast (~350 ms [1]) – CONS: no control over the performance of applications • Software-level resource management – Description: resource management to achieve the desired power consumption – PRO: the performance of applications can be tuned – CONS: slow compared to RAPL (double-digit degradation)
  • 8. Power capping approaches • SOFTWARE APPROACH: ✓ efficiency, ✖ timeliness • HARDWARE APPROACH: ✖ efficiency, ✓ timeliness • Techniques shown in the figure: model-based monitoring [3], thread migration [2], resource management, DVFS [4], RAPL [1], CPU quota • [1] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. RAPL: Memory power estimation and capping. In International Symposium on Low Power Electronics and Design (ISLPED), 2010. [2] R. Cochran, C. Hankendi, A. K. Coskun, and S. Reda. Pack & cap: adaptive DVFS and thread packing under power caps. In International Symposium on Microarchitecture (MICRO), 2011. [3] M. Ferroni, A. Cazzola, D. Matteo, A. A. Nacci, D. Sciuto, and M. D. Santambrogio. MPower: gain back your Android battery life! In Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication, pages 171–174. ACM, 2013. [4] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier web servers with end-to-end delay control. IEEE Transactions on Computers, 2007.
  • 9. Power capping approaches • HYBRID APPROACH: ✓ efficiency, ✓ timeliness
  • 10. Related work: PUPiL [5] • PUPiL is a framework that aims to minimize the time needed to enforce a new cap (timeliness) while maximizing the performance delivered under that cap (efficiency) • Proposed approach: – combines hardware (i.e., the Intel RAPL interface [1]) and software (i.e., resource partitioning and allocation) techniques – exploits a canonical ODA control loop, one of the main building blocks of self-aware computing • Limitations: – the applications running on the system need to be instrumented with the Heartbeat framework, to provide a uniform metric of throughput – applications run bare-metal on Linux • These conditions might not hold in the context of a multi-tenant virtualized environment • [5] H. Zhang and H. Hoffmann. Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016.
  • 11. Goals • We want to extend this approach to: – work in a virtualized environment, based on the Xen hypervisor – avoid instrumentation of the guest workloads, as each tenant is seen as a “black box” • We then need to: 1. identify a performance metric for all the hosted tenants 2. improve the decision phase, to deal with the requirements of a virtualized environment 3. extend the hypervisor to provide the right knobs to work with our orchestrating logic
  • 12. The Xen Hypervisor • Slides from: http://www.slideshare.net/xen_com_mgr/xpds16-porting-xen-on-arm-to-a-new-soc-julien-grall-arm
  • 13. 1. Performance metric identification • Hardware event counters serve as low-level metrics of performance • We exploit the Intel Performance Monitoring Unit (PMU) to monitor the number of Instructions Retired (IR) accounted to each domain in a certain time window – it gives an insight into how many microinstructions were completely executed (i.e., successfully reached the end of the pipeline) – it represents a reasonable indicator of performance, as the manufacturer itself suggests [6] • [6] Clockticks per instructions retired (CPI). https://software.intel.com/en-us/node/544403. Accessed: 2016-06-01.
  • 14. 1. Performance monitoring: tracing the domains' behavior • XeMPower collects hardware events and accounts them to virtual tenants in two steps: 1. In the Xen scheduler (kernel level): at every context switch, trace the hardware events of interest (e.g., INST_RET) [Figure: XeMPower overview, with XeMPowerCLI and XeMPowerDaemon in Dom0, the Xen kernel, and a per-core timeline (Core 0 … Core N) marked by context switches; hardware events are traced per core, energy per socket]
  • 15. 1. Performance monitoring: tracing the domains' behavior • 2. In Domain 0 (privileged tenant): periodically acquire the event traces and aggregate them on a per-domain basis
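The two-step accounting described above can be illustrated with a minimal sketch. It is not the XeMPower implementation: the trace format (one `(core, next_domain, counter)` tuple per context switch, where `counter` is a cumulative per-core INST_RET reading) and the function name `aggregate_per_domain` are assumptions made for this example, and counter overflow is ignored.

```python
from collections import defaultdict

def aggregate_per_domain(trace):
    """Attribute per-core counter deltas to the domain that was running.

    trace: time-ordered (core, next_domain, counter) tuples, one per
    context switch (hypothetical format, assumed for this sketch).
    """
    totals = defaultdict(int)
    running = {}  # core -> (domain currently running, counter when it was scheduled in)
    for core, next_domain, counter in trace:
        if core in running:
            prev_domain, prev_counter = running[core]
            # Events counted since the previous switch belong to the outgoing domain.
            totals[prev_domain] += counter - prev_counter
        running[core] = (next_domain, counter)
    return dict(totals)
```

The domain still running on a core when the trace ends is left unaccounted until its next context switch, which is why the aggregation step in Domain 0 runs periodically.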
  • 16. 2. Decision phase and virtualization • Evaluation criterion: the average IR rate over a certain time window – the time window allows the workload to adapt to the current configuration – comparing the IR rates of different configurations highlights which one makes the workload perform better • Resource allocation granularity: core-level – each domain owns a set of virtual CPUs (vCPUs) – the machine has a set of physical CPUs (pCPUs) – each vCPU can be mapped on a pCPU for a certain amount of time, while multiple vCPUs can be mapped on the same pCPU • We wanted our allocation to cover the whole set of pCPUs, if possible
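The evaluation criterion can be sketched as follows. The function names and the input shape (a dictionary mapping a candidate number of pCPUs to the IR counts sampled during one window) are assumptions for illustration only; the slides do not specify how the comparison is coded.

```python
def average_ir_rate(samples, window_s):
    """Average Instructions Retired per second over one time window."""
    return sum(samples) / window_s

def best_configuration(ir_by_config, window_s):
    """Return the configuration with the highest average IR rate, plus all rates."""
    rates = {cfg: average_ir_rate(samples, window_s)
             for cfg, samples in ir_by_config.items()}
    return max(rates, key=rates.get), rates
```

Each configuration should be measured over a window of the same length, so that the workload has the same time to adapt before the rates are compared.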
  • 17. 3. Extending the hypervisor - RAPL • Working with the Intel RAPL interface: – it harshly cuts the frequency and the voltage of the whole CPU socket • On a bare-metal operating system: – it is driven by reading and writing data in the right Model Specific Registers (MSRs) • MSR_RAPL_POWER_UNIT: read the processor-specific time, energy and power units, used to scale each value read or written • MSR_PKG_RAPL_POWER_LIMIT: write to set a limit on the power consumption of the whole socket • In a virtualized environment: – the Xen hypervisor does not natively support the RAPL interface – we developed custom hypercalls, with kernel callback functions and memory buffers – we developed a CLI tool that checks the input parameters, then instantiates and invokes the Xen command interface to launch the hypercalls
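How the units register scales a power limit can be shown with a small sketch. The bit layout used here (power units in bits 3:0, energy units in bits 12:8, time units in bits 19:16, each meaning 1/2^n; limit in bits 14:0 and enable in bit 15 of the limit register) follows a reading of the Intel Software Developer's Manual and should be checked against the target processor. The raw value in the usage note is only a sample, and the function names are hypothetical. The sketch does not touch any MSR; in the setting above, the actual read and write would go through the custom hypercalls, whose interface is not shown in the slides.

```python
def decode_rapl_units(raw):
    """Decode a raw MSR_RAPL_POWER_UNIT value into (watts, joules, seconds) units."""
    power_unit_w = 1.0 / (1 << (raw & 0xF))            # bits 3:0
    energy_unit_j = 1.0 / (1 << ((raw >> 8) & 0x1F))   # bits 12:8
    time_unit_s = 1.0 / (1 << ((raw >> 16) & 0xF))     # bits 19:16
    return power_unit_w, energy_unit_j, time_unit_s

def encode_power_limit(watts, power_unit_w, enable=True):
    """Build the low 16 bits of the package power limit register.

    Only the limit field and the enable bit are produced; the time
    window and clamping fields are deliberately left out of this sketch.
    """
    field = int(round(watts / power_unit_w)) & 0x7FFF  # bits 14:0
    if enable:
        field |= 1 << 15                               # bit 15: enable limit
    return field
```

For instance, with a sample raw value of `0x000A1003` the power unit is 0.125 W, so a 30 W cap is encoded as 240 units plus the enable bit.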
  • 18. 3. Extending the hypervisor - Resources • cpupool tool: – allows clustering the physical CPUs in different pools – the pool scheduler will schedule the domain’s vCPUs only on the pCPUs that are part of that cluster – when a new resource allocation is chosen by the decide phase, we increase or decrease the number of pCPUs in the pool – we pin the domain’s vCPUs to these pCPUs, to increase workload stability • xenpm is NOT used: – it sets a maximum and minimum frequency for each pCPU – it may interfere with the actuation performed by RAPL
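The pool resizing step could be driven from Domain 0 along these lines. This is a sketch under assumptions, not the authors' tool: it only builds `xl` command lines (`cpupool-cpu-add`, `cpupool-cpu-remove`, `vcpu-pin`) without running them, the pool and domain names are hypothetical, and pinning all vCPUs to the whole pool is a simplification chosen for the example.

```python
def cpupool_commands(pool, domain, current_cpus, target_cpus):
    """Build the xl invocations that move a pool from current_cpus to target_cpus."""
    current, target = set(current_cpus), set(target_cpus)
    cmds = []
    for cpu in sorted(target - current):
        # A pCPU must be free (removed from its previous pool) before it can be added.
        cmds.append(["xl", "cpupool-cpu-add", pool, str(cpu)])
    for cpu in sorted(current - target):
        cmds.append(["xl", "cpupool-cpu-remove", pool, str(cpu)])
    cpu_list = ",".join(str(c) for c in sorted(target))
    # Simplification: pin every vCPU of the domain to the full set of pool pCPUs.
    cmds.append(["xl", "vcpu-pin", domain, "all", cpu_list])
    return cmds
```

Each returned list can be passed to `subprocess.run` on a host where `xl` is available; keeping command construction separate from execution makes the logic testable without a Xen installation.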
  • 20. System Design • The workloads run in paravirtualized domains
  • 21. System Design • XeMPUPiL spans all the layers
  • 22. System Design • The Instructions Retired (IR) metric is gathered and accounted to each domain, thanks to XeMPower • The aggregation is done over a time window of 1 second
  • 23. System Design • Observation of both hardware events (i.e., IR) and power consumption (whole CPU socket)
  • 24. System Design – given a workload with M virtual resources and an assignment of N physical resources, to each pCPU_i we assign:
  • 25. System Design • Hybrid actuation: – enforce the power cap via RAPL – define a CPU pool for the workload and pin the workload’s vCPUs to its pCPUs
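The hybrid actuation can be read as "set the hardware cap first, then search over core allocations". The sketch below illustrates that ordering only. The exhaustive search over candidate core counts is an assumption made here, since the slides do not detail the decide-phase algorithm, and the three callables (`set_power_cap`, `set_cores`, `observe_ir_rate`) are hypothetical hooks standing in for the RAPL hypercall, the cpupool resizing and the XeMPower measurement.

```python
def explore(cap_w, core_counts, set_power_cap, set_cores, observe_ir_rate):
    """Apply the cap, try each candidate core count, and keep the best one."""
    if not core_counts:
        raise ValueError("at least one candidate core count is required")
    set_power_cap(cap_w)              # hardware knob: fast enforcement of the cap
    best_cores, best_rate = None, float("-inf")
    for n in core_counts:
        set_cores(n)                  # software knob: resize the pool, pin vCPUs
        rate = observe_ir_rate()      # average IR rate over one time window
        if rate > best_rate:
            best_cores, best_rate = n, rate
    set_cores(best_cores)             # settle on the best allocation found
    return best_cores, best_rate
```

Passing the knobs in as callables keeps the loop independent of how the cap and the pool are actually driven, so it can be exercised with stubs.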
  • 28. Experimental Setup • Server setup (aka Sandy) – 2.8-GHz quad-core Intel Xeon E5-1410 processor, no HT enabled (4 cores) – 32GB of RAM – Xen hypervisor version 4.4 – paravirtualized instance of Ubuntu 14.04 as Dom0, pinned on the first 4 and with 4GB of RAM • Benchmarking – Embarrassingly Parallel (EP) [1] – IOzone [3] – cachebench [2] – Block Tri-diagonal solver (BT) [1]
                     EP    IOzone  cachebench  BT
      CPU-bound      YES   NO      NO          YES
      IO-bound       NO    YES     NO          YES
      memory-bound   NO    NO      YES         YES
    [1] NAS Parallel Benchmarks. http://www.nas.nasa.gov/publications/npb.html#url. Accessed: 2016-06-01. [2] Openbenchmarking.org. https://openbenchmarking.org/test/pts/cachebench. Accessed: 2016-06-01. [3] IOzone filesystem benchmark. http://www.iozone.org. Accessed: 2016-06-01.
  • 29. Experimental evaluation • Experimental evaluation: 1. how do different workloads perform under a power cap? 2. can we achieve higher efficiency w.r.t. RAPL power cap? • Three power caps explored: 40W, 30W and 20W – in idle state, the entire socket consumes around 17W – the maximum power consumption we measured was around 43W • Results are normalized with respect to the performance obtained with no power caps
  • 30. Experimental Results • Preliminary evaluation: how do the workloads perform under a power cap? • For CPU-bound benchmarks (i.e., EP and BT), the differences between power caps are significant compared to benchmarks where the bottleneck is on IO and/or on memory accesses [Figure: normalized performance (0 to 1.0) of EP, cachebench, IOzone and BT under NO RAPL, RAPL 40, RAPL 30 and RAPL 20]
  • 31. Experimental Results • With IO- and/or memory-bound workloads, the performance degradation between different power caps is less significant
  • 32. Experimental Results • Performance of the workloads with XeMPUPiL, for different power caps: – higher performance than RAPL, in general – not always true on a pure CPU-bound benchmark (i.e., EP) [Figure: three panels of normalized performance (0 to 1.0) for EP, cachebench, IOzone and BT, comparing PUPiL and RAPL at 40, 30 and 20]
  • 34. Experimental Results • XeMPUPiL improves the performance of the IO-bound, the memory-bound and the mixed benchmark w.r.t. the system with no constraints: – just one core assigned for IOzone and cachebench – two cores for the BT benchmark • These allocations are more power efficient, as they reduce memory and IO contention for workloads that are not strictly CPU-bound
  • 35. Conclusion and Future Work • Conclusions – Performance tuning through an ODA controller under a power cap improves performance • Future work – Improving the decide phase • Better algorithm in order to reduce convergence time • More general approach in order to improve portability – Improving the act phase • Implementation of a custom fine-grained tool for resource management in Xen