XeMPUPiL
A performance-aware power capping orchestrator for the Xen hypervisor
Marco Arnaboldi, author
marco1.arnaboldi@mail.polimi.it
05 June 2017
2–5
Problem Definition
[Figure, built up over four slides: datacenter tenants contend for the available power through the Xen virtualization layer]
6
Problem Definition
One problem, two points of view:
➡ minimize power consumption given a minimum performance requirement
➡ maximize performance given a maximum power consumption cap
8–11
Challenges
A performance-aware power capping orchestrator for the Xen hypervisor
• Instrumentation-free workload monitoring
• Power management techniques: HW vs. SW
• Xen: an open-source virtualization layer adopted by many Fortune companies
12
State of the Art
SOFTWARE APPROACH (✓ efficiency, ✖ timeliness): model-based monitoring [3], thread migration [2], resource management, CPU quota
HARDWARE APPROACH (✖ efficiency, ✓ timeliness): DVFS [4], RAPL [1]

[1] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. RAPL: Memory power estimation and capping. In International Symposium on Low Power Electronics and Design (ISLPED), 2010.
[2] R. Cochran, C. Hankendi, A. K. Coskun, and S. Reda. Pack & Cap: Adaptive DVFS and thread packing under power caps. In International Symposium on Microarchitecture (MICRO), 2011.
[3] M. Ferroni, A. Cazzola, D. Matteo, A. A. Nacci, D. Sciuto, and M. D. Santambrogio. MPower: Gain back your Android battery life! In Proceedings of the 2013 ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, pages 171–174. ACM, 2013.
[4] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier web servers with end-to-end delay control. IEEE Transactions on Computers, 2007.
13
State of the Art
HYBRID APPROACH [5] (✓ efficiency, ✓ timeliness): combines the software knobs (model-based monitoring [3], thread migration [2], resource management, CPU quota) with the hardware ones (DVFS [4], RAPL [1])

[5] H. Zhang and H. Hoffmann. Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016.
14
Proposed Solution
[Architecture diagram: an Observe-Decide-Act loop runs in Dom0 on top of the Xen hypervisor and the hardware. XeMPower observes per-domain hardware event counters through shared buffers; the PUPiL-derived logic decides; the act step enforces the power cap through the CLI, the hypercall manager, and the RAPL interface, and reconfigures the DomainU workloads through XL.]
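To make the loop concrete before the setup details, here is a minimal C sketch of the control flow; every function is a placeholder standing in for the component named in the diagram (XeMPower monitoring, PUPiL-style decision, RAPL/XL actuation), not actual XeMPUPiL code.

```c
/* Hedged sketch of the Observe-Decide-Act loop; all functions below are
 * hypothetical placeholders for the components in the diagram. */
#include <stdint.h>
#include <unistd.h>

extern uint64_t observe_instructions_retired(int domid); /* XeMPower HPC data  */
extern int      decide_resource_config(uint64_t ir_rate); /* PUPiL-style search */
extern void     act_set_power_cap(double watts);          /* RAPL via hypercall */
extern void     act_assign_pcpus(int domid, int npcpus);  /* XL cpupool/pinning */

void oda_loop(int domid, double power_cap_watts)
{
    act_set_power_cap(power_cap_watts);        /* enforce the cap once */
    for (;;) {
        uint64_t ir = observe_instructions_retired(domid);
        int npcpus  = decide_resource_config(ir);
        act_assign_pcpus(domid, npcpus);       /* actuate the new configuration */
        sleep(1);                              /* time window before next decision */
    }
}
```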
15
Experimental Setup
• Server setup (aka Sandy)
  • 2.8 GHz quad-core Intel Xeon E5-1410 processor, no HT enabled (4 physical cores)
  • 32 GB of RAM
  • Xen hypervisor version 4.4
  • paravirtualized instance of Ubuntu 14.04 as Dom0, pinned on the first pCPU and with 4 GB of RAM
• Benchmarking
  • Embarrassingly Parallel (EP) [1]
  • IOzone [3]
  • cachebench [2]
  • Block Tri-diagonal solver (BT) [1]

               EP    IOzone  cachebench  BT
CPU-bound      YES   NO      NO          YES
IO-bound       NO    YES     NO          YES
memory-bound   NO    NO      YES         YES

[1] NAS Parallel Benchmarks. http://www.nas.nasa.gov/publications/npb.html. Accessed: 2017-04-01.
[2] OpenBenchmarking.org. https://openbenchmarking.org/test/pts/cachebench. Accessed: 2017-04-01.
[3] IOzone Filesystem Benchmark. http://www.iozone.org. Accessed: 2017-04-01.
16
Experimental Results
Baseline Definition via RAPL
[Bar chart: normalized performance of EP, cachebench, IOzone, and BT under NO RAPL and under the RAPL 40, RAPL 30, and RAPL 20 power caps]
17
Experimental Results
Baseline Definition via RAPL
[Same chart as slide 16]
CPU-intensive benchmarks suffer from the processor frequency reduction.
18
Experimental Results
Baseline Definition via RAPL
[Same chart as slide 16]
The other benchmarks suffer from the processor voltage reduction.
19
Experimental Results
XeMPUPiL results compared to the baseline
[Three bar charts: normalized performance of EP, cachebench, IOzone, and BT under PUPiL 40 vs. RAPL 40, PUPiL 30 vs. RAPL 30, and PUPiL 20 vs. RAPL 20]
20
Experimental Results
XeMPUPiL results compared to the baseline
[Same three charts as slide 19]
XeMPUPiL outperforms pure RAPL for IO-, MEM-, and mix-bound benchmarks.
21
Experimental Results
XeMPUPiL results compared to the baseline
[Same three charts as slide 19]
XeMPUPiL suffers on pure CPU-bound benchmarks, due to developer-transparent optimizations inside Xen.
22
Conclusions
• Performance tuning through an ODA controller under a power cap improves performance
• Hybrid approaches like XeMPUPiL achieve:
  • better efficiency than HW approaches
  • better timeliness than SW approaches

Paper: “Towards a performance-aware power capping orchestrator for the Xen hypervisor” @ EWiLi’16, October 6th, 2016, Pittsburgh, USA
23
Future Works
u (Integrating || Moving) orchestrator logic into
scheduler
u Exploit new RAPL version on Haswell family
u Explore new policies regarding:
u Decision
u Resource assignment
24
Thank you!!!
XeMPUPiL
“Towards a performance-aware power capping orchestrator for the Xen hypervisor” @ EWiLi’16, October 6th, 2016, Pittsburgh, USA
25
ODA Details

OBSERVE
• Instructions retired (IR) per domain as the performance metric
• Data gathered from xempowermon
• Hardware performance counters (HPCs) and the Xen scheduler used to map the IR to the respective domain

DECIDE
• Exploration of the space of all possible resource configurations, based on a binary search tree
• Policy to distribute the virtual resources over the physical ones

ACT
• Enforce the power cap via RAPL
• Define a CPU pool for the workload
• Launch the workload on the pool
• Change the number of resources in the pool according to the decision phase
• Pin the workload’s vCPUs over pCPUs according to the chosen map
B. Decide
The decision phase is similar to the one implemented in PUPiL. The major changes are in how we evaluate the metrics gathered in the previous phase and in how we assign the physical resources to each virtual domain.
The evaluation criterion is based on the average IR rate over a given time window: this allows the workload to adapt to the current configuration before a new decision is taken.
As for the allocation of resources to each domain, we chose to work at core-level granularity: on the one hand, each domain owns a set of virtual CPUs (vCPUs); on the other hand, a set of physical CPUs (pCPUs) is present on the machine. Each vCPU is mapped onto a pCPU for a certain amount of time, and multiple vCPUs may be mapped onto the same pCPU.
We wanted our allocation policy to be as fair as possible, covering the whole set of pCPUs if possible; given a workload with M virtual resources and an assignment of N physical resources, to each pCPU_i we assign:
$$\mathrm{vCPUs}(i) = \left\lceil \frac{M - \sum_{0 \le j < i} \mathrm{vCPUs}(j)}{N - i} \right\rceil \qquad (1)$$

where i is a number between 0 and N - 1, i.e., it spans the set of pCPUs.
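To make Eq. (1) concrete, here is a minimal C sketch of this fair assignment (a ceiling division over the vCPUs still to be placed); the names are illustrative, not taken from the XeMPUPiL sources.

```c
/* Fair vCPU-to-pCPU assignment of Eq. (1): distribute M vCPUs over N pCPUs,
 * assigning to each pCPU the ceiling of (remaining vCPUs / remaining pCPUs). */
#include <stdio.h>

static void assign_vcpus(int M, int N, int out[])
{
    int assigned = 0;                             /* sum of vCPUs(j), j < i */
    for (int i = 0; i < N; i++) {
        int left = M - assigned;                  /* vCPUs still to place */
        out[i] = (left + (N - i) - 1) / (N - i);  /* ceil(left / (N - i)) */
        assigned += out[i];
    }
}

int main(void)
{
    int out[4];
    assign_vcpus(6, 4, out);                      /* e.g., 6 vCPUs on 4 pCPUs */
    for (int i = 0; i < 4; i++)
        printf("pCPU%d -> %d vCPUs\n", i, out[i]);
    return 0;
}
```

For M = 6 vCPUs over N = 4 pCPUs this yields the split 2, 2, 1, 1, i.e., the most even covering of the whole pCPU set.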
C. Act
The act phase essentially consists in: 1) setting the chosen
power cap and 2) actuating the selected resource configuration.
RAPL exposes a set of Model-Specific Registers (MSRs) that can be written to set a limit on the power consumption of the CPU socket. In a virtualized environment, these registers are not directly accessible by the virtual domains, not even by the privileged tenant Dom0. However, this limitation can be overcome by invoking custom hypercalls that let the hypervisor touch the underlying hardware. To the best of our knowledge, the Xen hypervisor does not natively offer hypercalls to interact with the RAPL interface; for this reason, we implemented our custom hypercalls. In order to be generic enough, we defined two of them, "xempower_rdmsr" and "xempower_wrmsr": the first one allows reading, while the second one writing, a specified MSR from Dom0.
Each hypercall needs to be registered within the hypervisor, which runs bare-metal on the machine: the kernel keeps track of the list of available hypercalls and the input parameters they accept. Each hypercall function has to be declared and defined, to be then invoked by the kernel at runtime: our implementation relies on the Xen built-in functions to safely access MSRs, i.e., wrmsr_safe and rdmsr_safe, which fail gracefully if something goes wrong in accessing the register, preventing critical problems at the hypervisor level.
We then implemented two Command Line Interface (CLI) tools to access these hypercalls from Dom0: xempower_RaplSetPower enforces a given value of power cap, while xempower_RaplPowerMonitor reads the power consumption of the socket. The desired value of power cap and the measured power consumption are passed through the whole software stack.
Source code available at: https://bitbucket.org/necst/xempower
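As a rough illustration of the mechanism just described, the Xen-side handler for the write hypercall could look like the sketch below; the handler name, its registration, and the argument passing are assumptions, while wrmsr_safe is the Xen built-in mentioned in the text.

```c
/* Hedged sketch of a Xen-side handler for a custom "wrmsr" hypercall;
 * the name and argument layout are illustrative assumptions. */
#include <xen/lib.h>
#include <xen/errno.h>
#include <xen/sched.h>   /* current, is_hardware_domain() */
#include <asm/msr.h>     /* wrmsr_safe(), rdmsr_safe() */

/* Write `val` into MSR `msr` on behalf of the control domain. */
long do_xempower_wrmsr(unsigned int msr, uint64_t val)
{
    /* Only Dom0 may reach the RAPL registers. */
    if ( !is_hardware_domain(current->domain) )
        return -EPERM;

    /* wrmsr_safe() traps a faulting MSR write instead of crashing Xen. */
    if ( wrmsr_safe(msr, val) )
        return -EFAULT;

    return 0;
}
```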
26
RAPL Details
Stack: XEMPOWER CLI TOOL → BUFFER → HYPERCALL MANAGER → INTEL RAPL INTERFACE → MSR

XEMPOWER CLI TOOL
• Two tools based on the native xc tool: XEMPOWER_RAPLSETPOWER and XEMPOWER_RAPLPOWERMONITOR
• Tools divided into two parts:
  • FRONTEND: manages the user's command and gathers information and privileges about the session; passes the user parameters to the backend
  • BACKEND: builds the hypercall, declaring a hypercall structure and filling it with the user parameters, then invokes the just-defined hypercall (sketched below)

BUFFER
• Used to map user-space memory to kernel memory, in order to provide a "pass-by-reference"-like mechanism inside the hypercall

HYPERCALL MANAGER
• Declaration of two custom hypercalls: XEMPOWER_RDMSR and XEMPOWER_WRMSR
• Implementation of the routines that manage the two custom hypercalls
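A minimal sketch of such a backend, assuming the hypercall is issued through the Linux privcmd driver; the hypercall number and argument layout are hypothetical (the actual tools build on the xc library, as noted above).

```c
/* Hedged sketch: a Dom0 CLI backend filling a hypercall structure and
 * invoking it through /dev/xen/privcmd. XEMPOWER_WRMSR_OP is hypothetical. */
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xen/sys/privcmd.h>

#define XEMPOWER_WRMSR_OP 41   /* hypothetical hypercall number */

static int xempower_wrmsr(uint32_t msr, uint64_t value)
{
    privcmd_hypercall_t call;
    int fd, ret;

    fd = open("/dev/xen/privcmd", O_RDWR);
    if (fd < 0)
        return -1;

    memset(&call, 0, sizeof(call));
    call.op     = XEMPOWER_WRMSR_OP;  /* which hypercall to run */
    call.arg[0] = msr;                /* target MSR address */
    call.arg[1] = value;              /* value to write */

    ret = ioctl(fd, IOCTL_PRIVCMD_HYPERCALL, &call);
    close(fd);
    return ret;
}
```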
INTEL RAPL INTERFACE / MSR
• Accessed by the routines, which write to and read from the RAPL-specific MSR registers in order to set the power cap and to retrieve metrics on the socket power consumption (see the encoding sketch below)
• Three registers are accessed:
  • RAPL_PWR_INFO
  • RAPL_PK_POWER_LIMIT
  • RAPL_PK_POWER_INFO
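To show what writing the power-limit register involves, here is a hedged sketch of encoding a watt value; the 0x606/0x610 addresses and bit layout follow the Intel SDM (which names the registers MSR_RAPL_POWER_UNIT and MSR_PKG_POWER_LIMIT; the slide's RAPL_PK_POWER_LIMIT refers to the same package power-limit register), and the xempower_* wrappers stand in for the CLI tools above.

```c
/* Hedged sketch: encode a power cap in watts into the package power-limit
 * MSR, using the power unit advertised by the RAPL unit register. */
#include <stdint.h>

#define MSR_RAPL_POWER_UNIT  0x606   /* per Intel SDM */
#define MSR_PKG_POWER_LIMIT  0x610   /* per Intel SDM */

extern uint64_t xempower_rdmsr(uint32_t msr);              /* custom hypercalls */
extern void     xempower_wrmsr(uint32_t msr, uint64_t v);

void set_power_cap(double watts)
{
    /* Bits 3:0 of the unit MSR give the power unit as 1/2^u watts. */
    unsigned u = xempower_rdmsr(MSR_RAPL_POWER_UNIT) & 0xf;
    uint64_t limit = (uint64_t)(watts * (1u << u)) & 0x7fff; /* bits 14:0 */

    uint64_t v = xempower_rdmsr(MSR_PKG_POWER_LIMIT);
    v = (v & ~0x7fffULL) | limit;   /* set the power-limit field */
    v |= 1ULL << 15;                /* enable the limit */
    xempower_wrmsr(MSR_PKG_POWER_LIMIT, v);
}
```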
