Higher SLA Satisfaction in Datacenters 
with Continuous Placement Constraints 
Huynh Tu Dang 
hdang@polytech.unice.fr 
Fab...
SLA for a virtualised application
SLA for a virtualised application 
spread the replicas
SLA for a virtualised application 
performance guarantee
SLA for a virtualised application 
low latency
reconfiguration algorithm 
N2 
N3 
time 
SLA: spread(VM1, VM2) 
VM2 
VM1 
N1
reconfiguration algorithm 
N2 
N3 
time 
SLA: spread(VM1, VM2) 
sys-admin query: offline(N1) 
VM2 
VM2 
VM1 
VM1 
N1 
Reco...
reconfiguration algorithm 
N2 
N3 
time 
SLA: spread(VM1, VM2) 
sys-admin query: offline(N1) 
VM2 
VM2 
VM1 
VM1 
N1 
with...
Discrete restriction is not enough 
not an unpredictable situation, 
an algorithmic issue
Evaluating the reliability of discrete 
placement constraints 
• simulate a 256-server datacenter 
• running 350 HA webapp...
Studied constraints 
spread 
among 
splitAmong 
maxOnline 
singleResource 
Capacity 
replicas on distinct servers for faul...
scenario 
vertical elasticity 
Tiers 1 
Tiers 2 
Tiers 3
scenario 
vertical elasticity 
Tiers 1 
Tiers 2 
Tiers 3
scenariohorizontal elasticity 
Tiers 1 
Tiers 2 
Tiers 3
scenariohorizontal elasticity 
Tiers 1 
Tiers 2 
Tiers 3
scenarioboot storm x 400
scenarioserver failure
Migrations lead to unanticipated 
Scenario Violated 
SLAs 
placements 
Actions 
VM Boot Migrate Node Boot Node Shutdown 
V...
Migrationstend to violate relative 
100 
75 
50 
25 
0 
Vertical 
Elasticity 
placement constraints 
performance loss 
Hor...
Trading unreliable 
discrete constraints … 
spread(VM[1,2]) 
we addressed an 
assignment problem
… for safe 
continuous constraints 
spread(VM[1,2]) 
we must address 
a scheduling problem
Continuous placement constraints 
with
from discrete to continuous 
among|simpleAmong 
N1 
N2 N3 
N4 N5 
N6 
stay on a same partition by the end of 
the reconfig...
from discrete to continuous 
among|simpleAmong 
N1 
N2 N3 
N4 N5 
N6 
stay on a same partition by the end of 
the reconfig...
from discrete to continuous 
among|simpleAmong 
N1 
N2 N3 
N4 N5 
N6 
stay on a same partition by the end of 
the reconfig...
from discrete to continuous 
among|simpleAmong 
N1 
N2 N3 
N4 N5 
Disallow movements between partitions 
• basic knowledge...
from discrete to continuous 
among|simpleAmong 
N1 
N2 N3 
N4 N5 
Disallow movements between partitions 
• basic knowledge...
continuous spread 
discrete spread(VM[1,2]) ::= continuous spread(VM[1,2]) ::= 
allDi↵erent(dhost 
1 , dhost 
2 ) 
allDi↵e...
continuous maxOnline 
discrete maxOnline(N[1..10], 7)::= 
continuous maxOnline(N[1..10], 7)::= 
X10 
i=1 
nqi 
 7 
8i 2 [...
Performance overhead 
5 10 20 50 100 
Duration (sec.) 
Solved instances 
discrete 
continuous 
0 20 40 60 80 100 
10 20 50...
Performance overhead 
5 10 20 50 100 
Duration (sec.) 
Solved instances 
discrete 
continuous 
0 20 40 60 80 100 
10 20 50...
Conclusions 
• discrete restriction is not enough 
• continuous restriction is a solution 
• a different view on the probl...
Future Work 
• a broader range of constraints and objectives 
• reducing performance overhead 
• static analysis to detect...
http://btrp.inria.fr 
open source, 20+ placement constraints, 
demo, tutorials, everything for reproducibility
Prochain SlideShare
Chargement dans…5
×

Higher SLA Satisfaction in Datacenters with Continuous VM Placement Constraints

502 vues

Publié le

Huynh Tu Dang, and Fabien Hermenier.
9th Workshop on Hot topics in Dependable Systems (HotDep 2013). Farmington, USA. ACM.

Full paper http://dl.acm.org/citation.cfm?id=2524226

Publié dans : Logiciels
  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

Higher SLA Satisfaction in Datacenters with Continuous VM Placement Constraints

  1. 1. Higher SLA Satisfaction in Datacenters with Continuous Placement Constraints Huynh Tu Dang hdang@polytech.unice.fr Fabien Hermenier fabien.hermenier@unice.fr
  2. 2. SLA for a virtualised application
  3. 3. SLA for a virtualised application spread the replicas
  4. 4. SLA for a virtualised application performance guarantee
  5. 5. SLA for a virtualised application low latency
  6. 6. reconfiguration algorithm N2 N3 time SLA: spread(VM1, VM2) VM2 VM1 N1
  7. 7. reconfiguration algorithm N2 N3 time SLA: spread(VM1, VM2) sys-admin query: offline(N1) VM2 VM2 VM1 VM1 N1 Reconfiguration process
  8. 8. reconfiguration algorithm N2 N3 time SLA: spread(VM1, VM2) sys-admin query: offline(N1) VM2 VM2 VM1 VM1 N1 with discrete restrictions
  9. 9. Discrete restriction is not enough not an unpredictable situation, an algorithmic issue
  10. 10. Evaluating the reliability of discrete placement constraints • simulate a 256-server datacenter • running 350 HA webapp (5,200 VMs) • BtrPlace as the reconfiguration algorithm • 4 reconfiguration scenarios that mimic industrial use case • 100 instances per scenario
  11. 11. Studied constraints spread among splitAmong maxOnline singleResource Capacity replicas on distinct servers for fault tolerance DBs on a same edge-switch for a fast synchronisation. webapp split over 2 clusters for disaster recovery 240 nodes online at maximum to fit licensing policy keep resource for hypervisor management operations
  12. 12. scenario vertical elasticity Tiers 1 Tiers 2 Tiers 3
  13. 13. scenario vertical elasticity Tiers 1 Tiers 2 Tiers 3
  14. 14. scenariohorizontal elasticity Tiers 1 Tiers 2 Tiers 3
  15. 15. scenariohorizontal elasticity Tiers 1 Tiers 2 Tiers 3
  16. 16. scenarioboot storm x 400
  17. 17. scenarioserver failure
  18. 18. Migrations lead to unanticipated Scenario Violated SLAs placements Actions VM Boot Migrate Node Boot Node Shutdown Vertical Elasticity 40.72 0% 99.99% 0.005% 0.005% Horizontal Elasticity 0.19 99.82% 0.18% 2.82% 0% Server Failure 29.56 61.29% 35.89% 2.82% 0% Boot Storm 0.35 98.57% 1.43% 0% 0%
  19. 19. Migrationstend to violate relative 100 75 50 25 0 Vertical Elasticity placement constraints performance loss Horizontal Elasticity Server Failure Boot Storm Violations spread among splitAmong maxOnline spof failure
  20. 20. Trading unreliable discrete constraints … spread(VM[1,2]) we addressed an assignment problem
  21. 21. … for safe continuous constraints spread(VM[1,2]) we must address a scheduling problem
  22. 22. Continuous placement constraints with
  23. 23. from discrete to continuous among|simpleAmong N1 N2 N3 N4 N5 N6 stay on a same partition by the end of the reconfiguration process
  24. 24. from discrete to continuous among|simpleAmong N1 N2 N3 N4 N5 N6 stay on a same partition by the end of the reconfiguration process
  25. 25. from discrete to continuous among|simpleAmong N1 N2 N3 N4 N5 N6 stay on a same partition by the end of the reconfiguration process
  26. 26. from discrete to continuous among|simpleAmong N1 N2 N3 N4 N5 Disallow movements between partitions • basic knowledge of a reconfiguration process • still an assignment problem N6
  27. 27. from discrete to continuous among|simpleAmong N1 N2 N3 N4 N5 Disallow movements between partitions • basic knowledge of a reconfiguration process • still an assignment problem N6
  28. 28. continuous spread discrete spread(VM[1,2]) ::= continuous spread(VM[1,2]) ::= allDi↵erent(dhost 1 , dhost 2 ) allDi↵erent(dhost 1 , dhost 2 ) ^ dhost 1 = chost 2 =) astart 1 # aend 2 ^ dhost 2 = chost 1 =) astart 2 # aend 1 Disallow temporary overlapping • require to know this may happen • scheduling 101
  29. 29. continuous maxOnline discrete maxOnline(N[1..10], 7)::= continuous maxOnline(N[1..10], 7)::= X10 i=1 nqi  7 8i 2 [1, 10], non i = ⇢ 0 if nq i = 1 astart i otherwise noff i = ⇢ max(T) if nq i = 0 aend i otherwise i # t ^ noff 8t 2 T, card({i|non i })  7 scheduling 201 detailed knowledge of a reconfiguration process harder to imagine, model & implement
  30. 30. Performance overhead 5 10 20 50 100 Duration (sec.) Solved instances discrete continuous 0 20 40 60 80 100 10 20 50 100 Duration (sec.) Solved instances discrete continuous 0 20 40 60 80 100 0 20 40 60 80 100vertical elasticity discrete continuous 5 10 20 50 100 200 Duration (sec.) 5 10 20 50 100 Duration (sec.) Solved instances Solved instances discrete continuous 0 20 40 60 80 100 boot storm horitzontal elasticity server failure
  31. 31. Performance overhead 5 10 20 50 100 Duration (sec.) Solved instances discrete continuous 0 20 40 60 80 100 10 20 50 100 Duration (sec.) Solved instances discrete continuous 0 20 40 60 80 100 0 20 40 60 80 100vertical elasticity discrete continuous 5 10 20 50 100 200 Duration (sec.) 5 10 20 50 100 Duration (sec.) Solved instances Solved instances discrete continuous 0 20 40 60 80 100 boot storm horitzontal elasticity server failure
  32. 32. Conclusions • discrete restriction is not enough • continuous restriction is a solution • a different view on the problem • challenging, but still possible to implement
  33. 33. Future Work • a broader range of constraints and objectives • reducing performance overhead • static analysis to detect un-necessary continuous constraints • controlled relaxation to handle hard situations
  34. 34. http://btrp.inria.fr open source, 20+ placement constraints, demo, tutorials, everything for reproducibility

×