Paper submitted to the WACC2017
Authors: Eugenio Gianniti, Danilo Ardagna, Michele Ciavotta (Politecnico di Milano, Italy) and Mauro Passacantando (Università di Pisa, Italy)
A game theoretic approach for runtime capacity allocation in map-reduce (WACC2017)
1. A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce
Eugenio Gianniti*, Danilo Ardagna*, Michele Ciavotta*, Mauro Passacantando**
*Politecnico di Milano
**Università di Pisa
POLITECNICO DI MILANO
2. Goals
Cost-effective deployment configuration for MapReduce systems hosted on private Clouds
Efficient solution algorithm to support run-time cluster management
Distributed approach leveraging local knowledge of problem parameters
4. Preliminary Formulation
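Integer Nonlinear Problem
Non-convex constraints

\[
\begin{aligned}
\min_{r,\,h,\,s^M,\,s^R} \quad & \sum_{i=1}^{N} \bar{\rho}\, r_i + \sum_{i=1}^{N} P_i(h_i) \\
\text{subject to:} \quad & \sum_{i=1}^{N} r_i \le R \\
& H_i^{low} \le h_i \le H_i^{up}, && \forall i \in A \\
& \frac{A_i h_i}{s_i^M} + \frac{B_i h_i}{s_i^R} + E_i \le 0, && \forall i \in A \\
& \frac{s_i^M}{c_i^M} + \frac{s_i^R}{c_i^R} \le r_i, && \forall i \in A \\
& r_i,\ s_i^M,\ s_i^R,\ h_i \in \mathbb{N}_0, && \forall i \in A
\end{aligned}
\]

DECISION VARIABLES
$h_i$: concurrency level
$r_i$: virtual machines
$s_i^M$: Map slots
$s_i^R$: Reduce slots

Continuous relaxation
$h_i \rightarrow \psi_i^{-1}$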
5. Centralized Problem
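Continuous Nonlinear Problem
Need to centralize information
Convex constraints
KKT conditions are necessary and sufficient for optimality

\[
\begin{aligned}
\min_{r,\,\psi,\,s^M,\,s^R} \quad & \sum_{i=1}^{N} \bar{\rho}\, r_i + \sum_{i=1}^{N} \left( \alpha_i \psi_i - \beta_i \right) \\
\text{subject to:} \quad & \sum_{i=1}^{N} r_i \le R \\
& \psi_i^{low} \le \psi_i \le \psi_i^{up}, && \forall i \in A \\
& \frac{A_i}{s_i^M \psi_i} + \frac{B_i}{s_i^R \psi_i} + E_i \le 0, && \forall i \in A \\
& \frac{s_i^M}{c_i^M} + \frac{s_i^R}{c_i^R} \le r_i, && \forall i \in A \\
& r_i,\ s_i^M,\ s_i^R,\ \psi_i \in \mathbb{R}_+, && \forall i \in A
\end{aligned}
\]

DECISION VARIABLES
$\psi_i$: reciprocal concurrency level
$r_i$: virtual machines
$s_i^M$: Map slots
$s_i^R$: Reduce slots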
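As a rough illustration of the centralized problem, the sketch below evaluates its objective and checks its constraints for a candidate point. The class name `AppClass`, the function names, and the tolerance are assumptions made for this example and do not come from the slides; positivity of $\psi_i$ and of the slot counts is enforced strictly here only to avoid divisions by zero.

```python
from dataclasses import dataclass


@dataclass
class AppClass:
    """Parameters of one application class i (names are illustrative)."""
    A: float        # coefficient A_i of the Map term
    B: float        # coefficient B_i of the Reduce term
    E: float        # constant term E_i (negative for an attainable deadline)
    cM: float       # Map slots per VM, c_i^M
    cR: float       # Reduce slots per VM, c_i^R
    alpha: float    # penalty slope alpha_i
    beta: float     # penalty offset beta_i
    psi_low: float  # lower bound on psi_i
    psi_up: float   # upper bound on psi_i


def objective(apps, r, psi, rho_bar):
    """VM cost plus the linear penalties alpha_i * psi_i - beta_i."""
    return sum(rho_bar * ri + a.alpha * p - a.beta
               for a, ri, p in zip(apps, r, psi))


def is_feasible(apps, r, psi, sM, sR, R, tol=1e-9):
    """Check the constraints of the relaxed centralized problem."""
    if sum(r) > R + tol:                      # cluster capacity
        return False
    for a, ri, p, m, red in zip(apps, r, psi, sM, sR):
        if ri < 0 or p <= 0 or m <= 0 or red <= 0:
            return False                      # strict positivity avoids 1/0
        if not (a.psi_low - tol <= p <= a.psi_up + tol):
            return False                      # bounds on psi_i
        if a.A / (m * p) + a.B / (red * p) + a.E > tol:
            return False                      # deadline constraint
        if m / a.cM + red / a.cR > ri + tol:
            return False                      # slots must fit in the VMs
    return True
```

For instance, with `AppClass(16, 9, -50, 4, 1, 100, 1, 0.02, 0.1)`, the point `r=[10]`, `psi=[0.05]`, `sM=[16]`, `sR=[6]` satisfies the deadline and slot constraints with equality.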
6. Optimal Configuration Formulae
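In the optimal configuration it holds:

PROPOSITION
\[
s_i^M = \frac{c_i^M}{1 + \sqrt{\dfrac{B_i}{A_i}\,\dfrac{c_i^M}{c_i^R}}}\; r_i, \quad \forall i \in A
\]
\[
s_i^R = \frac{c_i^R}{1 + \sqrt{\dfrac{A_i}{B_i}\,\dfrac{c_i^R}{c_i^M}}}\; r_i, \quad \forall i \in A
\]
\[
\psi_i = -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i}\; r_i^{-1}, \quad \forall i \in A
\]

Exact bounds on resource requirements
\[
r_i^{low} = -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i\, \psi_i^{up}}, \quad \forall i \in A
\]
\[
r_i^{up} = -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i\, \psi_i^{low}}, \quad \forall i \in A
\]

Full knowledge of the problem parameters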
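The following is a sketch of how the formulae on this slide can be derived, not necessarily the authors' own proof. It assumes that the deadline and slot constraints are active at the optimum and that $E_i < 0$, which the deadline constraint requires whenever $A_i, B_i > 0$; $\lambda$ is a multiplier introduced only for this derivation.

```latex
% For a fixed r_i, the smallest feasible \psi_i is obtained by solving
%   min  A_i / s_i^M + B_i / s_i^R   s.t.  s_i^M / c_i^M + s_i^R / c_i^R = r_i .
\begin{align*}
\text{Stationarity:}\quad
  & \frac{A_i}{(s_i^M)^2} = \frac{\lambda}{c_i^M}, \qquad
    \frac{B_i}{(s_i^R)^2} = \frac{\lambda}{c_i^R}
  \;\Longrightarrow\;
    \frac{s_i^R}{s_i^M} = \sqrt{\frac{B_i\, c_i^R}{A_i\, c_i^M}} \\
\text{Slot constraint:}\quad
  & \frac{s_i^M}{c_i^M}\left( 1 + \sqrt{\frac{B_i}{A_i}\frac{c_i^M}{c_i^R}} \right) = r_i
  \;\Longrightarrow\;
    s_i^M = \frac{c_i^M}{1 + \sqrt{\frac{B_i}{A_i}\frac{c_i^M}{c_i^R}}}\, r_i \\
\text{Substitution:}\quad
  & \frac{A_i}{s_i^M} + \frac{B_i}{s_i^R}
    = \frac{1}{r_i}\left( \sqrt{\frac{A_i}{c_i^M}} + \sqrt{\frac{B_i}{c_i^R}} \right)^2 \\
\text{Deadline constraint:}\quad
  & \frac{1}{\psi_i}\left( \frac{A_i}{s_i^M} + \frac{B_i}{s_i^R} \right) + E_i = 0
  \;\Longrightarrow\;
    \psi_i = -\frac{\left( \sqrt{\frac{A_i}{c_i^M}} + \sqrt{\frac{B_i}{c_i^R}} \right)^2}{E_i}\, r_i^{-1}
\end{align*}
```

Setting $\psi_i = \psi_i^{up}$ and $\psi_i = \psi_i^{low}$ in the last relation and solving for $r_i$ gives $r_i^{low}$ and $r_i^{up}$ respectively.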
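7. Application Masters Problems
One instance per AM
Continuous Nonlinear Problem
Only application-specific information

\[
\begin{aligned}
\min_{\psi_i,\,\rho_i^a,\,s_i^M,\,s_i^R} \quad & \alpha_i \psi_i - \beta_i \\
\text{subject to:} \quad & \bar{\rho} \le \rho_i^a \le \rho_i^{up} \\
& \psi_i^{low} \le \psi_i \le \psi_i^{up} \\
& \frac{A_i}{s_i^M \psi_i} + \frac{B_i}{s_i^R \psi_i} + E_i \le 0 \\
& \frac{s_i^M}{c_i^M} + \frac{s_i^R}{c_i^R} \le r_i \\
& s_i^M,\ s_i^R,\ \psi_i,\ \rho_i^a \in \mathbb{R}_+
\end{aligned}
\]

DECISION VARIABLES
$\psi_i$: reciprocal concurrency level
$\rho_i^a$: bid for VMs
$s_i^M$: Map slots
$s_i^R$: Reduce slots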
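8. Resource Manager Problem
One instance for the whole cluster
Mixed Integer Nonlinear Problem
Takes care of resource management only

\[
\begin{aligned}
\max_{r,\,y,\,\rho} \quad & \sum_{i=1}^{N} \left( \rho - \bar{\rho} \right) \tilde{r}_i - \sum_{i=1}^{N} p_i \left( r_i^{up} - r_i \right) \\
\text{subject to:} \quad & \sum_{i=1}^{N} r_i \le R \\
& r_i \ge r_i^{low}, && \forall i \in A \\
& r_i \le \left( r_i^{up} - r_i^{low} \right) y_i + r_i^{low}, && \forall i \in A \\
& \bar{\rho} \le \rho \le \tilde{\rho} \\
& \rho - \rho_i^a \le M \left( 1 - y_i \right), && \forall i \in A \\
& \rho_i^a - \rho \le M y_i, && \forall i \in A \\
& r_i \in \mathbb{R}_+, && \forall i \in A \\
& y_i \in \{0, 1\}, && \forall i \in A \\
& \rho \in \mathbb{R}_+
\end{aligned}
\]

Two alternatives: $\tilde{r}_i = r_i$ and $\tilde{r}_i = r_i - r_i^{low}$

DECISION VARIABLES
$\rho$: price of VMs
$r_i$: virtual machines
$y_i$: AM $i$ offers more than the price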
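To make the role of the binary variables $y_i$ and of the big-M constraints more concrete, here is a minimal sketch that evaluates the RM objective and checks the RM constraints for a candidate point. It does not solve the problem; the function names, the parameter `rho_max` (standing in for the upper bound on the price), and the default value of `big_m` are assumptions for this example.

```python
def rm_objective(r, rho, rho_bar, p, r_low, r_up, net_of_minimum=False):
    """(rho - rho_bar) * sum(r_tilde_i) - sum(p_i * (r_up_i - r_i)).

    net_of_minimum selects between the two alternatives for r_tilde_i:
    r_i (False) or r_i - r_low_i (True).
    """
    r_tilde = [ri - lo if net_of_minimum else ri for ri, lo in zip(r, r_low)]
    revenue = (rho - rho_bar) * sum(r_tilde)
    penalty = sum(pi * (up - ri) for pi, up, ri in zip(p, r_up, r))
    return revenue - penalty


def rm_feasible(r, y, rho, bids, R, r_low, r_up, rho_bar, rho_max,
                big_m=1e6, tol=1e-9):
    """Check capacity, price bounds, and the big-M coupling for each AM."""
    if sum(r) > R + tol:
        return False
    if not (rho_bar - tol <= rho <= rho_max + tol):
        return False
    for ri, yi, bid, lo, up in zip(r, y, bids, r_low, r_up):
        if yi not in (0, 1):
            return False
        # y_i = 0 pins r_i to r_low_i; y_i = 1 allows up to r_up_i.
        if ri < lo - tol or ri > (up - lo) * yi + lo + tol:
            return False
        # y_i = 1 requires rho <= bid; y_i = 0 requires bid <= rho.
        if rho - bid > big_m * (1 - yi) + tol:
            return False
        if bid - rho > big_m * yi + tol:
            return False
    return True
```

With bids `[4.0, 1.5]` and price `rho = 3.0`, only the first AM can have `y = 1`, so the second one is held at its minimum allocation.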
9. Generalized Nash Equilibrium Problems
A set of players, $N$, each with utility function $\Theta_i$
Feasible set of player $i$: $X_i = X_i(x_{-i})$
Solution concept:
\[
\bar{x} \in X \text{ is an equilibrium} \iff \forall i \in N,\ \forall x_i \in X_i(\bar{x}_{-i}),\quad \Theta_i(\bar{x}) \ge \Theta_i(x_i, \bar{x}_{-i})
\]
Problem classes: NEP, JC-NEP, GNEP
Not jointly convex
Lack of theoretical guarantees
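The equilibrium definition can be illustrated on a toy game that is unrelated to the MapReduce model. The sketch below assumes two players who maximize their utility, a hypothetical discrete strategy grid, and a shared capacity that makes each feasible set depend on the rival's choice; all names and numbers are invented for the example.

```python
from itertools import product

STRATEGIES = range(4)  # hypothetical strategy grid {0, 1, 2, 3}
CAPACITY = 4           # hypothetical shared capacity coupling the players


def feasible_set(i, x):
    """X_i(x_{-i}): strategies of player i compatible with the rivals."""
    others = sum(x) - x[i]
    return [s for s in STRATEGIES if s + others <= CAPACITY]


def utility(i, x):
    """Toy utility Theta_i: each player simply wants more capacity."""
    return x[i]


def is_equilibrium(x):
    """True if x is feasible and no player gains by deviating alone."""
    for i in range(len(x)):
        options = feasible_set(i, x)
        if x[i] not in options:
            return False
        for s in options:
            deviation = x[:i] + (s,) + x[i + 1:]
            if utility(i, deviation) > utility(i, x):
                return False
    return True


equilibria = [x for x in product(STRATEGIES, repeat=2) if is_equilibrium(x)]
print(equilibria)
```

Even this small game has several equilibria, which gives a feel for why generalized Nash problems are harder to analyze than standard ones.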
10. Optimal Configuration Formulae
PROPOSITION
The optimal configuration for every AM, given an amount of resources $r_i$, is given by the following relations:
\[
s_i^M = \frac{c_i^M}{1 + \sqrt{\dfrac{B_i}{A_i}\,\dfrac{c_i^M}{c_i^R}}}\; r_i
\]
\[
s_i^R = \frac{c_i^R}{1 + \sqrt{\dfrac{A_i}{B_i}\,\dfrac{c_i^R}{c_i^M}}}\; r_i
\]
\[
\psi_i = -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i}\; r_i^{-1}
\]

Each AM problem reduces to a quick algebraic update
Local knowledge of application-specific parameters
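The algebraic update performed by each AM can be sketched as a single function. The function name and the argument checks are assumptions for this example; the sketch takes $E_i < 0$, as implied by the deadline constraint, and does not clip $\psi_i$ to its bounds.

```python
import math


def am_update(A, B, E, cM, cR, r):
    """Return (sM, sR, psi) for one AM, given r VMs.

    Assumes A, B, cM, cR > 0, r > 0 and E < 0.
    """
    if E >= 0 or r <= 0:
        raise ValueError("expected E < 0 and r > 0")
    # Split the r VMs between Map and Reduce slots.
    sM = cM * r / (1 + math.sqrt((B / A) * (cM / cR)))
    sR = cR * r / (1 + math.sqrt((A / B) * (cR / cM)))
    # Reciprocal concurrency level that makes the deadline constraint tight.
    psi = -(math.sqrt(A / cM) + math.sqrt(B / cR)) ** 2 / (E * r)
    return sM, sR, psi


sM, sR, psi = am_update(A=16, B=9, E=-50, cM=4, cR=1, r=10)
print(sM, sR, psi)
```

With these illustrative numbers the slots exactly fill the 10 VMs (`sM / cM + sR / cR` equals `r`), and `1 / psi` gives the corresponding concurrency level.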
11. Best Reply Algorithm
Each iteration is performed in parallel by the RM and AMs
Continuous equilibrium configuration

$r_i \leftarrow r_i^{low},\ \forall i \in A$
$s_i^M \leftarrow s_i^{M,low},\ \forall i \in A$
$s_i^R \leftarrow s_i^{R,low},\ \forall i \in A$
$\psi_i \leftarrow \psi_i^{up},\ \forall i \in A$
$\rho_i^a \leftarrow \bar{\rho},\ \forall i \in A$
repeat
    $r_i^{old} \leftarrow r_i,\ \forall i \in A$
    solve RM problem
    for all $i \in A$ do
        solve AM $i$ problem
        if $\psi_i > \psi_i^{low}$ then
            $\rho_i^a \leftarrow \max\{\rho_i^a, \rho\} + \rho_i^{up}$
        end if
    end for
    $\varepsilon \leftarrow \sum_{i=1}^{N} \dfrac{|r_i - r_i^{old}|}{r_i^{old}}$
until $\varepsilon < \bar{\varepsilon}$
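The overall structure of the loop and its stopping criterion can be sketched as follows. The callables `solve_rm` and `solve_am` are hypothetical placeholders for the RM and AM problems, including any bid update, and the toy RM used in the demonstration only moves allocations halfway toward invented targets so that the loop has something to converge to.

```python
def relative_increment(r, r_old):
    """Stopping metric: sum over i of |r_i - r_i_old| / r_i_old."""
    return sum(abs(ri - ro) / ro for ri, ro in zip(r, r_old))


def best_reply(r_low, solve_rm, solve_am, tol=1e-3, max_iter=100):
    """Alternate RM and AM steps until the allocation stabilizes.

    solve_rm(r_old) returns the new allocation; solve_am(i, r_i) lets
    AM i react to it. Both are hypothetical callables supplied by the caller.
    """
    r = list(r_low)
    for iteration in range(1, max_iter + 1):
        r_old = list(r)
        r = solve_rm(r_old)
        for i, ri in enumerate(r):
            solve_am(i, ri)
        if relative_increment(r, r_old) < tol:
            return r, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Toy stand-ins, not the real RM and AM problems.
targets = [8.0, 4.0]
toy_rm = lambda r_old: [(ri + ti) / 2 for ri, ti in zip(r_old, targets)]
toy_am = lambda i, ri: None

allocation, iterations = best_reply([2.0, 2.0], toy_rm, toy_am)
print(allocation, iterations)
```

In a real deployment the AM steps within one iteration would run concurrently; the sequential inner loop here is only for readability.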
12. Experimental Overview
PRELIMINARY ANALYSIS
Ensure the model behavior is consistent with intuition when applied to simple problems
SCALABILITY ANALYSIS
Verify the feasibility of solving problem instances at a realistic scale in production environments
STOPPING CRITERION TOLERANCE ANALYSIS
Check the sensitivity of the distributed algorithm with respect to the tolerance on the relative increment
VALIDATION WITH YARN SLS
Compare model solutions with timings measured on the official simulator
18. Conclusions and Future Work
The distributed approach yields accurate approximations of optimal solutions, even without theoretical guarantees
Extend the formulation to Apache Tez and Spark (based on DAGs)
Couple the proposed algorithm with a local search method based on Colored Petri Nets simulations
19. Thanks for your attention…
…any questions?
20. Solution Rounding Algorithm
sort $A$ according to increasing $\alpha_i$
$r_i \leftarrow \lceil \hat{r}_i \rceil,\ \forall i \in A$
for all $j \in A$ do
    if $\sum_{i=1}^{N} r_i > R$ then
        $r_j \leftarrow r_j - 1$
    end if
end for
$s_i^M \leftarrow \lceil \hat{s}_i^M \rceil,\ \forall i \in A$
$s_i^R \leftarrow \lceil \hat{s}_i^R \rceil,\ \forall i \in A$
for all $j \in A$ do
    while $s_j^M / c_j^M + s_j^R / c_j^R > r_j$ do
        $s_j^R \leftarrow s_j^R - 1$
        if $s_j^M / c_j^M + s_j^R / c_j^R > r_j$ then
            $s_j^M \leftarrow s_j^M - 1$
        end if
    end while
end for

The RM runs an O(N) loop
Each AM runs an O(1) loop, concurrently
Sorting is O(N log N), but can be done once and cached
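The rounding procedure translates almost line by line into code. The sketch below is a sequential transcription under assumed names (`round_solution` and its arguments); it runs the per-AM slot adjustment in a plain loop rather than concurrently, and the input values in the usage example are invented.

```python
import math


def round_solution(alpha, r_hat, sM_hat, sR_hat, cM, cR, R):
    """Round a continuous solution to integers, keeping it feasible."""
    n = len(alpha)
    # Visit classes by increasing alpha_i, so that VMs are taken first
    # from the classes with the smallest penalty coefficient.
    order = sorted(range(n), key=lambda i: alpha[i])

    r = [math.ceil(x) for x in r_hat]
    for j in order:
        if sum(r) > R:          # still above cluster capacity
            r[j] -= 1

    sM = [math.ceil(x) for x in sM_hat]
    sR = [math.ceil(x) for x in sR_hat]
    for j in range(n):
        # Remove slots until they fit in the r_j VMs of class j.
        while sM[j] / cM[j] + sR[j] / cR[j] > r[j]:
            sR[j] -= 1
            if sM[j] / cM[j] + sR[j] / cR[j] > r[j]:
                sM[j] -= 1
    return r, sM, sR


result = round_solution(
    alpha=[1.0, 5.0],
    r_hat=[2.3, 3.6],
    sM_hat=[5.2, 8.4],
    sR_hat=[2.0, 3.0],
    cM=[4, 4],
    cR=[2, 2],
    R=6,
)
print(result)
```

Here the rounded-up VM counts `[3, 4]` exceed `R = 6`, so the class with the smaller `alpha` gives up one VM and then drops one Reduce slot to fit.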