Autonomous control in Big Data platforms: an experience with Cassandra
Emiliano Casalicchio (emc@bth.se)
Joint research with:
Lars Lundberg and Sogand Shirinbab
Computer Science Dep.
Blekinge Institute of Technology
Research framework
• Scalable resource-efficient systems for big data analytics
• Awarded by the Knowledge Foundation, Sweden (20140032)
Agenda
• Big Data Platform
• Main properties
• Why autonomous control is important
• Challenges
• The Cassandra case study
• Conclusions
The NIST BDRA: Big Data Framework Providers
• Big Data Applications
• BD Processing Frameworks (batch, interactive, streaming)
• e.g. MapReduce, Flink, Mahout, Storm, pbdR, Tez, Spark, Esper, WSO2-CEP
• BD Platforms (logical data organization and distribution, access API)
• e.g. HDFS, Cassandra, HBase, Dynamo, PNUTS, …
• File systems: Google File System, Apache Hadoop File System (HDFS)
• NoSQL data stores: HBase, BigTable; Cassandra; Dynamo, DynamoDB; Sherpa, PNUTS
• Infrastructures (networking, computing, storage)
Properties of NoSQL data stores
• Scalability
• Throughput / Dataset size
• Availability
• Data replication
• Eventual consistency
• Consistency level for R/W, to trade off availability and latency
Autonomic control is a must
• System complexity
• Human assisted management is unrealistic
• Security
• complete automation of procedures
• self-configuration, self-healing and self-protection
• Optimization
• Self-optimization
Two approaches
• Single-layer adaptation (e.g. auto-scaling of DB nodes, self-configuration of DB parameters)
• Multi-layer adaptation (e.g. orchestration of DB node auto-scaling and VM placement on top of the physical infrastructure)
The layered stack:
• Big Data Applications + processing framework
• Platforms (logical data organization and distribution, access API), e.g. HDFS, Cassandra, HBase, Dynamo, PNUTS, …
• Virtual infrastructure
• Physical infrastructure
Issues in single layer adaptation
• Interference between infrastructure adaptation and
platform adaptation
• Platform properties can limit infrastructure level adaptation
actions
• E.g. effect of auto-scaling can be limited by serialization constraints.
• Geographical distribution (and network configuration) can
conflict with latency/availability trade off at platform layer
• Infrastructure adaptation can hurt NoSQL data store
properties
• E.g. two or more replicas on the same PM impact node reliability and consistency-level reliability
An example
RF = 3: each node stores a replica of the data set
• 3 DB vnodes on 3 VMs, each on a different PM: reliability = 1 − (1 − r)³; if r = 0.9, reliability is 0.999
• 3 DB vnodes on 3 VMs packed on 2 PMs: reliability = 0.99
• 3 DB vnodes on 3 VMs on a single PM: reliability = 0.9
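The numbers above can be checked with a minimal Python sketch (my own illustration, not from the slides), assuming independent PM failures and that a failed PM takes down every replica it hosts:

```python
def availability(placement, r):
    """Probability that at least one replica of the dataset is reachable.

    placement: number of replicas hosted on each PM (one entry per PM)
    r: probability that a single PM is up (failures assumed independent)
    """
    p_all_hosting_pms_down = 1.0
    for replicas_on_pm in placement:
        if replicas_on_pm > 0:
            p_all_hosting_pms_down *= (1.0 - r)
    return 1.0 - p_all_hosting_pms_down

# RF = 3, r = 0.9: the three placements of the example
print(availability([1, 1, 1], 0.9))  # 3 distinct PMs -> 1-(1-r)^3 = 0.999
print(availability([2, 1, 0], 0.9))  # 2 PMs          -> 1-(1-r)^2 = 0.99
print(availability([3, 0, 0], 0.9))  # 1 PM           -> r         = 0.9
```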
Multi layer adaptation
• It means coordinating at run time:
• Self-configuration of the BD platform
• Deployment of the platform on the virtual infrastructure
• Allocation/placement of the virtual infrastructure on the physical infrastructure
• The challenges are:
• To formulate an optimization model that accounts for all the dependencies and constraints imposed by the system architecture
• To formulate multi-objective functions that account for contrasting objectives at the infrastructure and application levels
• E.g. minimizing power consumption and maximizing platform reliability
The Cassandra case study
• E. Casalicchio, L. Lundberg, S. Shirinbab, Energy-aware adaptation in managed Cassandra datacenters, IEEE International Conference on Cloud and Autonomic Computing (ICCAC 2016), Augsburg, Germany, September 12-16, 2016
• E. Casalicchio, L. Lundberg, S. Shirinbab (2017), Energy-aware Auto-scaling Algorithms for Cassandra Virtual Data Centers, Cluster Computing (to appear June 2017)
Motivations
• Data storage (or serving) systems are playing an important role in the cloud and big data industry
• Their management is a challenging task
• The complexity increases when multitenancy is considered
• Human-assisted control is unrealistic
• There is a growing demand for autonomic solutions
• Our industrial partner Ericsson AB is interested in autonomic management of Cassandra
Problem description
• We consider a provider of a
managed Apache Cassandra
service
• The applications or tenants of the
service are independent and each
uses its own Cassandra Virtual
Data Center (VDC)
• The service provider wants to maintain SLAs, which requires
• properly planning the capacity and the configuration of each Cassandra VDC
• dynamically adapting the infrastructure and VDC configuration without disrupting performance
• minimizing power consumption
Solution proposed (1000 ft view)
• An energy-aware adaptation model specifically designed for Cassandra VDC
running on a cloud infrastructure
• Architectural constraints imposed by Cassandra (minimum number of nodes,
homogeneity of nodes, replication factor and heap size)
• Constraints on throughput and replication factor imposed by the SLA
• Power consumption model based on CPU utilization
• An adaptation policy for the Cassandra VDC configuration and the cloud infrastructure configuration that orchestrates three strategies:
• Vertical scaling
• Horizontal scaling
• Optimal placement
Solution proposed (deep details)
• A Workload and SLA model
• A System architecture model
• A throughput model
• The utility function and problem formulation
• Drawbacks of the optimal solution and alternatives
• Experimental results
Workload and SLA model
• Workload features
• the type of requests, e.g. read only, write only, read & write, scan, or a
combination of those
• The rate of the operation requests
• The size of the dataset
• Workload types
• CPU bound, Memory bound
• The data replication factor
• SLA: each application i is characterized by the tuple ⟨l_i, T_i^min, D_i, r_i⟩, where l_i ∈ {R, W, RW(75/25)} is the request type, T_i^min is the minimum throughput, D_i is the replication factor, and r_i is the dataset size
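As a concrete illustration of the tuple (a minimal sketch; the field names are mine, not from the paper):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SLA:
    """SLA tuple <l_i, T_i^min, D_i, r_i> of tenant i."""
    workload: str      # l_i: "R", "W" or "RW" (75/25 read/write mix)
    t_min: float       # T_i^min: minimum throughput (ops/sec)
    replication: int   # D_i: replication factor
    dataset_gb: float  # r_i: dataset size (GB)

sla = SLA(workload="RW", t_min=18_000, replication=3, dataset_gb=8.0)
```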
Architecture model
• Homogeneous physical machines (H)
• VMs of different type and size (V)
• A VDC is composed of n_i homogeneous VMs
• n_i ≥ D_i
• At least D_i vnodes out of n_i must run on different PMs
• The datacenter configuration is described by a vector x = [x_{i,j,h}], where x_{i,j,h} is the number of vnodes serving application i and using VM configuration j allocated on PM h
TABLE I. t0_{li,j} as a function of c_j (virtual CPUs), m_j (GByte), heapSize_j (GByte) and l_i. The throughput is measured in operations/second (ops/sec).
j   c_j   m_j   heapSize_j   R          W         RW
1   8     32    8            16.6×10³   8.3×10³   13.3×10³
2   4     16    4            8.3×10³    8.3×10³   8.3×10³
3   2     16    4            3.3×10³    3.3×10³   3.3×10³
TABLE II. Memory available for the dataset in a Cassandra vnode (JVM heap) as a function of the VM memory size.
m_j (RAM size in GB):             1     2   4   8   16   32
heapSize_j (max heap size in GB): 0.5   1   1   2   4    8
T_i^min is the minimum throughput the service provider must guarantee to process the requests from application i. The SLA parameters D_i and r_i are used to determine the number of vnodes to be instantiated. We limit the study to the set L = {R, W, RW} because the model can deal with any type of operation request.
Architecture model (cont’d)
• To make a VDC CPU bound we need n_i ≥ D_i · r_i / heapSize_j, if r_i > heapSize_j (otherwise the constraint n_i ≥ D_i holds)
• y_{i,j} = 1 if application i uses VM configuration j to run its Cassandra vnodes, otherwise y_{i,j} = 0
• s_{i,h} = 1 if a Cassandra vnode serving application i runs on PM h, otherwise s_{i,h} = 0
• Vertical scaling is modelled assuming that it is possible to switch from configuration j1 to j2 at runtime, replacing the VM of type j1 with a VM of type j2
The number n_i of vnodes can be defined as
n_i = Σ_{j∈J, h∈H} x_{i,j,h}, ∀i ∈ I  (2)
and, considering that in our industrial case r_i ≥ heapSize_j always holds for all configurations j, the constraints above are modelled by the following equations:
Σ_{j∈J, h∈H} x_{i,j,h} ≥ D_i · r_i / heapSize_j, ∀i ∈ I  (3)
Σ_{j∈J} y_{i,j} = 1, ∀i ∈ I  (4)
Σ_{h∈H} s_{i,h} ≥ D_i, ∀i ∈ I  (5)
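A minimal sketch of this sizing rule (my own code; it rounds the vnode count up to an integer, which the slide formula leaves implicit):

```python
import math

def min_vnodes(D_i, r_i, heap_size_j):
    """Minimum number of vnodes n_i for a CPU-bound VDC.

    D_i: replication factor; r_i: dataset size (GB);
    heap_size_j: JVM heap of VM type j (GB), cf. Table II.
    """
    if r_i > heap_size_j:
        # n_i >= D_i * r_i / heapSize_j, rounded up to a whole vnode
        return math.ceil(D_i * r_i / heap_size_j)
    return D_i  # otherwise only n_i >= D_i applies

# e.g. D_i = 3, r_i = 8 GB, VM type 2 (heapSize_j = 4 GB) -> 6 vnodes
print(min_vnodes(3, 8, 4))
```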
Throughput model
• The actual throughput T_i is a function of x_{i,j,h}
[Fig. 2: a real example of Cassandra throughput as a function of the number of Cassandra vnodes allocated, for different types of requests (R, W, RW); the plot shows how realistic the proposed model is]
δ^k_{li,j} is the slope of the kth segment of the piecewise-linear throughput curve, valid for a number of Cassandra vnodes n_i between n_{k−1} and n_k. Therefore, for n_{k−1} ≤ n_i ≤ n_k, we can write:
t(n_i) = t(n_{k−1}) + t0_{li,j} · δ^k_{li,j} · (n_i − n_{k−1})  (6)
where k ≥ 1, n_0 = 1 and t_{i,j,h}(1) = t0_{li,j}.
Finally, for a configuration x of a VDC, and considering Equation 2, we define the overall throughput T_i as:
T_i(x) = t(n_i), ∀i ∈ I  (7)
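A direct transcription of eq. (6) as a Python sketch (my own code; the segment bounds and slopes δ¹ = 1, δ² = 0.8, δ³ = 0.66 are taken from Table III, and t0 from Table I):

```python
def throughput(n, t0, segments=((2, 1.0), (7, 0.8), (float("inf"), 0.66))):
    """Piecewise-linear VDC throughput t(n) of eq. (6).

    n: number of vnodes; t0: single-vnode throughput t0_{li,j} (Table I);
    segments: (upper bound n_k, slope delta_k) pairs (values from Table III).
    """
    t, prev = t0, 1  # t(1) = t0, n_0 = 1
    for n_k, delta in segments:
        upper = min(n, n_k)
        if upper > prev:
            t += t0 * delta * (upper - prev)  # eq. (6) on this segment
            prev = upper
        if n <= n_k:
            break
    return t

# VM type 2, RW workload: t0 = 8.3e3 ops/sec (Table I)
print(throughput(6, 8.3e3))  # 2*t0 + 0.8*t0*4 = 5.2*t0 = 43,160 ops/sec
```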
Energy consumption model
• Based on physical node utilization
• P_h^max = 500 W and k_h = 0.7
As the service provider utility we chose the power consumption, which is directly related to the provider revenue (and to IT sustainability). We chose a linear model [3] where the power P_h consumed by a physical machine h is a function of the CPU utilization, and hence of the system configuration x:
P_h(x) = k_h · P_h^max + (1 − k_h) · P_h^max · U_h(x)  (8)
where P_h^max is the maximum power consumed when PM h is fully utilised (e.g. 500 W), k_h is the fraction of power consumed by the idle PM h (e.g. 70%), and the CPU utilisation of PM h is defined by
U_h(x) = (1/C_h) · Σ_{I,J} x_{i,j,h} · c_j  (9)
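A direct transcription of eqs. (8)-(9) as a Python sketch (my own code, using the example values P_h^max = 500 W, k_h = 0.7 and C_h = 16 cores from Table III):

```python
def pm_power(x_h, c, C_h=16.0, p_max=500.0, k_h=0.7):
    """Power P_h(x) consumed by PM h, eqs. (8)-(9).

    x_h: dict {(i, j): number of vnodes of tenant i / VM type j on this PM}
    c:   dict {j: virtual cores of VM type j}
    """
    u_h = sum(n * c[j] for (i, j), n in x_h.items()) / C_h  # eq. (9)
    return k_h * p_max + (1 - k_h) * p_max * u_h            # eq. (8)

# one tenant with four vnodes of VM type 2 (4 vcores) on a 16-core PM:
# U_h = 16/16 = 1 -> P_h = 500 W
print(pm_power({(1, 2): 4}, {2: 4}))
```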
Problem formulation
• Linear model
• Objective function linear
• Constraints linear
• Constraints imposed by
• SLA on throughput and
replication factor
• Replication factor (and number of
distinct PMs)
• Homogeneity of vnodes
configuration
• Heap size (we want a CPU bound
configuration)
[Fig. 2: a real example of Cassandra throughput as a function of the number of allocated vnodes, for different request types]
In cloud management systems the techniques typically used are scheduling, placement, migration, and reconfiguration of virtual machines; the ultimate goal is to optimise the use of resources to reduce power consumption. Independently of the adaptation policy adopted, all these techniques are based on power and/or energy consumption models, which usually define the power used by a system as a linear function of the CPU utilisation (e.g. [3]-[5]), processor frequency (e.g. [6]) or number of cores used (e.g. [7]).
min f(x) = P(x)
subject to:
Σ_{J,H} t(x_{i,j,h}) ≥ T_i^min, ∀i ∈ I  (11)
Σ_H x_{i,j,h} · y_{i,j} ≥ D_i · r_i / heapSize_j, ∀i ∈ I, j ∈ J  (12)
x_{i,j,h} ≤ Θ · y_{i,j}, ∀i ∈ I, j ∈ J, h ∈ H  (13)
Σ_J y_{i,j} = 1, ∀i ∈ I  (14)
Σ_{I,J} x_{i,j,h} · c_j ≤ C_h, ∀h ∈ H  (15)
Σ_{I,J} x_{i,j,h} · m_j ≤ M_h, ∀h ∈ H  (16)
Σ_H s_{i,h} ≥ D_i, ∀i ∈ I  (17)
Σ_J x_{i,j,h} − s_{i,h} · Θ ≤ 0, ∀h ∈ H  (18)
−Σ_J x_{i,j,h} + s_{i,h} ≤ 0, ∀h ∈ H  (19)
Σ_I s_{i,h} − r_h · Θ ≤ 0, ∀h ∈ H  (20)
−Σ_I s_{i,h} + r_h ≤ 0, ∀h ∈ H  (21)
y_{i,j}, s_{i,h} and r_h ∈ {0, 1}, ∀i ∈ I, j ∈ J, h ∈ H  (22)
x_{i,j,h} ∈ ℕ, ∀i ∈ I, j ∈ J, h ∈ H  (23)
where Θ is an extremely large positive constant. Eq. (11) ensures that the SLA is satisfied for all tenants; the throughput constraints are non-linear but can be linearised with standard techniques from operational research. Eq. (12) introduces the number of vnodes needed for the portion of dataset handled and for the replication factor. Eqs. (15)-(16) ensure that the CPU and memory requested by the vnodes do not exceed the capacity of the physical nodes.
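To make the formulation concrete, here is a minimal sketch of a drastically simplified instance, assuming the PuLP MILP library is available. The simplifications are mine: a single tenant and a single VM type (so the y variables drop out), a constant per-vnode throughput instead of the piecewise eq. (6), and idle power charged only to PMs that host at least one vnode via the r_h indicator.

```python
import math
import pulp  # assumption: PuLP (with the bundled CBC solver) is installed

# toy instance: one tenant, one VM type
H = range(4)                    # physical machines
T_MIN, D, R_GB = 30_000, 3, 8   # SLA: min throughput, replication, dataset
T_VNODE = 8_300                 # per-vnode throughput (Table I, type 2, RW)
C_J, M_J, HEAP = 4, 16, 4       # vcores, RAM (GB), heap (GB) of the VM type
C_H, M_H = 16, 128              # PM cores and RAM (Table III)
P_MAX, K = 500.0, 0.7           # power model parameters (eq. 8)
BIG = 100                       # the "extremely large" constant Θ

prob = pulp.LpProblem("vdc_placement", pulp.LpMinimize)
x = {h: pulp.LpVariable(f"x_{h}", lowBound=0, cat="Integer") for h in H}
s = {h: pulp.LpVariable(f"s_{h}", cat="Binary") for h in H}  # vnode on h?
r = {h: pulp.LpVariable(f"r_{h}", cat="Binary") for h in H}  # PM h used?

# objective: total power; the idle share is charged only to used PMs
prob += pulp.lpSum(K * P_MAX * r[h]
                   + (1 - K) * P_MAX * (C_J / C_H) * x[h] for h in H)

prob += pulp.lpSum(T_VNODE * x[h] for h in H) >= T_MIN             # eq. (11)
prob += pulp.lpSum(x[h] for h in H) >= D * math.ceil(R_GB / HEAP)  # eq. (12)
prob += pulp.lpSum(s[h] for h in H) >= D                           # eq. (17)
for h in H:
    prob += C_J * x[h] <= C_H      # eq. (15): PM CPU capacity
    prob += M_J * x[h] <= M_H      # eq. (16): PM memory capacity
    prob += x[h] <= BIG * s[h]     # eq. (18): s_h = 1 if h hosts vnodes
    prob += x[h] >= s[h]           # eq. (19)
    prob += s[h] <= r[h]           # eqs. (20)-(21), single tenant

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([int(x[h].value()) for h in H], pulp.value(prob.objective), "W")
```

With these numbers the D_i anti-affinity constraint forces three used PMs and six vnodes, e.g. a 2/2/2/0 placement at 1275 W.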
Sub-optimal adaptation
• LocalOpt
• LocalOpt-H
• BestFit
• BestFit-H
[Fig. 3: number of iterations of the Opt, BestFit and LocalOpt policies as a function of the number of tenants N, for V = 3 and V = 10 VM types and H = 100 and H = 1000 PMs]
Scenarios
• New Service Subscriptions
• Dataset size increase
• Throughput Increase
• Surge in the throughput
• Physical node failures
The placement heuristic (lines 19-30 of the algorithm) places the vnodes on the PMs minimising the number of PMs used, packing as many vnodes as possible per PM while respecting the D_i constraint; that also minimises the energy consumption. The function any(c_{j*} ≤ C^a) compares c_{j*} with all the elements of C^a and returns 1 if at least one element of C^a is greater than or equal to c_{j*}; otherwise, if no PM satisfies the constraint, it returns 0 (the same behaviour holds for any(m_{j*} ≤ M^a)). The function sortDescendent(H^a) sorts H^a in descending order. The function popRR(H^a, D_i) extracts, in round-robin order, a PM from the first D_i in H^a. If there is more room in the selected PMs, the set H^a is sorted again (this allows trying the allocation on the PMs that now have capacity available). If not all the n*_{i,j*} vnodes are allocated, the empty set is returned because the allocation has no feasible solution; otherwise, the suboptimal solution is returned.
TABLE III. Model parameters used in the experiments
N = 1-10 (number of tenants)
V = 3 (number of VM types)
H = 8 (number of PMs)
D_i = 1-4 (replication factor for App. i)
r_i = 5-50 (dataset size for App. i)
L = {R, W, RW} (set of request types)
T_i^min = 10,000-70,000 ops/sec (minimum throughput agreed in the SLA)
C_h = 16 (number of cores of PM h)
c_j = 2-8 (number of vcores used by VM type j)
M_h = 128 GB (memory size of PM h)
m_j = 16-32 GB (total memory used by VM type j)
heapSize_j = 4-8 GB (max heap size used by VM type j)
∀ l_i: δ¹ = 1 for 1 ≤ x_{i,j,h} ≤ 2; δ² = 0.8 for 3 ≤ x_{i,j,h} ≤ 7; δ³ = 0.66 for x_{i,j,h} ≥ 8
P_h^max = 500 Watt (maximum power consumed by PM h if fully loaded)
k_h = 0.7 (fraction of P_h^max consumed by PM h if idle)
Performance metrics
• Total power consumption
• Scaling Index
• Count for horizontal and vertical scaling
• The Migration Index
• Count the number of migrations
• Delayed Requests
• Assuming requests do not time out
• Consistency level reliability
• Defined as the probability that the number of healthy replicas in the Cassandra VDC is
enough to guarantee a specific level of consistency over a fixed time interval
New service subscriptions
Workload
• 75%R, 15%W, 10%RW (unif)
• Tmin = 10-18Kops/sec (unif)
• Di = 2,3 (unif)
• Ri = 8GB
We use Monte Carlo simulation for the New service subscriptions and Small dataset size variation scenarios, while a separate evaluation is used for the SLA variation scenario. Experiments were carried out using Matlab R2015b 64-bit running on a single Intel Core i5 processor. The model parameters used are reported in Table III.
These experiments compare the performance of the optimal policy with the LocalOpt and BestFit heuristics. We start with one tenant and increase the number of subscribers up to 8. Because the SLA parameters are randomly generated, the results summarise the data collected for each new tenant subscription. The number of vnodes allocated grows with the number of tenants.
[Fig. 3: new service subscription, number of vnodes allocated by the three adaptation policies for N = 1…8 tenants]
[Fig. 4: new service subscription, total power consumed P(x) (Watt) by the Optimal, LocalOpt and BestFit policies for N = 1…8 tenants]
Results
• Optimal policy and LocalOpt
• allocate small VMs (type 1 and 2)
• rarely select VM type 3
• BestFit algorithm
• uses large VMs (VM type 3)
• the minimum number of vnodes
allocated is lower than the other policies.
Dataset size increase
Workload
• 3 tenants ( R, W, RW )
• ri = 10 – 50 GB
• Tmin = [14,10,18] Kops/sec
• Di = [3,2,3]
[Fig. 11: dataset size increase, the bars represent the Scaling Index per VM type and the line the number of virtual nodes allocated during each experiment, for r_i = 10…50 GByte]
[Fig. 12: dataset size increase, box plot of the power consumed P(x) by the Optimal, LocalOpt and BestFit policies]
[Migration Index MI vs. dataset size r_i (GByte)]
Throughput increase
[Fig.: throughput increase, Scaling Index for the three applications and for the three policies; the line represents the number of VMs in each experiment]
[Fig.: throughput increase, power consumed P(x) by the Optimal, LocalOpt and BestFit policies for T_i^min = 10…70 ×10³ ops/sec]
Workload
• 3 tenants ( R, W, RW )
• Tmin = 10-70 Kops/sec
• Di = 3
• Ri = 8GB
Throughput increase (cont’d)
[Fig. 5: throughput increase, box plots of the Scaling Index per VM type for App. 1 (R), App. 2 (W) and App. 3 (RW) and for the three policies; the line represents the number of vnodes in each experiment]
From the optimal policy behaviour we learned that it is better to always allocate smaller virtual machines: that choice usually allows satisfying both the dataset and throughput constraints while minimising the actual throughput provided, which for large datasets can be higher than T_i^min.
The power consumption is plotted in Figure 7. LocalOpt outperforms BestFit, specifically for low throughput. The penalty paid is between 15 and 25% if LocalOpt is used and between 13 and 50% for BestFit; the higher loss is observed for low throughput values.
Figure 8 shows the Migration Index; each box plot is computed over the data collected for the three tenants. Each application experiences between 0 and 3 vnode migrations depending on the infrastructure load state. The case T_i^min = 50,000 ops/sec is an exception because it corresponds to the vertical scaling action taken for App. 1; the decrease in the number of vnodes used also impacts the Migration Index.
[Fig. 7: throughput increase, the power consumed P(x) by the three adaptation policies]
[Fig. 8: throughput increase, Migration Index for the optimal policy; LocalOpt and BestFit have an MI equal to zero by definition]
Throughput increase (cont'd)
[Fig. 5: throughput increase, box plots of the Scaling Index actions for the RW workload and the five policies (Opt, LocalOpt, LocalOpt-H, BestFit, BestFit-H), per VM type]
When the adaptation policy switches between two configurations (vertical scaling), the negative bar is for the VM type dismissed and the positive bar for the new VM type allocated; observations with only positive bars correspond to horizontal scaling actions. For example, for the Optimal policy there is a change from VM type 3 (yellow bar) to VM type 2 (green bar) at T_i^min = 30,000 ops/sec; the number of newly allocated VMs is smaller because each new VM offers a higher throughput. The optimal adaptation policy always starts by allocating VMs of type 3 (cf. Tab. 1) and, if needed, progressively moves to more powerful VM types. The Opt policy performs only one vertical scaling, when the VM type is changed from type 3 to type 2; after that it always performs horizontal scaling actions (a particularly lucky case). The two heuristics LocalOpt and BestFit show a very unstable behaviour, performing both vertical and horizontal scaling: both first scale from VM type 3 to VM type 1 and then scale back to VM type 2. When the variants LocalOpt-H and BestFit-H are used, the VM type is fixed to type 1 and the only action taken is horizontal scaling.
The power consumption is plotted in Figure 6. For throughput higher than 40×10³ ops/sec, the optimal scaling saves about 50% of the energy consumed by the heuristic allocations. For low throughput values (10-20×10³ ops/sec) BestFit and BestFit-H show a very high energy consumption compared with the others.
Considering the low values of the Migration Index for the Opt allocation, and the high saving in the energy consumed compared with the other algorithms, it makes sense to perform periodic VDC consolidation using the Opt policy, as recommended in Section 7.
[Fig. 6: throughput increase, the power consumed P(x) by the five adaptation policies when increasing the throughput for Application 3 (RW workload)]
In the throughput surge experiments we analyse how fast the scaling is, with respect to the throughput variation rate, and the number of delayed requests.
Surge in the throughput (RW)
[Fig. 8: auto-scaling actions in case of a throughput surge, Cases A and B: requested throughput T_RW^min vs. the actual throughput of the Opt and BestFit policies over time]
Case A vnode types: T1, 13.3×10³ ops/sec; T2, 8.3×10³ ops/sec; T3, 3.3×10³ ops/sec
Case B vnode types: T4, 20×10³ ops/sec; T5, 15×10³ ops/sec; T6, 7×10³ ops/sec
In Case A, at time t = 10 twelve vnodes are allocated; considering the serialization of the horizontal scaling actions (cf. Section 7), the seven Cassandra vnodes are added in 14 minutes. LocalOpt behaves like Opt in terms of scaling decisions. BestFit starts by allocating 4 vnodes of type 3, then scales up to seven vnodes (at time t = 8) and finally performs two vertical scaling actions, the first from type 3 to type 2 and the second from type 2 to type 1.
Intuitively, with Cassandra vnodes capable of handling a higher workload it should be possible to better absorb the surge in the throughput. Hence we analysed Case B, where we configured the three new vnode types listed above. With the more powerful vnodes the auto-scaling algorithms are capable of satisfying the required throughput with a delay of only 2 minutes: Opt starts by allocating 3 vnodes of type 6, at time t = 4 one more vnode of type 6 is added, and at time t = 6 a vertical scaling allocates 4 vnodes of a more powerful type. Vnode types capable of handling from low to very high throughput allow managing throughput surges.
Table 5. The number of delayed requests Q_i and the percentage with respect to the total number of received requests (tot.req.), computed over the time interval in which the requested throughput (T_RW^min) exceeds the actual throughput.
Case A: Opt 191.84×10³ (22.78%); LocalOpt 191.84×10³ (22.78%); BestFit 70.89×10³ (46.33%)
Case B: Opt 7.66×10³ (4%); LocalOpt 7.66×10³ (4%); BestFit 70.58×10³ (30.29%)
Physical node failure
Consistency level reliability R is defined as the probability that the number of healthy replicas in the Cassandra VDC is enough to guarantee a specific level of consistency over a fixed time interval.
We consider the consistency levels ONE and QUORUM (Q = ⌊D/2⌋ + 1).
In case the Cassandra VDC has a number of physical nodes H equal to the number of vnodes n, and there is a one-to-one mapping between vnodes and physical nodes, the consistency level ONE is guaranteed if one replica is up. Hence, the consistency reliability is the probability that at least one vnode is up and a replica is on that node:
R_O = 1 − (D/n) × (1 − ρ)^n  (24)
where ρ is the resiliency of a physical node and D/n is the probability that a replica is on a given Cassandra vnode when the data replication strategy used is the SimpleStrategy (cf. the Datastax documentation for Cassandra). In the same way, we can define the reliability of the Cassandra VDC to guarantee a consistency level of QUORUM as the probability that at least Q vnodes are up and that Q replicas are on them:
R_Q = 1 − (D/n) × (1 − ρ)^(n−Q+1)  (25)
Cassandra offers three main levels of consistency (both for read and write): ONE, QUORUM and ALL. Consistency level ONE means that only one replica node is required to reply correctly, i.e. it contains the replica of the portion of the dataset needed to answer the query; consistency level QUORUM means that Q = ⌊D/2⌋ + 1 replica nodes are available to reply correctly; consistency level ALL means that all the replicas are available.
In a managed Cassandra data center, a Cassandra VDC is rarely allocated using a one-to-one mapping of vnodes on physical nodes. The resource management policies adopted by the provider usually end up with a many-to-one mapping, that is, h physical nodes run the n Cassandra vnodes, with D ≤ h < n. In that case we can generalise equations 24 and 25 to:
R_O = 1 − (D/n) × (1 − ρ)^K_O  (26)
R_Q = 1 − (D/n) × (1 − ρ)^K_Q  (27)
where K_O is the number of failed physical nodes that causes a failure of n vnodes, and K_Q is the number of failed physical nodes that causes a failure of (n − Q + 1) vnodes. The values of K_O and K_Q depend on the allocation of the vnodes on the physical nodes: for example, if 8 vnodes are distributed over 5 PMs in the following way {1, 1, 2, 1, 3}, then n − Q + 1 = 7 and K_O = 5.
Table 6 reports the values for D = 3 and D = 5 with ρ = 0.9 (and ρ = 0.8), computed for a randomly generated SLA: 10% RW, 15% W and 75% R requests; T^min in the interval [10,000, 18,000] ops/sec; constant r_i (8 GB). The number of vnodes for each tenant is n = 6 for the case D = 3; we report the best and worst case over the runs.
The one-to-one mapping guarantees consistency levels ONE and QUORUM with about six 9s and five 9s respectively; if the replication factor increases to 5, with about ten 9s and eight 9s for consistency levels ONE and QUORUM respectively. Unfortunately, when the auto-scaling policies produce a many-to-one mapping, the reliability of the consistency level drops by orders of magnitude.
Physical node failure (cont'd)
Table 6. Consistency reliability R for the consistency levels ONE and QUORUM. The probability that a data replica is on a given vnode is 0.5 for both D = 3 and D = 5; the reliability of a physical node is ρ = 0.9 (top) or ρ = 0.8 (bottom). The one-to-one mapping is compared with the best and worst cases of the Opt, LocalOpt, LocalOpt-H, BestFit and BestFit-H policies.
ρ = 0.9:
R_O | D=3: one-to-one 0.9999995; policies 0.9995, 0.9995
R_Q | D=3: one-to-one 0.999995; policies 0.995 - 0.9995, 0.9995
R_O | D=5: one-to-one 0.99999999995; policies 0.999995, 0.999995
R_Q | D=5: one-to-one 0.999999995; policies 0.9995 - 0.999995, 0.99995
ρ = 0.8:
R_O | D=3: one-to-one 0.99996; policies 0.996, 0.996
R_Q | D=3: one-to-one 0.99984; policies 0.98 - 0.996, 0.996
R_O | D=5: one-to-one 0.999999948; policies 0.99984, 0.99984
R_Q | D=5: one-to-one 0.9999987; policies 0.996 - 0.99984, 0.9992
.ACROSS	
  -­‐ Rome	
  Meeting
Lesson learned
• When dealing with a specific technology there are many constraints to be considered
• Multi-layer adaptation is a must
• No single policy fits all workloads
• Not all policies fit all stages of the application life cycle
Table 3. Use of the auto-scaling algorithms
Use case (Opt / LocalOpt / BestFit / LocalOpt-H / BestFit-H):
Capacity planning: X
Data center consolidation: X
VDC consolidation: X X
Run-time adaptation: X X X X
Questions ?
Emiliano Casalicchio
http://www.bth.se/people/emc
emc@bth.se

Contenu connexe

Tendances

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Deanna Kosaraju
 
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
James McGalliard
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Gluster.org
 
Shark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleShark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at Scale
DataWorks Summit
 
IMSBufferpool Tuning concept AMS presentation v01
IMSBufferpool Tuning concept AMS presentation v01IMSBufferpool Tuning concept AMS presentation v01
IMSBufferpool Tuning concept AMS presentation v01
Manoj Kaveri
 

Tendances (20)

Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like systemAccelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
 
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
 
CoolDC'16: Seeing into a Public Cloud: Monitoring the Massachusetts Open Cloud
CoolDC'16: Seeing into a Public Cloud: Monitoring the Massachusetts Open CloudCoolDC'16: Seeing into a Public Cloud: Monitoring the Massachusetts Open Cloud
CoolDC'16: Seeing into a Public Cloud: Monitoring the Massachusetts Open Cloud
 
BDAS Shark study report 03 v1.1
BDAS Shark study report  03 v1.1BDAS Shark study report  03 v1.1
BDAS Shark study report 03 v1.1
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
MapReduce
MapReduceMapReduce
MapReduce
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
 
Cassandra at teads
Cassandra at teadsCassandra at teads
Cassandra at teads
 
Performance features12102 doag_2014
Performance features12102 doag_2014Performance features12102 doag_2014
Performance features12102 doag_2014
 
dmapply: A functional primitive to express distributed machine learning algor...
dmapply: A functional primitive to express distributed machine learning algor...dmapply: A functional primitive to express distributed machine learning algor...
dmapply: A functional primitive to express distributed machine learning algor...
 
Health Check Your DB2 UDB For Z/OS System
Health Check Your DB2 UDB For Z/OS SystemHealth Check Your DB2 UDB For Z/OS System
Health Check Your DB2 UDB For Z/OS System
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and Hadoop
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Shark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleShark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at Scale
 
IMSBufferpool Tuning concept AMS presentation v01
IMSBufferpool Tuning concept AMS presentation v01IMSBufferpool Tuning concept AMS presentation v01
IMSBufferpool Tuning concept AMS presentation v01
 
Latest in ml
Latest in mlLatest in ml
Latest in ml
 

Similaire à Autonomous control in Big Data platforms: and experience with Cassandra

WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
Ryousei Takano
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
Deepak Shankar
 
Data Replication In Cloud Computing
Data Replication In Cloud ComputingData Replication In Cloud Computing
Data Replication In Cloud Computing
Rahul Garg
 

Similaire à Autonomous control in Big Data platforms: and experience with Cassandra (20)

Public Cloud Workshop
Public Cloud WorkshopPublic Cloud Workshop
Public Cloud Workshop
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availability
 
A stochastic approach to analysis of energy aware dvs-enabled cloud datacenters
A stochastic approach to analysis of energy aware dvs-enabled cloud datacentersA stochastic approach to analysis of energy aware dvs-enabled cloud datacenters
A stochastic approach to analysis of energy aware dvs-enabled cloud datacenters
 
Cloudsim & greencloud
Cloudsim & greencloud Cloudsim & greencloud
Cloudsim & greencloud
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightning
 
Optimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource ConfigurationOptimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource Configuration
 
Cloudsim & Green Cloud
Cloudsim & Green CloudCloudsim & Green Cloud
Cloudsim & Green Cloud
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availability
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
 
Warehouse scale computer
Warehouse scale computerWarehouse scale computer
Warehouse scale computer
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Data Replication In Cloud Computing
Data Replication In Cloud ComputingData Replication In Cloud Computing
Data Replication In Cloud Computing
 
Rapid Application Design in Financial Services
Rapid Application Design in Financial ServicesRapid Application Design in Financial Services
Rapid Application Design in Financial Services
 

Dernier

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand

Autonomous control in Big Data platforms: an experience with Cassandra

  • 10. Multi-layer adaptation • It means coordinating at run time • Self-configuration of the BD platform • Deployment of the platform on the virtual infrastructure • Allocation/placement of the virtual infrastructure on the physical infrastructure • The challenges are: • To formulate an optimization model that accounts for all the dependencies and constraints imposed by the system architecture • To formulate multi-objective functions that account for conflicting objectives at the infrastructure level and at the application level • E.g. minimizing power consumption while maximizing platform reliability
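One minimal way to combine such conflicting goals, sketched here as an assumption rather than the formulation adopted later in the case study, is a weighted scalarisation of the two utilities: min f(x) = w_P · P(x)/P^max − w_R · R(x), with w_P + w_R = 1, where P(x) is the power consumed by configuration x, R(x) the platform reliability, and the weights encode the provider's preference between the two objectives.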
  • 11. The Cassandra case study • E. Casalicchio, L. Lundberg, S. Shirinbab, "Energy-aware adaptation in managed Cassandra datacenters," IEEE International Conference on Cloud and Autonomic Computing (ICCAC 2016), Augsburg, Germany, September 12-16, 2016 • E. Casalicchio, L. Lundberg, S. Shirinbab, "Energy-aware Auto-scaling Algorithms for Cassandra Virtual Data Centers," Cluster Computing, Springer (to appear, June 2017)
  • 12. Motivations • Data storage (or serving) systems play an important role in the cloud and big data industry • Their management is a challenging task • The complexity increases when multi-tenancy is considered • Human-assisted control is unrealistic • There is a growing demand for autonomic solutions • Our industrial partner Ericsson AB is interested in autonomic management of Cassandra
  • 13. Problem description • We consider a provider of a managed Apache Cassandra service • The applications, or tenants, of the service are independent and each uses its own Cassandra Virtual Data Center (VDC) • The service provider wants to maintain SLAs, which requires • properly planning the capacity and the configuration of each Cassandra VDC • dynamically adapting the infrastructure and VDC configuration without disrupting performance • minimizing power consumption
  • 14. Solution proposed (1000 ft view) • An energy-aware adaptation model specifically designed for Cassandra VDCs running on a cloud infrastructure • Architectural constraints imposed by Cassandra (minimum number of nodes, homogeneity of nodes, replication factor and heap size) • Constraints on throughput and replication factor imposed by the SLA • A power consumption model based on CPU utilization • An adaptation policy for the Cassandra VDC configuration and the cloud infrastructure configuration that orchestrates three strategies: • Vertical scaling • Horizontal scaling • Optimal placement
  • 15. Solution proposed (detailed view) • A workload and SLA model • A system architecture model • A throughput model • The utility function and problem formulation • Drawbacks of the optimal solution, and alternatives • Experimental results
  • 16. Workload and SLA model • Workload features • The type of requests, e.g. read only, write only, read & write, scan, or a combination of those • The rate of the operation requests • The size of the dataset • Workload types: CPU bound, memory bound • The data replication factor • The SLA of tenant i is the tuple ⟨l_i, T_i^min, D_i, r_i⟩: request type l_i ∈ {R, W, RW (75/25)}, minimum throughput T_i^min, replication factor D_i, dataset size r_i
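To make the notation concrete, here is a minimal sketch of the SLA tuple as a Python data structure; the field names and the example values are illustrative assumptions, not part of the original model.

from dataclasses import dataclass

@dataclass(frozen=True)
class SLA:
    request_type: str   # l_i: "R", "W" or "RW" (75% read / 25% write)
    t_min: float        # T_i^min: minimum throughput (ops/sec)
    replication: int    # D_i: replication factor
    dataset_gb: float   # r_i: dataset size (GB)

# Example tenant: read-write workload, 18,000 ops/sec, 3 replicas, 8 GB dataset
sla = SLA(request_type="RW", t_min=18_000, replication=3, dataset_gb=8.0)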
  • 17. Architecture model • Homogeneous physical machines (H) • VMs of different type and size (V) • A VDC is composed of n_i homogeneous VMs • n_i ≥ D_i • At least D_i vnodes out of n_i must run on different PMs • The datacenter configuration is described by a vector x = [x_{i,j,h}]
Table I. Maximum per-vnode throughput t0_{l_i,j} (ops/sec) as a function of the VM configuration (c_j virtual CPUs, m_j GB of RAM, heapSize_j GB) and of the workload type l_i:
j | c_j | m_j | heapSize_j | R | W | RW
1 | 8 | 32 | 8 | 16.6×10³ | 8.3×10³ | 13.3×10³
2 | 4 | 16 | 4 | 8.3×10³ | 8.3×10³ | 8.3×10³
3 | 2 | 16 | 4 | 3.3×10³ | 3.3×10³ | 3.3×10³
Table II. Memory available for the dataset in a Cassandra vnode (max JVM heap) as a function of the VM memory size:
m_j (RAM size in GB): 1 | 2 | 4 | 8 | 16 | 32
heapSize_j (max heap size in GB): 0.5 | 1 | 1 | 2 | 4 | 8
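A sketch of Tables I and II as Python lookup tables, reused by the sizing formulas on the next slides; the structure and the names are my own, the numbers are those reported in the tables.

# Table I: peak throughput t0 (ops/sec) per vnode, by VM type and workload
T0 = {
    1: {"R": 16_600, "W": 8_300, "RW": 13_300},  # 8 vCPU, 32 GB RAM, 8 GB heap
    2: {"R": 8_300,  "W": 8_300, "RW": 8_300},   # 4 vCPU, 16 GB RAM, 4 GB heap
    3: {"R": 3_300,  "W": 3_300, "RW": 3_300},   # 2 vCPU, 16 GB RAM, 4 GB heap
}

# Table II: max JVM heap (GB) as a function of the VM RAM size (GB)
HEAP_SIZE = {1: 0.5, 2: 1, 4: 1, 8: 2, 16: 4, 32: 8}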
  • 18. Architecture model (cont'd) • To make a VDC CPU bound we need n_i ≥ D_i · r_i / heapSize_j if r_i > heapSize_j; otherwise the constraint n_i ≥ D_i holds • The number of vnodes is n_i = Σ_{j∈J,h∈H} x_{i,j,h}, ∀i ∈ I (2) • Since in our industrial case it is always r_i ≥ heapSize_j for all configurations j, the constraints are modelled as: Σ_{j∈J,h∈H} x_{i,j,h} ≥ D_i · r_i / heapSize_j, ∀i ∈ I (3); Σ_{j∈J} y_{i,j} = 1, ∀i ∈ I (4); Σ_{h∈H} s_{i,h} ≥ D_i, ∀i ∈ I (5) • y_{i,j} = 1 if application i uses VM configuration j to run its Cassandra vnodes, otherwise y_{i,j} = 0 • s_{i,h} = 1 if a Cassandra vnode serving application i runs on PM h, otherwise s_{i,h} = 0 • Vertical scaling is modelled assuming it is possible to switch from configuration j1 to j2 at runtime, i.e. the VM of type j1 is replaced with a VM of type j2
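A minimal sketch of the vnode sizing rule behind Eq. 3; the helper name and the ceiling rounding are my assumptions (the constraint only states a lower bound).

import math

def min_vnodes(replication: int, dataset_gb: float, heap_gb: float) -> int:
    """Lower bound on n_i: D_i replicas, each vnode holding at most heapSize_j GB."""
    if dataset_gb > heap_gb:
        return max(replication, math.ceil(replication * dataset_gb / heap_gb))
    return replication

# A tenant with D_i = 3 and an 8 GB dataset on VMs with a 4 GB heap (VM type 2)
print(min_vnodes(3, 8.0, 4.0))  # -> 6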
  • 19. Throughput model • The actual throughput T_i is a function of x_{i,j,h} • The throughput scales piecewise linearly with the number of vnodes: for n_{k-1} ≤ n_i ≤ n_k, t(n_i) = t(n_{k-1}) + t0_{l_i,j} · δ^k_{l_i,j} · (n_i − n_{k-1}) (6), where k ≥ 1, n_0 = 1, t(1) = t0_{l_i,j}, and δ^k_{l_i,j} is the slope of the k-th segment • For a configuration x of a VDC, the overall throughput is T_i(x) = t(n_i), ∀i ∈ I (7) • (Fig. 2: a real example of Cassandra throughput as a function of the number of Cassandra vnodes allocated, for different types of requests; the plot shows that the proposed model is realistic)
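A sketch of the piecewise-linear throughput model of Eq. 6, with the segment slopes δ taken from Table III on slide 23 (1 for up to 2 vnodes, 0.8 from 3 to 7, 0.66 from 8 on); reading those ranges as the breakpoints n_k is my assumption.

def throughput(n: int, t0: float) -> float:
    """t(n): throughput of n vnodes, Eq. 6 with slopes from Table III."""
    segments = [(2, 1.0), (7, 0.8), (float("inf"), 0.66)]  # (n_k, delta_k)
    t, prev_n = t0, 1  # t(1) = t0
    for n_k, delta in segments:
        t += t0 * delta * (min(n, n_k) - prev_n)
        if n <= n_k:
            break
        prev_n = n_k
    return t

print(throughput(6, 13_300))  # RW on VM type 1 -> 69160.0, i.e. 5.2 x t0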
  • 20. Energy consumption model • Based on physical node utilization • A linear power model [3]: the power P_h consumed by a physical machine h is a function of its CPU utilization, and hence of the system configuration x: P_h(x) = k_h · P_h^max + (1 − k_h) · P_h^max · U_h(x) (8), where P_h^max is the maximum power consumed when PM h is fully utilized and k_h is the fraction of power consumed by the idle PM h • The CPU utilization of PM h is U_h(x) = (1/C_h) · Σ_{I,J} x_{i,j,h} · c_j (9) • In the experiments, P_h^max = 500 W and k_h = 0.7
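The power model of Eqs. 8-9 as a small sketch; representing the allocation on one PM as a list of (number of vnodes, vcores per vnode) pairs is an assumption of this example.

def power(alloc, cores: int = 16, p_max: float = 500.0, k: float = 0.7) -> float:
    """P_h(x) = k*P_max + (1 - k)*P_max*U_h(x), with U_h(x) from Eq. 9."""
    used_cores = sum(n_vms * vcores for n_vms, vcores in alloc)
    u = used_cores / cores
    return k * p_max + (1 - k) * p_max * u

# Two type-2 vnodes (4 vcores each) on a 16-core PM: U_h = 0.5
print(power([(2, 4)]))  # -> 425.0 W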
  • 21. Problem formulation • Linear model: both the objective function and the constraints are linear (some constraints are non-linear as written but can be linearised with standard operational-research techniques) • Constraints imposed by • the SLA on throughput and replication factor • the replication factor (and the number of distinct PMs) • the homogeneity of the vnode configuration • the heap size (we want a CPU-bound configuration)
min f(x) = P(x)
subject to:
Σ_{J,H} t(x_{i,j,h}) ≥ T_i^min, ∀i ∈ I (11)
Σ_H x_{i,j,h} · y_{i,j} ≥ D_i · r_i / heapSize_j, ∀i ∈ I, j ∈ J (12)
x_{i,j,h} ≤ Λ · y_{i,j}, ∀i ∈ I, j ∈ J, h ∈ H (13)
Σ_J y_{i,j} = 1, ∀i ∈ I (14)
Σ_{I,J} x_{i,j,h} · c_j ≤ C_h, ∀h ∈ H (15)
Σ_{I,J} x_{i,j,h} · m_j ≤ M_h, ∀h ∈ H (16)
Σ_H s_{i,h} ≥ D_i, ∀i ∈ I (17)
Σ_J x_{i,j,h} − s_{i,h} · Λ ≤ 0, ∀i ∈ I, h ∈ H (18)
−Σ_J x_{i,j,h} + s_{i,h} ≤ 0, ∀i ∈ I, h ∈ H (19)
Σ_I s_{i,h} − r_h · Λ ≤ 0, ∀h ∈ H (20)
−Σ_I s_{i,h} + r_h ≤ 0, ∀h ∈ H (21)
y_{i,j}, s_{i,h}, r_h ∈ {0, 1}, ∀i ∈ I, j ∈ J, h ∈ H (22)
x_{i,j,h} ∈ ℕ, ∀i ∈ I, j ∈ J, h ∈ H (23)
where Λ is an extremely large positive constant, Eqs. 18-19 link the integer variables x_{i,j,h} to the placement indicators s_{i,h}, and Eqs. 20-21 set r_h = 1 if PM h hosts at least one vnode, so that idle power is charged only for PMs in use
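A reduced sketch of the ILP in PuLP (an open-source Python modeller, pip install pulp), for a single tenant with the throughput linearised as t0 ops/sec per vnode; the distinct-PM constraints (Eqs. 17-21), the piecewise throughput of Eq. 6 and the tenant index i are omitted, so this illustrates the shape of the model, not the paper's exact solver.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, LpInteger

J, H = [1, 2, 3], list(range(8))               # VM types, PMs
c    = {1: 8, 2: 4, 3: 2}                      # vcores per VM type
m    = {1: 32, 2: 16, 3: 16}                   # RAM per VM type (GB)
heap = {1: 8, 2: 4, 3: 4}                      # heap per VM type (GB)
t0   = {1: 13_300, 2: 8_300, 3: 3_300}         # RW throughput per vnode
T_MIN, D, R_GB = 18_000, 3, 8                  # SLA of the single tenant
C_H, M_H, P_MAX, K, LAM = 16, 128, 500.0, 0.7, 100

prob = LpProblem("vdc_placement", LpMinimize)
x = LpVariable.dicts("x", [(j, h) for j in J for h in H], lowBound=0, cat=LpInteger)
y = LpVariable.dicts("y", J, cat=LpBinary)

# Objective: variable part of the power (Eqs. 8-9); the idle term is constant here
prob += lpSum((1 - K) * P_MAX * (c[j] / C_H) * x[(j, h)] for j in J for h in H)

prob += lpSum(t0[j] * x[(j, h)] for j in J for h in H) >= T_MIN     # Eq. 11
for j in J:
    # Eq. 12 (linearised): enough vnodes of the chosen type to hold D*r GB of heap
    prob += lpSum(x[(j, h)] for h in H) >= (D * R_GB / heap[j]) * y[j]
    for h in H:
        prob += x[(j, h)] <= LAM * y[j]                             # Eq. 13
prob += lpSum(y[j] for j in J) == 1                                 # Eq. 14
for h in H:
    prob += lpSum(c[j] * x[(j, h)] for j in J) <= C_H               # Eq. 15
    prob += lpSum(m[j] * x[(j, h)] for j in J) <= M_H               # Eq. 16

prob.solve()
print({v.name: int(v.value()) for v in prob.variables() if v.value()})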
  • 22. Sub-optimal adaptation • Four heuristics: • LocalOpt • LocalOpt-H • BestFit • BestFit-H • The heuristics place the vnodes on the PMs minimising the number of PMs used, packing as many vnodes as possible on each PM while respecting the D_i distinct-PM constraint, which also minimises the energy consumption; PMs are kept sorted by residual capacity and, over the first D_i of them, allocation proceeds round-robin • (Fig. 3: number of iterations of Opt, BestFit and LocalOpt for different numbers of tenants N, with V = 3, 10 VM types and H = 100, 1000 PMs)
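As an illustration, a toy best-fit packing in Python in the spirit of the BestFit heuristic; the function name, the fixed VM type and the CPU-only capacity are my simplifications (the real algorithm also selects the VM type and handles memory and the replication constraint).

from typing import Optional

def best_fit(n_vnodes: int, vcores: int, pm_free_cores: list) -> Optional[list]:
    """Return the PM index chosen for each vnode, or None if placement fails."""
    placement = []
    free = pm_free_cores[:]  # do not mutate the caller's state
    for _ in range(n_vnodes):
        # tightest PM that still fits the vnode
        candidates = [h for h, f in enumerate(free) if f >= vcores]
        if not candidates:
            return None
        h = min(candidates, key=lambda h: free[h])
        free[h] -= vcores
        placement.append(h)
    return placement

print(best_fit(6, 2, [16, 6, 4]))  # fills the tightest PMs first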
  • 23. Scenarios • New service subscriptions • Dataset size increase • Throughput increase • Surge in the throughput • Physical node failures
Table III. Model parameters used in the experiments:
Parameter | Value | Description
N | 1-10 | Number of tenants
V | 3 | Number of VM types
H | 8 | Number of PMs
D_i | 1-4 | Replication factor for App. i
r_i | 5-50 | Dataset size for App. i (GB)
L | {R, W, RW} | Set of request types
T_i^min | 10,000-70,000 ops/sec | Minimum throughput agreed in the SLA
C_h | 16 | Number of cores of PM h
c_j | 2-8 | Number of vcores used by VM type j
M_h | 128 GB | Memory size of PM h
m_j | 16-32 GB | Total memory used by VM type j
heapSize_j | 4-8 GB | Max heap size used by VM type j
δ^k_{l_i} | 1 for 1 ≤ x_{i,j,h} ≤ 2; 0.8 for 3 ≤ x_{i,j,h} ≤ 7; 0.66 for x_{i,j,h} ≥ 8 | Throughput-model slopes
P_h^max | 500 W | Maximum power consumed by PM h if fully loaded
k_h | 0.7 | Fraction of P_h^max consumed by PM h if idle
  • 24. Performance metrics • Total power consumption • Scaling Index • Counts horizontal and vertical scaling actions • Migration Index • Counts the number of migrations • Delayed Requests • Assumes requests do not time out • Consistency level reliability • Defined as the probability that the number of healthy replicas in the Cassandra VDC is enough to guarantee a specific level of consistency over a fixed time interval
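The first two indexes can be made concrete with a small sketch; how consecutive adaptation steps are paired and counted here is an illustrative assumption, not the paper's exact bookkeeping.

def scaling_index(before: dict, after: dict) -> int:
    """SI: vnodes added plus removed, per VM type; maps vm_type -> #vnodes."""
    types = set(before) | set(after)
    return sum(abs(after.get(j, 0) - before.get(j, 0)) for j in types)

def migration_index(before: dict, after: dict) -> int:
    """MI: vnodes whose hosting PM changed; maps vnode_id -> pm_id."""
    return sum(1 for v in before if v in after and after[v] != before[v])

# A vertical scaling of 3 vnodes from VM type 2 to VM type 1
print(scaling_index({2: 3}, {1: 3}))                         # -> 6
print(migration_index({"a": 0, "b": 1}, {"a": 0, "b": 2}))   # -> 1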
  • 25. New service subscriptions • Workload • 75% R, 15% W, 10% RW (uniform) • T^min = 10-18 Kops/sec (uniform) • D_i = 2, 3 (uniform) • r_i = 8 GB • Evaluation by Monte Carlo simulation (Matlab R2015b, 64-bit, single Intel Core i5); since the SLA parameters are randomly generated, the results summarise the data collected over repeated runs for each new tenant subscription • Results • The Optimal policy and LocalOpt tend to allocate small VMs (types 1 and 2) and rarely select VM type 3 • The BestFit algorithm often uses large VMs (VM type 3), which explains why the minimum number of vnodes it allocates is lower than with the other policies • (Fig. 3: number of vnodes allocated by the three adaptation policies for 1-8 tenants; Fig. 4: total power consumed by the three adaptation policies)
  • 26. Dataset size increase • Workload • 3 tenants (R, W, RW) • r_i = 10-50 GB • T^min = [14, 10, 18] Kops/sec • D_i = [3, 2, 3] • (Fig. 11: Scaling Index per VM type and number of vnodes allocated as r_i grows from 10 to 50 GB; Fig. 12: box plot of the power consumed P(x) by the Optimal policy, LocalOpt and BestFit; also reported: the Migration Index as a function of r_i)
  • 27. Throughput increase • Workload • 3 tenants (R, W, RW) • T^min = 10-70 Kops/sec • D_i = 3 • r_i = 8 GB • (Box plots: Scaling Index for the three applications under the three policies, with the number of VMs overlaid; power consumed P(x) by the Optimal policy, LocalOpt and BestFit as T_i^min grows)
  • 28. Throughput increase (cont'd) • (Box plots: Scaling Index and number of vnodes per VM type for App. 1 (R), App. 2 (W) and App. 3 (RW) under the three policies) • From the behaviour of the optimal policy we learned that it is better to always allocate smaller virtual machines: this usually satisfies both the dataset and throughput constraints while minimising the actual throughput provided, which for large datasets can exceed T_i^min • Power consumption (Fig. 7): LocalOpt outperforms BestFit, especially at low throughput; the penalty paid with respect to the optimum is between 15 and 25% for LocalOpt and between 13 and 50% for BestFit, with the higher loss at low throughput values • Migration Index (Fig. 8, optimal policy; LocalOpt and BestFit have MI = 0 by definition): each application experiences between 0 and 3 vnode migrations depending on the infrastructure load state; T_i^min = 50,000 ops/sec is an exception, corresponding to the vertical scaling action taken for App. 1
  • 29. Throughput increase (cont'd) • Five policies compared: Opt, LocalOpt, LocalOpt-H, BestFit, BestFit-H • (Box plots: Scaling Index actions for the RW workload under the five policies; a negative bar marks the VM type dismissed, a positive bar the new VM type allocated, so observations with only positive bars correspond to horizontal scaling; e.g. Opt switches from VM type 3 to type 2 at T_i^min = 30,000 ops/sec, and the number of newly allocated VMs is smaller because each new VM offers a higher throughput) • The optimal policy always starts allocating VMs of type 3 (cf. Table I) and, if needed, progressively moves to more powerful VM types; here it performs a single vertical scaling (type 3 to type 2) and afterwards only horizontal scaling, a particularly lucky case • LocalOpt and BestFit show a very unstable behaviour, performing both vertical and horizontal scaling: both first scale from VM type 3 to type 1 and then scale back to VM type 2 • With the -H variants, LocalOpt-H and BestFit-H, the VM type is fixed to type 1 and the only action taken is horizontal scaling • Power consumption (Fig. 6, Application 3, RW workload): for throughput above 40×10³ ops/sec the optimal scaling saves about 50% of the energy consumed by the heuristic allocations, while at low throughput (10-20×10³ ops/sec) BestFit and BestFit-H show a very high energy consumption • Each application experiences between 0 and 3 vnode migrations depending on the infrastructure load state; given the low Migration Index of the Opt allocation and its high energy savings, it makes sense to perform periodic VDC consolidation using the Opt policy (see the use-case table on slide 33)
  • 30. Surge in the throughput (RW) • These experiments analyse how fast the scaling is with respect to the throughput variation rate, and the resulting number of delayed requests • Case A vnode types: T1 = 13.3×10³ ops/sec, T2 = 8.3×10³ ops/sec, T3 = 3.3×10³ ops/sec • Case B vnode types: T4 = 20×10³ ops/sec, T5 = 15×10³ ops/sec, T6 = 7×10³ ops/sec • (Fig. 8: auto-scaling actions in case of a throughput surge, Cases A and B, showing T_RW^min and the actual throughput of Opt and BestFit) • Case A: at t = 10 twelve vnodes are allocated; given the serialization of horizontal scaling actions, the seven new Cassandra vnodes are added in 14 minutes; LocalOpt behaves like Opt in terms of scaling decisions, while BestFit starts by allocating 4 vnodes of type 3, grows to seven vnodes at t = 8 and finally performs two vertical scaling actions, the first from type 3 to type 2 • Case B: with more powerful vnodes the auto-scaling algorithms satisfy the requested throughput with a delay of only 2 minutes, starting with 3 vnodes of type 6, adding one at t = 4 and performing a vertical scaling at t = 6 • Intuitively, having vnode types that span from low to very high throughput makes it possible to manage throughput surges
Table 5. Delayed requests Q_i and percentage of the total requests received, computed over the interval in which the requested throughput T_RW^min exceeds the actual throughput:
Case A: Opt 191.84×10³ (22.78%), LocalOpt 191.84×10³ (22.78%), BestFit 70.89×10³ (46.33%)
Case B: Opt 7.66×10³ (4%), LocalOpt 7.66×10³ (4%), BestFit 70.58×10³ (30.29%)
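A sketch of how the delayed requests Q_i can be computed from the requested and actual throughput time series; the per-minute sampling is an assumption of this example.

def delayed_requests(requested, actual, dt_sec=60):
    """Integrate max(0, requested - actual) over the observation interval (ops)."""
    return sum(max(0.0, r - a) * dt_sec for r, a in zip(requested, actual))

# Requested throughput surges to 70k ops/sec while the VDC still delivers 40-60k
req = [70_000, 70_000, 70_000]
act = [40_000, 50_000, 60_000]
print(delayed_requests(req, act))  # -> 3,600,000 delayed operations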
  • 31. Physical node failure • Consistency level reliability R: the probability that the number of healthy replicas in the Cassandra VDC is enough to guarantee a specific level of consistency over a fixed time interval • Cassandra offers three main consistency levels, for both reads and writes: ONE (one replica node must reply correctly), QUORUM (Q = ⌊D/2⌋ + 1 replica nodes must reply correctly, where D is the replication factor) and ALL (all replicas must be available) • One-to-one mapping (n vnodes on n physical nodes): consistency ONE is guaranteed if at least one vnode holding a replica is up, hence R_O = 1 − (D/n) · (1 − ρ)^n (24), where ρ is the resiliency of a physical node and D/n is the probability that a replica is on a given Cassandra vnode under the SimpleStrategy replication strategy (cf. the Datastax documentation for Cassandra) • Likewise, for QUORUM at least Q replica vnodes must be up: R_Q = 1 − (D/n) · (1 − ρ)^(n−Q+1) (25) • In a managed Cassandra datacenter a VDC is rarely allocated one-to-one; the provider's resource management policies usually end up with a many-to-one mapping, h physical nodes running n vnodes with D ≤ h < n, and Eqs. 24-25 generalise to R_O = 1 − (D/n) · (1 − ρ)^{K_O} (26) and R_Q = 1 − (D/n) · (1 − ρ)^{K_Q} (27), where K_O is the number of failed physical nodes that causes a failure of n vnodes and K_Q the number of failed physical nodes that causes a failure of (n − Q + 1) vnodes • K_O and K_Q depend on the vnode allocation, so best- and worst-case values are considered; for example, if 8 vnodes are distributed over the PMs as {1, 1, 2, 1, 3}, then n − Q + 1 = 7 and K_O = 5 • Evaluation setup: randomly generated SLAs (75% R, 15% W, 10% RW; T^min ∈ [10,000, 18,000] ops/sec; r_i = 8 GB constant), with n = 6 vnodes per tenant for D = 3 and n = 10 for D = 5, so that D/n = 0.5 in both cases • With the one-to-one mapping, ONE and QUORUM are guaranteed with six 9s and five 9s respectively for D = 3; if the replication factor increases to 5, with ten 9s and eight 9s respectively • Unfortunately, with the many-to-one mappings produced by the auto-scaling algorithms, the reliability of the consistency level drops by orders of magnitude (cf. Table 6)
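A sketch of Eqs. 24-27; in the one-to-one case K_O = n and K_Q = n − Q + 1, which reproduces the one-to-one values of Table 6 (e.g. D = 3, n = 6, ρ = 0.9 gives R_O = 0.9999995 and R_Q = 0.999995).

def consistency_reliability(D: int, n: int, rho: float, k: int) -> float:
    """R = 1 - (D/n) * (1 - rho)^k, the common form of Eqs. 24-27."""
    return 1 - (D / n) * (1 - rho) ** k

D, n, rho = 3, 6, 0.9
Q = D // 2 + 1
print(consistency_reliability(D, n, rho, n))          # ONE, one-to-one
print(consistency_reliability(D, n, rho, n - Q + 1))  # QUORUM, one-to-one
# Many-to-one mappings lower K_O and K_Q, and hence R (Eqs. 26-27)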
  • 32. Physical node failure (cont'd)
Table 6. Consistency reliability R for the consistency levels ONE and QUORUM; the probability that a data replica is on a given vnode is 0.5 for both D = 3 and D = 5; each row reports the one-to-one mapping first, then the allocations produced by the policies (Opt, LocalOpt, LocalOpt-H, BestFit, BestFit-H):
ρ = 0.9
R_O|D=3: 0.9999995 (one-to-one); 0.9995, 0.9995 (policies)
R_Q|D=3: 0.999995 (one-to-one); 0.995, –, 0.9995, 0.9995 (policies)
R_O|D=5: 0.99999999995 (one-to-one); 0.999995, 0.999995 (policies)
R_Q|D=5: 0.999999995 (one-to-one); 0.9995, –, 0.999995, 0.99995 (policies)
ρ = 0.8
R_O|D=3: 0.99996 (one-to-one); 0.996, 0.996 (policies)
R_Q|D=3: 0.99984 (one-to-one); 0.98, –, 0.996, 0.996 (policies)
R_O|D=5: 0.999999948 (one-to-one); 0.99984, 0.99984 (policies)
R_Q|D=5: 0.9999987 (one-to-one); 0.996, –, 0.99984, 0.9992 (policies)
• The one-to-one mapping remains several orders of magnitude more reliable than the many-to-one mappings produced by the auto-scaling policies
  • 33. Lesson learned • When you have to deal with a specific technology, there are many constraints to be considered • Multi-layer adaptation is a must • No single policy fits all the workloads • Not all the policies fit all the phases of the application life cycle
Use of the auto-scaling algorithms:
Use case | Opt | LocalOpt | BestFit | LocalOpt-H | BestFit-H
Capacity planning | X | | | |
Data center consolidation | X | | | |
VDC consolidation | X | X | | |
Run-time adaptation | | X | X | X | X