Predictive Elastic Database Systems by Rebecca Taft
On-line transaction processing (OLTP) database management systems (DBMSs) are a critical part of the operation of many large enterprises. These systems often serve time-varying workloads due to daily, weekly or seasonal fluctuations in demand, or because of rapid growth in demand due to a company’s business success. In addition, many OLTP workloads are heavily skewed to “hot” tuples or ranges of tuples. For example, the majority of NYSE volume involves only 40 stocks. To deal with such fluctuations, an OLTP DBMS needs to be elastic; that is, it must be able to expand and contract resources in response to load fluctuations and dynamically balance load as hot tuples vary over time. In this talk, I will present a system we built at MIT called P-Store, which uses predictive modeling to reconfigure the system in advance of predicted load changes. I will show that when running a real database workload, P-Store achieves comparable performance to a traditional static OLTP DBMS while using 50% fewer computing resources.
22–26. E-Store: Elastic Scaling to Adapt to Workload
[Figure: bar charts of the transaction arrival rate across database partitions (Part 1–Part 5). One panel shows a uniform workload under increasing load, with all partitions growing evenly; the other shows a skewed workload, with the load concentrated on a few hot partitions.]
46–47. Time to Reconfigure
• Rate control: move small chunks of data, spaced apart in time
• There exists some time D – the minimum time needed to move the entire database with no performance impact
[Figure: data distribution per node before and after scaling in from 4 nodes to 3; since ¼ of the data moves at the controlled rate, Time = ¼ × D.]
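As a back-of-the-envelope check of the rate-control arithmetic above (the function name is mine, not from the talk): if moving the whole database takes D, moving a fraction of it at the same controlled rate takes that fraction of D.

```python
def reconfig_time(frac_moved, D):
    """Time to move a fraction of the database at the rate-controlled
    pace: moving everything takes D, so a fraction takes frac * D."""
    return frac_moved * D

# Scaling in from 4 nodes to 3 moves the 1/4 of the data held by the
# retired node, so it takes D/4.
print(reconfig_time(1 / 4, D=60))  # 15.0 (e.g. minutes, if D = 60 min)
```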
76. Dynamic Programming Algorithm for Planning Reconfigurations
• Complexity ∝ the number of time steps and the maximum number of machines needed
• In practice: runtime < 1 second
• Greedy alternatives don’t work
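The slides do not give the recurrence, so the following is only a minimal sketch of such a planner under simplifying assumptions of my own: the predicted load is given per time step, capacity is linear in machine count, and each machine added or removed incurs a fixed penalty (this naive version is O(T·M²) in time steps T and machines M; the names `plan_reconfigurations` and `move_cost` are illustrative, not P-Store’s).

```python
import math

def plan_reconfigurations(load, cap_per_machine, max_machines, move_cost=0.5):
    """DP over (time step, machine count): pick the cheapest machine
    count per step, minimizing total machine-steps plus a penalty for
    each machine added or removed between steps."""
    best = {m: (0.0, []) for m in range(1, max_machines + 1)}
    for t in range(len(load)):
        # minimum machines that can serve the predicted load at step t
        need = max(1, math.ceil(load[t] / cap_per_machine))
        new_best = {}
        for m in range(need, max_machines + 1):
            # cheapest way to arrive at m machines from any prior count
            cost, plan = min(
                (c + m + move_cost * abs(m - pm), p)
                for pm, (c, p) in best.items()
            )
            new_best[m] = (cost, plan + [m])
        best = new_best
    return min(best.values())[1]

# Load spikes in the middle step: the plan scales out, then back in.
print(plan_reconfigurations([10, 30, 10], cap_per_machine=10,
                            max_machines=4))  # [1, 3, 1]
```

A greedy planner that reacts step by step cannot account for future moves; the DP considers every (step, machine-count) state, which matches the slide’s point that its cost grows with the number of time steps and machines.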
77. Time Series Prediction
• Diurnal, periodic workload with some day-to-day variation
• Prediction must complete in seconds to minutes
• We use Sparse Periodic Auto-Regression (SPAR) – Chen et al., NSDI ’08
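A minimal sketch of the SPAR idea as I understand it from Chen et al.: predict the next value as a linear combination of the same time slot in the previous n periods plus the m most recent deviations from those periodic averages, with coefficients fit by least squares. The exact feature construction and function names here are my assumptions, not P-Store’s code.

```python
import numpy as np

def fit_spar(x, period, n=2, m=2):
    """Fit SPAR(n, m) coefficients by ordinary least squares on a
    load history x sampled at a fixed interval with the given period."""
    rows, targets = [], []
    for t in range(n * period + m, len(x)):
        # same slot in each of the previous n periods
        periodic = [x[t - i * period] for i in range(1, n + 1)]
        # recent deviations from each slot's periodic average
        devs = [x[t - j] - np.mean([x[t - j - i * period]
                                    for i in range(1, n + 1)])
                for j in range(1, m + 1)]
        rows.append(periodic + devs)
        targets.append(x[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef

def predict_spar(x, period, coef, n=2, m=2):
    """One-step-ahead prediction for time t = len(x)."""
    t = len(x)
    periodic = [x[t - i * period] for i in range(1, n + 1)]
    devs = [x[t - j] - np.mean([x[t - j - i * period]
                                for i in range(1, n + 1)])
            for j in range(1, m + 1)]
    return float(np.dot(coef, periodic + devs))
```

Fitting is a single least-squares solve, so prediction easily fits the seconds-to-minutes budget on the slide.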
85. Summary: P-Store Evaluation
• Can we reduce resource usage?
  – Saves 50% of computing resources compared to a static allocation
• Can we prevent latency spikes?
  – Superior performance compared to a reactive approach
• On a real workload, P-Store reduces resource usage while keeping latency within application requirements