This document summarizes a presentation about reducing tail latency in Cassandra clusters using a regression-based replica selection algorithm. It discusses how tail latency occurs in distributed systems and how previous approaches used replica selection to reduce it. The proposed approach uses linear regression models to predict query execution times and select the fastest replica. Experimental results on homogeneous and heterogeneous server clusters show the approach reduces tail latency metrics like p999 while maintaining throughput. However, it degrades some lower percentile metrics. Future work could explore more advanced machine learning models.
Master Thesis Presentation
Internet Architecture and Systems Laboratory
Reducing Tail Latency in a Cassandra Cluster Using a Regression-Based Replica Selection Algorithm
Chauque Euclides
Outline
1. Background
2. Tail Latency
3. Replica Selection
4. Proposed Approach
4.1. Linear Regression Based Replica Selection
4.2. Predicting Query Execution Time
4.3. Training Data Generation
4.4. Model Training
4.5. Experimental Results
4.6. Comparison with Heron
5. Summary
6. Future work
1. Background
- For business-oriented applications, fast and predictable response times are critical for a good user experience.
- Studies by Amazon and Google [1], in which a controlled delay was added to every query before results were sent back to the user, found that:
  - An extra delay of 500 ms per query resulted in a 1.2% loss of revenue.
  - The probability of a user bouncing from a website increases the longer the website takes to load.
[1] https://www.gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/
2. Tail Latency
- It is challenging to consistently deliver fast response times, since applications are generally multi-tiered: serving a single end-user request may involve contacting multiple servers.
- The causes of latency can be attributed to server performance variability, due to queuing, shared resources, and background daemons.
3. Replica Selection [1/3]
- Looking at the causes of tail latency, it follows that it is infeasible to eliminate all latency variability.
- However, several approaches have been developed to reduce its impact. These approaches rely on standard techniques, including:
  - Giving preferential resource allocations or guarantees;
  - Reissuing requests;
  - Trading off completeness for latency.
3. Replica Selection [2/3]
- A recurring pattern for reducing tail latency is to take advantage of the redundancy built into each tier of the application architecture.
- Replica selection strategies can help reduce tail latency when the performance of the servers differs.
  - A request can be directed to the presumably best replica, i.e. the one that is expected to serve the request with the smallest latency.
- Ideal replica selection properties:
  - Replica selection needs to adapt quickly to changing system dynamics.
  - It must avoid entering oscillating instabilities.
  - It should not be computationally costly, nor require significant coordination overhead.
3. Replica Selection [3/3]
- Jaiman et al., "Heron: Taming Tail Latencies in Key-Value Stores under Heterogeneous Workloads," 37th IEEE SRDS, 2018.
  - Takes into consideration the size of the values associated with keys.
  - The algorithm uses Bloom filters to keep track of keys associated with large values.
  - Whenever a replica is processing a request for a large value, it is marked as busy.
  - As the amount of data in the datastore increases, the Bloom filter cannot be expanded without losing previous mappings.
- Suresh et al., "C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection," NSDI '15, USENIX, 2015.
  - The algorithm consists of a replica ranking algorithm and a rate control and backpressure algorithm.
  - It ranks the servers, taking into account server-side queue length and service time.
  - An incoming request is sent to the server with the minimum expected service time (illustrated in the sketch below).
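As a concrete illustration of queue- and service-time-based ranking, here is a simplified sketch in the spirit of C3; it is not the paper's exact scoring function, and all names are hypothetical:

```python
class ReplicaRanker:
    """Score each replica by its queue estimate times an EWMA of its
    service time, and send requests to the lowest-scoring replica.
    Simplified illustration, not C3's actual scoring function."""

    def __init__(self, replicas, alpha=0.9):
        self.alpha = alpha                               # EWMA smoothing factor
        self.service_time = {r: 1.0 for r in replicas}   # EWMA of service time (ms)
        self.queue_size = {r: 0 for r in replicas}       # last reported queue length

    def on_response(self, replica, latency_ms, queue_len):
        # Feedback (latency, queue length) is piggybacked on each response.
        self.service_time[replica] = (self.alpha * self.service_time[replica]
                                      + (1 - self.alpha) * latency_ms)
        self.queue_size[replica] = queue_len

    def pick(self):
        # Expected wait grows with queue length and average service time.
        return min(self.service_time,
                   key=lambda r: (self.queue_size[r] + 1) * self.service_time[r])
```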
4.1. Linear Regression Based Replica Selection
- Previous approaches do not support aggregation queries.
- Query duration is inferred from the size of the requested value, not from real estimates.
- In my research I explore a different approach, using a regression model to predict query duration (see the sketch below);
- and I focus on reducing the tail latency at p999 and above.
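A minimal sketch of the selection step, assuming one fitted regressor per query template and a per-replica feature vector of current load; the feature choice and function names here are illustrative assumptions, not the thesis's exact implementation:

```python
def select_replica(replicas, query_template, models, features):
    """Send the query to the replica with the smallest predicted duration.

    models   -- maps each query template to a fitted regressor
                (e.g. scikit-learn LinearRegression)
    features -- maps each replica to its current load feature vector
                (illustrative features: outstanding requests, recent load)
    """
    model = models[query_template]
    predicted = {r: model.predict([features[r]])[0] for r in replicas}
    return min(predicted, key=predicted.get)
```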
4.3. Training Data Generation [1/2]
- For data collection, 3 tables from the TPC-H benchmark were loaded into a Cassandra cluster;
- 8 servers were used, with the replication factor set to 3.
- Subsequently, Locust was used to issue requests from the chosen subset of TPC-H queries, simulating user requests;
- The response time values for different percentiles were recorded for each request.
- The same process was repeated with different numbers of simulated users to simulate increased load (a sketch of the measurement loop follows).
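For illustration, the core of such a measurement loop might look as follows, using the DataStax Python driver directly rather than Locust; the contact points, keyspace, and query template are placeholder assumptions:

```python
import csv
import time
from cassandra.cluster import Cluster  # DataStax Python driver for Cassandra

cluster = Cluster(["10.0.0.1", "10.0.0.2"])   # placeholder contact points
session = cluster.connect("tpch")             # keyspace holding the TPC-H tables

# Placeholder query template standing in for the chosen TPC-H subset.
QUERY = "SELECT l_quantity, l_extendedprice FROM lineitem WHERE l_orderkey = %s"

with open("samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["template", "response_time_s"])
    for orderkey in range(1, 1001):
        start = time.perf_counter()
        session.execute(QUERY, (orderkey,))
        writer.writerow(["q_lineitem", time.perf_counter() - start])
```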
4.3. Training Data Generation [2/2]
- The queries show different response time behavior;
- The queries with longer response times show a greater variation in response time as the load is increased.
4.4. Model Training [1/2]
- To keep prediction overhead low, linear regression was chosen to fit the data, based on [1].
- A separate regression model was fit for the data of each query template.
- R-squared was used as the evaluation metric for the regressors:
  - R-squared is the percentage of the variation in the dependent variable that a linear model explains.
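For reference, the standard definition, where $y_i$ are the observed response times, $\hat{y}_i$ the model's predictions, and $\bar{y}$ their mean:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$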
[1] https://scikit-learn.org/0.16/modules/computational_performance.html
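A minimal sketch of the per-template fitting step with scikit-learn, assuming training data keyed by template with load features (e.g. simulated user count) as X and observed response times as y; the data shown is illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def fit_per_template(training_data):
    """Fit one linear regression per query template and report its R^2.
    training_data maps template name -> (X, y)."""
    models = {}
    for template, (X, y) in training_data.items():
        model = LinearRegression().fit(X, y)
        print(f"{template}: R^2 = {r2_score(y, model.predict(X)):.3f}")
        models[template] = model
    return models

# Illustrative data: response time (ms) versus number of simulated users.
X = np.array([[10], [50], [100], [200]])
y = np.array([12.0, 35.0, 70.0, 140.0])
models = fit_per_template({"q1": (X, y)})
```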
4.5. Results: Homogeneous Servers [1/3]
- Homogeneous servers:
  - The figures show the p999 and p99999 tail latency values for each query;
  - Overall latency is improved.
4.5. Results: Homogeneous Servers [2/3]
- Comparison of p50, p90 and p999.
- Higher-percentile latency (p999) is improved; however, the 90th percentile is degraded.
4.5. Results: Heterogeneous Servers [1/3]
- A delay of up to 2 seconds was introduced into the responses of 4 servers to simulate an environment with servers of different processing capabilities.
- The figures show the p999 and p99999 tail latency values for each query.
4.5. Results: Heterogeneous Servers [2/3]
- Comparison of p50, p90 and p999.
- Higher-percentile latencies (p999 and p99999) are improved; however, p50 is degraded.
4.6. Comparison with Heron
- In response to a Colloquium B comment by Professor Keiichi Yasumoto on how the proposed approach relates to previous work:
- p999 response times for all queries, and an aggregate p999 comparison between the proposed method and Heron.
5. Summary
- In the present work the tail latency problem was reviewed, and server selection was considered as a method to reduce tail latency.
- Previous work was based on simpler queries and is no longer suitable for the complex queries that Cassandra has come to support; this served as motivation for exploring a new approach to server selection, using a regression model to predict query execution times.
- The new approach proved successful in reducing tail latency while preserving throughput; however, it negatively affected the lower percentiles.
6. Future Work
- A remaining point to explore is the use of more advanced machine learning models, to see whether the assumption that they add excessive overhead holds true.
- Another is to experiment with an even greater number of servers.
Models' Computational Performance
- Prediction latency:
  - scikit-learn benchmark of prediction latency for different models.
- Prediction throughput:
  - scikit-learn benchmark of prediction throughput for different models.
https://scikit-learn.org/0.16/modules/computational_performance.html
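As a quick way to sanity-check such numbers locally, an illustrative micro-benchmark (not the scikit-learn benchmark itself):

```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a small model on synthetic data, then time single-sample
# predictions, the mode that matters on a per-request path.
rng = np.random.default_rng(0)
model = LinearRegression().fit(rng.random((1000, 4)), rng.random(1000))

x = rng.random((1, 4))
n = 10_000
start = time.perf_counter()
for _ in range(n):
    model.predict(x)
per_call_us = (time.perf_counter() - start) / n * 1e6
print(f"~{per_call_us:.1f} microseconds per prediction")
```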