This document summarizes a presentation about reducing tail latency in Cassandra clusters using a regression-based replica selection algorithm. It discusses how tail latency occurs in distributed systems and how previous approaches used replica selection to reduce it. The proposed approach uses linear regression models to predict query execution times and select the fastest replica. Experimental results on homogeneous and heterogeneous server clusters show the approach reduces tail latency metrics like p999 while maintaining throughput. However, it degrades some lower percentile metrics. Future work could explore more advanced machine learning models.
Master Thesis Presentation
Internet Architecture and Systems Laboratory
Reducing Tail Latency in a Cassandra Cluster Using a Regression-Based Replica Selection Algorithm
Chauque Euclides
Outline
1. Background
2. Tail Latency
3. Replica Selection
4. Proposed Approach
4.1. Linear Regression Based Replica Selection
4.2. Predicting Query Execution Time
4.3. Training Data Generation
4.4. Model Training
4.5. Experimental Results
4.6. Comparison with Heron
5. Summary
6. Future work
1. Background
- For business-oriented applications, fast and predictable response times are critical for a good user experience.
- Studies by Amazon and Google [1], in which a controlled delay was added to every query before results were sent back to the user, found that:
  - An extra delay of 500 ms per query resulted in a 1.2% loss of revenue.
  - The probability of a user bouncing from a website increases the longer the website takes to load.
[1] https://www.gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/
2. Tail Latency
- It is challenging to consistently deliver fast response times, since applications are generally multi-tiered: serving a single end-user request may involve contacting multiple servers.
- The causes of latency can be attributed to server performance variability, due to queuing, shared resources, and background daemons.
3. Replica Selection [1/3]
- Looking at the causes of tail latency, it follows that it is infeasible to eliminate all latency variability.
- However, several approaches have been developed to reduce its impact. These approaches rely on standard techniques, including:
  - Giving preferential resource allocations or guarantees;
  - Reissuing requests;
  - Trading off completeness for latency.
3. Replica Selection [2/3]
- A recurring pattern for reducing tail latency is to take advantage of the redundancy built into each tier of the application architecture.
- Replica selection strategies can help reduce tail latency when the performance of the servers differs.
  - A request can be directed to the presumably best replica, i.e. the one that is expected to serve the request with the smallest latency.
- Ideal replica selection properties:
  - Replica selection needs to adapt quickly to changing system dynamics.
  - It must avoid entering oscillating instabilities.
  - It should not be computationally costly, nor require significant coordination overhead.
3. Replica Selection [3/3]
- Jaiman et al., "Heron: Taming Tail Latencies in Key-Value Stores under Heterogeneous Workloads," 37th IEEE SRDS, 2018.
  - Takes into consideration the size of the values associated with keys.
  - The algorithm uses Bloom filters to keep track of keys associated with large values.
  - Whenever a replica is processing a request for a large value, it is marked as busy.
  - As the amount of data in the datastore increases, the Bloom filter cannot be expanded without losing previous mappings.
- Suresh et al., "C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection," NSDI '15, USENIX, 2015.
  - The algorithm consists of a replica ranking algorithm and a rate control and backpressure algorithm.
  - It ranks the servers, taking into account server-side queue length and service time.
  - An incoming request is sent to the server with the minimum expected service time (illustrated in the sketch below).
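As a concrete illustration of queue- and service-time-based ranking, here is a simplified sketch in the spirit of C3; it is not the paper's exact scoring function, and all names are hypothetical:

```python
class ReplicaRanker:
    """Score each replica by its queue estimate times an EWMA of its
    service time, and send requests to the lowest-scoring replica.
    Simplified illustration, not C3's actual scoring function."""

    def __init__(self, replicas, alpha=0.9):
        self.alpha = alpha                               # EWMA smoothing factor
        self.service_time = {r: 1.0 for r in replicas}   # EWMA of service time (ms)
        self.queue_size = {r: 0 for r in replicas}       # last reported queue length

    def on_response(self, replica, latency_ms, queue_len):
        # Feedback (latency, queue length) is piggybacked on each response.
        self.service_time[replica] = (self.alpha * self.service_time[replica]
                                      + (1 - self.alpha) * latency_ms)
        self.queue_size[replica] = queue_len

    def pick(self):
        # Expected wait grows with queue length and average service time.
        return min(self.service_time,
                   key=lambda r: (self.queue_size[r] + 1) * self.service_time[r])
```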
4.1. Linear Regression Based Replica Selection
- Previous approaches do not support aggregation queries.
- Query duration is inferred from the size of the requested value, not from real estimates.
- In my research I explore a different approach, using a regression model to predict query duration (see the sketch below);
- and I focus on reducing the tail latency at p999 and above.
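A minimal sketch of the selection step, assuming one fitted regressor per query template and a per-replica feature vector of current load; the feature choice and function names here are illustrative assumptions, not the thesis's exact implementation:

```python
def select_replica(replicas, query_template, models, features):
    """Send the query to the replica with the smallest predicted duration.

    models   -- maps each query template to a fitted regressor
                (e.g. scikit-learn LinearRegression)
    features -- maps each replica to its current load feature vector
                (illustrative features: outstanding requests, recent load)
    """
    model = models[query_template]
    predicted = {r: model.predict([features[r]])[0] for r in replicas}
    return min(predicted, key=predicted.get)
```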
4.3. Training Data Generation [1/2]
- For data collection, 3 tables from the TPC-H benchmark were loaded into a Cassandra cluster;
- 8 servers were used, with the replication factor set to 3.
- Subsequently, Locust was used to issue requests from the chosen subset of TPC-H queries, simulating user requests;
- The response time values for different percentiles were recorded for each request.
- The same process was repeated with different numbers of simulated users to simulate increased load (a sketch of the measurement loop follows).
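For illustration, the core of such a measurement loop might look as follows, using the DataStax Python driver directly rather than Locust; the contact points, keyspace, and query template are placeholder assumptions:

```python
import csv
import time
from cassandra.cluster import Cluster  # DataStax Python driver for Cassandra

cluster = Cluster(["10.0.0.1", "10.0.0.2"])   # placeholder contact points
session = cluster.connect("tpch")             # keyspace holding the TPC-H tables

# Placeholder query template standing in for the chosen TPC-H subset.
QUERY = "SELECT l_quantity, l_extendedprice FROM lineitem WHERE l_orderkey = %s"

with open("samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["template", "response_time_s"])
    for orderkey in range(1, 1001):
        start = time.perf_counter()
        session.execute(QUERY, (orderkey,))
        writer.writerow(["q_lineitem", time.perf_counter() - start])
```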
4.3. Training Data Generation [2/2]
- The queries show different response time behavior;
- The queries with longer response times show a greater variation in response time as the load is increased.
4.4. Model Training [1/2]
- To keep prediction overhead low, linear regression was chosen to fit the data, based on [1].
- A separate regression model was fit for the data of each query template.
- R-squared was used as the evaluation metric for the regressors:
  - R-squared is the percentage of the variation in the dependent variable that a linear model explains.
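For reference, the standard definition, where $y_i$ are the observed response times, $\hat{y}_i$ the model's predictions, and $\bar{y}$ their mean:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$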
[1] https://scikit-learn.org/0.16/modules/computational_performance.html
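A minimal sketch of the per-template fitting step with scikit-learn, assuming training data keyed by template with load features (e.g. simulated user count) as X and observed response times as y; the data shown is illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def fit_per_template(training_data):
    """Fit one linear regression per query template and report its R^2.
    training_data maps template name -> (X, y)."""
    models = {}
    for template, (X, y) in training_data.items():
        model = LinearRegression().fit(X, y)
        print(f"{template}: R^2 = {r2_score(y, model.predict(X)):.3f}")
        models[template] = model
    return models

# Illustrative data: response time (ms) versus number of simulated users.
X = np.array([[10], [50], [100], [200]])
y = np.array([12.0, 35.0, 70.0, 140.0])
models = fit_per_template({"q1": (X, y)})
```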
4.5. Results: Homogeneous Servers [1/3]
- Homogeneous servers:
  - The figures show the p999 and p99999 tail latency values for each query;
  - Overall latency is improved.
4.5. Results: Homogeneous Servers [2/3]
- Comparison of p50, p90 and p999.
- Higher-percentile latency (p999) is improved; however, the 90th percentile is degraded.
4.5. Results: Heterogeneous Servers [1/3]
- A delay of up to 2 seconds was introduced into the responses of 4 servers to simulate an environment with servers of different processing capabilities.
- The figures show the p999 and p99999 tail latency values for each query.
4.5. Results: Heterogeneous Servers [2/3]
- Comparison of p50, p90 and p999.
- Higher-percentile latencies (p999 and p99999) are improved; however, p50 is degraded.
4.6. Comparison with Heron
- In response to a Colloquium B comment by Professor Keiichi Yasumoto on how the proposed approach relates to previous work:
- p999 response times for all queries, and an aggregate p999 comparison between the proposed method and Heron.
5. Summary
- In the present work the tail latency problem was reviewed, and server selection was considered as a method to reduce tail latency.
- Previous work was based on simpler queries and is no longer suitable for the complex queries that Cassandra has come to support; this served as motivation for exploring a new approach to server selection, using a regression model to predict query execution times.
- The new approach proved successful in reducing tail latency while preserving throughput; however, it negatively affected the lower percentiles.
6. Future Work
- A remaining point to explore is the use of more advanced machine learning models, to see whether the assumption that they add excessive overhead holds true.
- Another is to experiment with an even greater number of servers.
Models' Computational Performance
- Prediction latency:
  - scikit-learn benchmark of prediction latency for different models.
- Prediction throughput:
  - scikit-learn benchmark of prediction throughput for different models.
https://scikit-learn.org/0.16/modules/computational_performance.html
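As a quick way to sanity-check such numbers locally, an illustrative micro-benchmark (not the scikit-learn benchmark itself):

```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a small model on synthetic data, then time single-sample
# predictions, the mode that matters on a per-request path.
rng = np.random.default_rng(0)
model = LinearRegression().fit(rng.random((1000, 4)), rng.random(1000))

x = rng.random((1, 4))
n = 10_000
start = time.perf_counter()
for _ in range(n):
    model.predict(x)
per_call_us = (time.perf_counter() - start) / n * 1e6
print(f"~{per_call_us:.1f} microseconds per prediction")
```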