2. Serghei Radov
Current position:
Senior Performance Engineer at Lohika
Contacts: sergey.radov@gmail.com
GitHub: github.com/grinslife
Skype: serghei.radov
3. AGENDA
● Cloud computing principles
● Challenges
● Performance testing as part of the migration process
● What toolset could be used?
● How to avoid common pitfalls?
● Does the "90th percentile" really work?
● What will be the cost of the performance testing toolset?
17. Multi-cloud or hybrid cloud
Some tips:
- Use multiple Availability Zones
- Keep zones independent
- Deploy to multiple regions
- Employ a solid backup and recovery strategy
18. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
19. Define the performance test SLA
Statefulness
Response time
Time-out
Exceptions that can be included in the SLA:
Failure
Network issues
Denial of service
Scheduled maintenance
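Not from the deck: a minimal Python sketch of encoding such an SLA check while honouring the scheduled-maintenance exception above. The 500 ms and 2000 ms thresholds are hypothetical placeholders.

from datetime import datetime

def in_any_window(ts, windows):
    # windows: list of (start, end) datetime pairs for scheduled maintenance
    return any(start <= ts <= end for start, end in windows)

def sla_holds(samples, maintenance, p90_limit_ms=500, timeout_ms=2000):
    # samples: list of (timestamp, latency_ms); maintenance periods are excluded
    valid = sorted(ms for ts, ms in samples if not in_any_window(ts, maintenance))
    p90 = valid[int(len(valid) * 0.9)]           # 90th-percentile latency
    timeouts = sum(ms >= timeout_ms for ms in valid)
    return p90 <= p90_limit_ms and timeouts == 0

window = [(datetime(2016, 5, 1, 2, 0), datetime(2016, 5, 1, 3, 0))]
data = [(datetime(2016, 5, 1, 1, 0), 120), (datetime(2016, 5, 1, 2, 30), 4000)]
print(sla_holds(data, window))  # True: the 4000 ms outlier falls inside maintenance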
21. NRQL - New Relic Query Language
Unique sessions over the past week:
SELECT uniqueCount(session) FROM PageView SINCE 1 week ago
The same count, compared week-over-week:
SELECT uniqueCount(session) FROM PageView SINCE 1 week ago COMPARE WITH 1 week ago
Page views as a day-over-day time series:
SELECT count(*) FROM PageView SINCE 1 day ago COMPARE WITH 1 day ago TIMESERIES AUTO
Unique mobile devices, faceted by OS version:
SELECT uniqueCount(uuid) FROM MobileSession FACET osVersion SINCE 7 days ago
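These queries can also be run programmatically. A minimal sketch (not from the deck) in Python, assuming the classic New Relic Insights query API with its X-Query-Key header; the account ID and query key are placeholders.

import requests  # pip install requests

ACCOUNT_ID = "123456"        # placeholder account ID
QUERY_KEY = "NRIQ-xxxxxxxx"  # placeholder Insights query key

nrql = "SELECT uniqueCount(session) FROM PageView SINCE 1 week ago"
resp = requests.get(
    f"https://insights-api.newrelic.com/v1/accounts/{ACCOUNT_ID}/query",
    params={"nrql": nrql},
    headers={"X-Query-Key": QUERY_KEY, "Accept": "application/json"},
)
resp.raise_for_status()
print(resp.json()["results"])  # e.g. [{"uniqueCount": 12345}]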
23. Additional response time metrics
All of these response times are reported as components of the overall app response time:
- Database response time
- Memcached response time
- WebExternal
- Ruby
- GC calls
New Relic also provides an advanced ability to trace response times across systems using NRQL.
25. Transaction throughput
- DC and cloud resources are not directly comparable due to differences in hardware configurations.
- The transaction count in the cloud should match or exceed the current production level in the DC, so that current users can be served without added latency.
26. Finding peaks
- Target peak load: 1.14K RPM
- Lowest point: 430 RPM
(Chart extracted from New Relic for this presentation only, instead of DataDog.)
27. Scenario per server
- Ramp up from 430 RPM slowly to 700 RPM over 4 hours
- Run the test for 6 hours
- Ramp up to 1.14K RPM
- Run the test for 11 hours
(A tool-agnostic sketch of this profile follows below.)
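One way (illustrative only, not from the deck) to encode this ramp plan independently of the load tool is a function that returns the target RPM for any elapsed time. The phase boundaries mirror the scenario above; the 1-hour ramp to peak is an assumption, since the slide does not state a ramp duration.

def target_rpm(elapsed_h):
    # phase 1: ramp from 430 to 700 RPM over the first 4 hours
    if elapsed_h < 4:
        return 430 + (700 - 430) * elapsed_h / 4
    # phase 2: hold 700 RPM for 6 hours
    if elapsed_h < 10:
        return 700
    # phase 3: ramp to the 1.14K RPM peak (1-hour ramp assumed)
    if elapsed_h < 11:
        return 700 + (1140 - 700) * (elapsed_h - 10)
    # phase 4: hold the peak for 11 hours, then stop
    return 1140 if elapsed_h < 22 else 0

for h in (0, 2, 5, 10.5, 15, 23):
    print(h, "h ->", round(target_rpm(h)), "RPM")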
28. Hardware acceptance level
- App server CPU usage
  - should not go above 60% during the 150% peak load
  - hard threshold of 80%
- Memory usage (avg 60%, threshold 80%)
- Network throughput (should correspond to DC levels)
- Auto-scaling groups set to false (initial criterion)
All of these metric values depend on production usage, budget, and the target VMs.
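A minimal boto3 sketch (not from the deck) of checking the CPU criterion against CloudWatch after a test run; the instance ID is a placeholder, and the 60% / 80% limits mirror the slide.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,                        # 5-minute datapoints
    Statistics=["Average", "Maximum"],
)
points = stats["Datapoints"]           # assumes the instance reported data
avg = sum(p["Average"] for p in points) / len(points)
peak = max(p["Maximum"] for p in points)
print(f"avg CPU {avg:.1f}% (target <= 60), peak {peak:.1f}% (threshold 80)")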
37. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
38. Select the proper EC2 type for the app
General Purpose
Compute Optimized
Memory Optimized
GPU
Storage Optimized
Dense-storage Instances
39. Select the proper EC2 type for the app

Model        vCPU   Mem (GiB)   Storage    Dedicated EBS Bandwidth (Mbps)
c4.large        2        3.75   EBS-Only                              500
c4.xlarge       4        7.5    EBS-Only                              750
c4.2xlarge      8       15      EBS-Only                            1,000
c4.4xlarge     16       30      EBS-Only                            2,000
c4.8xlarge     36       60      EBS-Only                            4,000
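As a small illustration (not from the deck), picking the smallest c4 type from the table above that satisfies the app's measured needs:

C4_TYPES = [  # (model, vCPU, memory GiB) from the table above
    ("c4.large", 2, 3.75),
    ("c4.xlarge", 4, 7.5),
    ("c4.2xlarge", 8, 15),
    ("c4.4xlarge", 16, 30),
    ("c4.8xlarge", 36, 60),
]

def smallest_fit(need_vcpu, need_mem_gib):
    # return the first (smallest, hence cheapest) c4 type that satisfies both needs
    for model, vcpu, mem in C4_TYPES:
        if vcpu >= need_vcpu and mem >= need_mem_gib:
            return model
    raise ValueError("no c4 instance is large enough; consider another family")

print(smallest_fit(6, 12))  # -> c4.2xlarge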
40. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
41. Workload Characterization
- Capture traffic patterns
- Resource utilisation
- Distribution of response times (see the sketch after this list)
- Distribution of response sizes
- Characterization of user behaviour
- Analyse input data
- Use a performance analysis toolkit
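For the response-time and response-size distributions, a minimal illustrative sketch (the sample values are made up) that turns measured values into the usual distribution summary:

import statistics

def summarize(values, label):
    # print the distribution summary used for workload characterization
    s = sorted(values)
    q = statistics.quantiles(s, n=100)     # percentile cut points 1..99
    print(f"{label}: min={s[0]} p50={q[49]} p90={q[89]} p99={q[98]} max={s[-1]}")

response_ms = [95, 97, 99, 102, 120, 130, 310, 450, 880, 1500]  # made-up sample
summarize(response_ms, "response time (ms)")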
44. Characterize user behaviour
Investigate user actions with the help of:
- New Relic Browser (session + funnel functions)
- Universal Analytics with user behaviour paths
- Mixpanel.com (needs code injection)
- NGINX server logs (HTTP requests, REST calls)
- Sumo Logic (Apache access logs)
- Server app logs (HP ALM has QC sense)
45. The hard way: write analytical tools that will
- Parse access / ELB logs (see the sketch after this list)
- Unite requests into scripts by timestamp and IP
- Reduce the number of unique scripts
- Restore high-level user actions
- Derive the workload distribution
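A skeleton of the parsing and grouping steps (not from the deck); it assumes NGINX/Apache "combined" access-log lines, and each IP's time-ordered request list approximates one "script":

import re
from collections import defaultdict

# assumed input format: NGINX/Apache "combined" access-log lines
LINE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)')

def group_into_scripts(log_lines):
    # unite requests into per-IP, time-ordered "scripts"
    scripts = defaultdict(list)
    for line in log_lines:
        m = LINE.match(line)
        if m:
            scripts[m["ip"]].append((m["ts"], m["method"], m["path"]))
    return scripts

sample = ['1.2.3.4 - - [10/Oct/2016:13:55:36 +0000] "GET /login HTTP/1.1" 200 512']
print(dict(group_into_scripts(sample)))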
46. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
58. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
59. Reports
● Goals & achievements (e.g. 150% of daily RPM was reached)
● Side effects found (e.g. the DB connection limit was reached due to a quick ramp-up)
● Exceptions caught during testing (e.g. ELB lost connections)
● Run-time notes and fixes made by DevOps (e.g. an EC2 change between test iterations)
● Observations (e.g. CPU usage was the critical resource during the RPM increase)
60. Pitfalls during performance testing
Pitfall 1: assuming the 90th percentile matches prod (see the example below)
Pitfall 2: extrapolating results across horizontal scaling
Pitfall 3: using a small amount of hard-coded data
Pitfall 4: focusing on a single use case
Pitfall 5: running tests from one location
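To see why pitfall 1 bites, a small illustration with synthetic numbers (not from the deck): two latency sets with an identical 90th percentile but very different tails and means.

prod = [100] * 91 + [140] * 9    # production-like latencies (ms)
test = [100] * 91 + [5000] * 9   # test run: same p90, much worse tail

def p90(samples):
    s = sorted(samples)
    return s[int(len(s) * 0.9)]  # 90th-percentile sample

print(p90(prod), p90(test))                          # 100 100  <- identical p90
print(max(prod), max(test))                          # 140 5000 <- very different tails
print(sum(prod) / len(prod), sum(test) / len(test))  # 103.6 vs 541.0 mean (ms)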
65. What will be the cost of the performance testing toolset?

Cloud JMeter provider     Type            Users   Monthly ($)   Nodes/Hours   AWS cost ($)
BlazeMeter                pro             3K      499           100           167.50
Flood.io (shared nodes)   pay as you go   15K+    499           100           167.50
SOASTA                    pay as you go   10K     22,500        undefined     0