2. Serghei Radov
Current position:
Senior Performance Engineer at Lohika
Contacts: sergey.radov@gmail.com
GitHub: github.com/grinslife
Skype: serghei.radov
3. AGENDA
● Cloud computing principles
● Challenges
● Performance testing as part of the migration process
● What toolset could be used?
● How to avoid common pitfalls?
● Does the "90th percentile" really work?
● What will be the cost of the performance testing toolset?
17. Multi-cloud or hybrid cloud
Some tips:
- Use multiple Availability Zones
- Keep zones independent
- Deploy to multiple regions
- Employ a solid backup and recovery strategy
18. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
19. Define the performance test SLA
Statefulness
Response time
Time-out
Exceptions that can be included in the SLA:
Failure
Network issues
Denial of service
Scheduled maintenance
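Not from the deck: a minimal Python sketch of encoding such an SLA check while honouring the scheduled-maintenance exception above. The 500 ms and 2000 ms thresholds are hypothetical placeholders.

from datetime import datetime

def in_any_window(ts, windows):
    # windows: list of (start, end) datetime pairs for scheduled maintenance
    return any(start <= ts <= end for start, end in windows)

def sla_holds(samples, maintenance, p90_limit_ms=500, timeout_ms=2000):
    # samples: list of (timestamp, latency_ms); maintenance periods are excluded
    valid = sorted(ms for ts, ms in samples if not in_any_window(ts, maintenance))
    p90 = valid[int(len(valid) * 0.9)]           # 90th-percentile latency
    timeouts = sum(ms >= timeout_ms for ms in valid)
    return p90 <= p90_limit_ms and timeouts == 0

window = [(datetime(2016, 5, 1, 2, 0), datetime(2016, 5, 1, 3, 0))]
data = [(datetime(2016, 5, 1, 1, 0), 120), (datetime(2016, 5, 1, 2, 30), 4000)]
print(sla_holds(data, window))  # True: the 4000 ms outlier falls inside maintenance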
21. NRQL - New Relic Query Language
Unique sessions over the past week:
SELECT uniqueCount(session) FROM PageView SINCE 1 week ago
The same count, compared week-over-week:
SELECT uniqueCount(session) FROM PageView SINCE 1 week ago COMPARE WITH 1 week ago
Page views as a day-over-day time series:
SELECT count(*) FROM PageView SINCE 1 day ago COMPARE WITH 1 day ago TIMESERIES AUTO
Unique mobile devices, faceted by OS version:
SELECT uniqueCount(uuid) FROM MobileSession FACET osVersion SINCE 7 days ago
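These queries can also be run programmatically. A minimal sketch (not from the deck) in Python, assuming the classic New Relic Insights query API with its X-Query-Key header; the account ID and query key are placeholders.

import requests  # pip install requests

ACCOUNT_ID = "123456"        # placeholder account ID
QUERY_KEY = "NRIQ-xxxxxxxx"  # placeholder Insights query key

nrql = "SELECT uniqueCount(session) FROM PageView SINCE 1 week ago"
resp = requests.get(
    f"https://insights-api.newrelic.com/v1/accounts/{ACCOUNT_ID}/query",
    params={"nrql": nrql},
    headers={"X-Query-Key": QUERY_KEY, "Accept": "application/json"},
)
resp.raise_for_status()
print(resp.json()["results"])  # e.g. [{"uniqueCount": 12345}]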
23. Additional response time metrics
All of these response times are reported as components of the overall app response time:
- Database response time
- Memcached response time
- WebExternal
- Ruby
- GC calls
New Relic also provides an advanced ability to trace response times across systems using NRQL.
25. Transaction throughput
- DC and cloud resources are not directly comparable due to differences in hardware configurations.
- The transaction count in the cloud should match or exceed the current production level in the DC, so that current users can be served without added latency.
26. Finding peaks
- Target peak load: 1.14K RPM
- Lowest point: 430 RPM
(Chart extracted from New Relic for this presentation only, instead of DataDog.)
27. Scenario per server
- Ramp up from 430 RPM slowly to 700 RPM over 4 hours
- Run the test for 6 hours
- Ramp up to 1.14K RPM
- Run the test for 11 hours
(A tool-agnostic sketch of this profile follows below.)
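One way (illustrative only, not from the deck) to encode this ramp plan independently of the load tool is a function that returns the target RPM for any elapsed time. The phase boundaries mirror the scenario above; the 1-hour ramp to peak is an assumption, since the slide does not state a ramp duration.

def target_rpm(elapsed_h):
    # phase 1: ramp from 430 to 700 RPM over the first 4 hours
    if elapsed_h < 4:
        return 430 + (700 - 430) * elapsed_h / 4
    # phase 2: hold 700 RPM for 6 hours
    if elapsed_h < 10:
        return 700
    # phase 3: ramp to the 1.14K RPM peak (1-hour ramp assumed)
    if elapsed_h < 11:
        return 700 + (1140 - 700) * (elapsed_h - 10)
    # phase 4: hold the peak for 11 hours, then stop
    return 1140 if elapsed_h < 22 else 0

for h in (0, 2, 5, 10.5, 15, 23):
    print(h, "h ->", round(target_rpm(h)), "RPM")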
28. Hardware acceptance level
- App server CPU usage
  - should not go above 60% during the 150% peak load
  - hard threshold of 80%
- Memory usage (avg 60%, threshold 80%)
- Network throughput (should correspond to DC levels)
- Auto-scaling groups set to false (initial criterion)
All of these metric values depend on production usage, budget, and the target VMs.
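A minimal boto3 sketch (not from the deck) of checking the CPU criterion against CloudWatch after a test run; the instance ID is a placeholder, and the 60% / 80% limits mirror the slide.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,                        # 5-minute datapoints
    Statistics=["Average", "Maximum"],
)
points = stats["Datapoints"]           # assumes the instance reported data
avg = sum(p["Average"] for p in points) / len(points)
peak = max(p["Maximum"] for p in points)
print(f"avg CPU {avg:.1f}% (target <= 60), peak {peak:.1f}% (threshold 80)")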
37. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
38. Select the proper EC2 type for the app
General Purpose
Compute Optimized
Memory Optimized
GPU
Storage Optimized
Dense-storage Instances
39. Select the proper EC2 type for the app

Model        vCPU   Mem (GiB)   Storage    Dedicated EBS Bandwidth (Mbps)
c4.large        2        3.75   EBS-Only                              500
c4.xlarge       4        7.5    EBS-Only                              750
c4.2xlarge      8       15      EBS-Only                            1,000
c4.4xlarge     16       30      EBS-Only                            2,000
c4.8xlarge     36       60      EBS-Only                            4,000
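As a small illustration (not from the deck), picking the smallest c4 type from the table above that satisfies the app's measured needs:

C4_TYPES = [  # (model, vCPU, memory GiB) from the table above
    ("c4.large", 2, 3.75),
    ("c4.xlarge", 4, 7.5),
    ("c4.2xlarge", 8, 15),
    ("c4.4xlarge", 16, 30),
    ("c4.8xlarge", 36, 60),
]

def smallest_fit(need_vcpu, need_mem_gib):
    # return the first (smallest, hence cheapest) c4 type that satisfies both needs
    for model, vcpu, mem in C4_TYPES:
        if vcpu >= need_vcpu and mem >= need_mem_gib:
            return model
    raise ValueError("no c4 instance is large enough; consider another family")

print(smallest_fit(6, 12))  # -> c4.2xlarge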
40. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
41. Workload Characterization
- Capture traffic patterns
- Resource utilisation
- Distribution of response times (see the sketch after this list)
- Distribution of response sizes
- Characterization of user behaviour
- Analyse input data
- Use a performance analysis toolkit
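For the response-time and response-size distributions, a minimal illustrative sketch (the sample values are made up) that turns measured values into the usual distribution summary:

import statistics

def summarize(values, label):
    # print the distribution summary used for workload characterization
    s = sorted(values)
    q = statistics.quantiles(s, n=100)     # percentile cut points 1..99
    print(f"{label}: min={s[0]} p50={q[49]} p90={q[89]} p99={q[98]} max={s[-1]}")

response_ms = [95, 97, 99, 102, 120, 130, 310, 450, 880, 1500]  # made-up sample
summarize(response_ms, "response time (ms)")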
44. Characterize user behaviour
Investigate user actions with the help of:
- New Relic Browser (session + funnel functions)
- Universal Analytics with user behaviour paths
- Mixpanel.com (needs code injection)
- NGINX server logs (HTTP requests, REST calls)
- Sumo Logic (Apache access logs)
- Server app logs (HP ALM has QC sense)
45. The hard way: write analytical tools that will
- Parse access / ELB logs (see the sketch after this list)
- Unite requests into scripts by timestamp and IP
- Reduce the number of unique scripts
- Restore high-level user actions
- Derive the workload distribution
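A skeleton of the parsing and grouping steps (not from the deck); it assumes NGINX/Apache "combined" access-log lines, and each IP's time-ordered request list approximates one "script":

import re
from collections import defaultdict

# assumed input format: NGINX/Apache "combined" access-log lines
LINE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)')

def group_into_scripts(log_lines):
    # unite requests into per-IP, time-ordered "scripts"
    scripts = defaultdict(list)
    for line in log_lines:
        m = LINE.match(line)
        if m:
            scripts[m["ip"]].append((m["ts"], m["method"], m["path"]))
    return scripts

sample = ['1.2.3.4 - - [10/Oct/2016:13:55:36 +0000] "GET /login HTTP/1.1" 200 512']
print(dict(group_into_scripts(sample)))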
46. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
58. ➢ Define acceptance criteria
➢ Select tools for monitoring and testing
➢ Discuss capacity planning responsibilities
➢ Workload Characterization
➢ Validate the test tools
➢ Run tests, analyze, scale, re-run (repeat this cycle)
59. Reports
● Goals & achievements (e.g. 150% of daily RPM was reached)
● Side effects found (e.g. the DB connection limit was reached due to a quick ramp-up)
● Exceptions caught during testing (e.g. ELB lost connections)
● Run-time notes and fixes made by DevOps (e.g. an EC2 change between test iterations)
● Observations (e.g. CPU usage was the critical resource during the RPM increase)
60. Pitfalls during performance testing
Pitfall 1: assuming the 90th percentile matches prod (see the example below)
Pitfall 2: extrapolating results across horizontal scaling
Pitfall 3: using a small amount of hard-coded data
Pitfall 4: focusing on a single use case
Pitfall 5: running tests from one location
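To see why pitfall 1 bites, a small illustration with synthetic numbers (not from the deck): two latency sets with an identical 90th percentile but very different tails and means.

prod = [100] * 91 + [140] * 9    # production-like latencies (ms)
test = [100] * 91 + [5000] * 9   # test run: same p90, much worse tail

def p90(samples):
    s = sorted(samples)
    return s[int(len(s) * 0.9)]  # 90th-percentile sample

print(p90(prod), p90(test))                          # 100 100  <- identical p90
print(max(prod), max(test))                          # 140 5000 <- very different tails
print(sum(prod) / len(prod), sum(test) / len(test))  # 103.6 vs 541.0 mean (ms)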
65. What will be the cost of the performance testing toolset?

Cloud JMeter provider     Type            Users   Monthly ($)   Nodes/Hours   AWS cost ($)
BlazeMeter                pro             3K      499           100           167.50
Flood.io (shared nodes)   pay as you go   15K+    499           100           167.50
SOASTA                    pay as you go   10K     22,500        undefined     0