Peter Smith, PhD, Principal Software Engineer at ACL, talks about Performance Optimization of Cloud-Based Applications at TriNimbus' 2017 Canadian Executive Cloud & DevOps Summit in Vancouver.
2. Overview
• Why Optimize?
• … business reasons you should care…
• Technical Stuff
• … contact me later if you want more detail…
• Recommendations
• … to take back to your company tomorrow…
3. About ACL
• Founded in 1987 – Vancouver headquartered.
• Audit for Fraud Detection
• Data-Driven Governance, Risk Management, Compliance (GRC)
• On-Premise Software
• Windows .NET and Java
• SaaS
• Entirely AWS-based
• Ruby-on-Rails, Node.js, Golang, Scala.
6. Why Should You Care about Optimization?
Performance: How fast does your system react, even to one user?
Versus
Scalability: How many users, or how much data, can you handle?
(Alternatively, do you have the ability to scale?)
7. Why Should You Care about Optimization?
• Improve end-user experience – New Users
• Focus on features, less so on performance and scalability.
• Improve end-user experience – Existing Users
• They generate lots of “load”, and might abandon your product if it is too slow.
• Reduce cloud operating costs
• Handling the same workload using less infrastructure leads to lower cost.
9. Problem Area 1 - Beware of Latency
Latency: The time required for data to travel from Point A to Point B
Vancouver to Virginia: 100 ms
Vancouver to Singapore: 300 ms
Between availability zones: 1 ms
From web server to database: 100 μs+
From main memory into CPU: 100 ns
Key Problem – “Chatty Protocols” (repeat each of these delays 1000x)
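To make the "repeat 1000x" point concrete, here is a rough sketch that multiplies each latency figure from the slide by a number of sequential exchanges. The function name `chatty_cost` and the assumption that every exchange waits for the previous one to finish are illustrative, not taken from the talk.

```python
# Latency figures from the slide, converted to seconds.
LATENCY_S = {
    "Vancouver to Virginia": 100e-3,
    "Vancouver to Singapore": 300e-3,
    "Between availability zones": 1e-3,
    "Web server to database": 100e-6,
    "Main memory into CPU": 100e-9,
}


def chatty_cost(latency_s: float, exchanges: int = 1000) -> float:
    """Total waiting time when `exchanges` messages are sent strictly one after another."""
    return latency_s * exchanges


for link, latency in LATENCY_S.items():
    print(f"{link:<28} {chatty_cost(latency):12.6f} s for 1000 exchanges")
```

Under these assumptions, a delay that looks negligible for one message (100 ms) becomes well over a minute once a protocol needs a thousand sequential exchanges.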
10. Problem Area 1 - Beware of Latency
Example 1: SSL/TLS Negotiation
• TLS is a very “chatty” protocol compared with a non-SSL connection: 500ms versus 180ms to connect!
• New SSL connections require FIVE round trip messages.
• Solution: Use nearby CloudFront as SSL endpoint.
Example 2: N + 1 SQL Database queries
• Badly written queries => too many queries.
• With large data sets, latency adds up!
• Solution: Rewrite using a single optimized query.
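A minimal sketch of the N + 1 pattern and its single-query replacement, using Python's built-in `sqlite3` module with an in-memory database. The `authors` and `posts` tables and both function names are hypothetical, chosen only for illustration; against a real networked database, each extra query would also pay the web-server-to-database latency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO posts VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author (N + 1 in total)."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_query(conn):
    """The same result from one JOIN, regardless of how many authors exist."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors a LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """).fetchall()
    result = {}
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # authors without posts produce a NULL title
            titles.append(title)
    return result, 1


slow_result, slow_queries = titles_n_plus_one(conn)
fast_result, fast_queries = titles_single_query(conn)
print(slow_queries, fast_queries)
```

Both functions return the same data; the difference is that the first one issues a number of queries that grows with the data set.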
12. Problem Area 2 - Efficiency of Your Code
Modern computers can do:
• Billions of cycles/second
• Millions of RAM accesses/second
• Thousands of disk accesses/second
You will waste them, but you need to know where:
• Understand what’s being stored in RAM.
• Understand what your CPU is doing.
• Understand what your disk or database is busy loading.
14. Problem Area 2 - Efficiency of Your Code
Example: Fetch list of Facebook friends
• From SQL database: 10ms
• From cached copy in memory: 1μs
• 10,000 times faster!
Problems:
• Knowing when to “invalidate” caches is hard!
• Implementing caches is hard!
• Storage is more expensive (RAM costs more than disk).
Caching: Storing hard-to-compute results for later reuse
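As an illustration of why invalidation is the hard part, here is a small sketch of an in-memory cache with a time-to-live and an explicit `invalidate` call. The class `FriendCache`, the `slow_lookup` stand-in for the SQL query, and the 60-second TTL are all assumptions made for this example.

```python
import time


class FriendCache:
    """Keeps friend lists in memory; entries expire after `ttl_s` seconds."""

    def __init__(self, loader, ttl_s=60.0, clock=time.monotonic):
        self._loader = loader    # slow function, e.g. a database query
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}       # user_id -> (expires_at, friends)
        self.loads = 0           # how often the slow path was taken

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]      # cache hit: no database access
        friends = self._loader(user_id)
        self.loads += 1
        self._entries[user_id] = (now + self._ttl_s, friends)
        return friends

    def invalidate(self, user_id):
        """Must be called whenever the user's friend list changes."""
        self._entries.pop(user_id, None)


def slow_lookup(user_id):
    # Hypothetical stand-in for the SQL query.
    return [f"friend-{user_id}-{i}" for i in range(3)]


cache = FriendCache(slow_lookup)
cache.get(42)
cache.get(42)            # served from memory
cache.invalidate(42)     # e.g. after the user adds a friend
cache.get(42)            # reloaded from the slow source
print(cache.loads)
```

Forgetting a single `invalidate` call on any code path that changes the friend list would leave stale data in the cache until the TTL expires, which is the kind of bug the slide warns about.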
15. Problem Area 2 - Efficiency of Your Code
Language/Framework – Choose carefully, based on needs.
Example: Ruby on Rails
• Awesome for development of interactive sites.
• Easy to learn, develop, and debug.
Example: Scala with Play Framework
• Awesome for high performance and scalable systems, including analytics.
• Harder to learn, slower to implement code.
16. Problem Area 3 - Architecting for “Scale Out”
Always architect your software to be scalable.
• Although not necessarily scaled.
Some common principles:
1. Think distributed instead of monolithic (scale-out not scale-up)
2. Design software components to be stateless (easier scale-out)
3. Assume multiple databases:
• You might start with one database, but eventually it’ll become a bottleneck – plan for having many.
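One possible way to "plan for many databases" is to route every request through a small lookup function from day one, even while only one database exists. The sketch below assumes hash-based routing by customer ID; the host names and function names are hypothetical, and the talk does not say which scheme ACL uses.

```python
import hashlib

# Hypothetical database hosts; start with a single entry and grow the list later.
DATABASES = [
    "db-0.example.internal",
    "db-1.example.internal",
    "db-2.example.internal",
]


def shard_for(customer_id: str, shard_count: int) -> int:
    """Map a customer to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(customer_id: str) -> str:
    """Stateless lookup: any web server computes the same answer for the same customer."""
    return DATABASES[shard_for(customer_id, len(DATABASES))]


print(database_for("acme-corp"))
```

Because the function keeps no state, it fits principle 2 as well. Note that simple modulo routing moves many customers to a different shard when the list grows, so a real system would also need a data migration plan.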
18. How to Prioritize?
1. A potential customer is negotiating a deal, but wants a feature added before they’ll pay.
2. An existing customer is pushing your software to new limits (more users, more data), but is noticing problems.
3. Your QA team stress-tests your product, making it fail.
Competing priorities: TTM (time to market), Quality, Performance, Scalability
19. Strategy 1 - Don’t Optimize Prematurely
• Keep performance and scalability in mind, but don’t over-engineer your software.
• Keep ahead of the problems, but not too far ahead.
• When building complex software systems, the actual bottlenecks might surprise you.
20. Strategy 2 - Define “Good Enough”
• Have an organization-wide consensus on performance and scalability expectations.
• Don’t leave it up to personal judgment whether something is good enough – pass/fail must be obvious.
• For example:
• 95% of requests complete within 2 seconds.
• 99.9% of requests complete within 5 seconds.
• The remaining 0.1% might take longer.
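The example targets above can be turned into an automatic pass/fail check. This is a minimal sketch: the function names are hypothetical, and the thresholds are simply the ones from the slide.

```python
def fraction_within(durations_s, limit_s):
    """Share of requests that completed within `limit_s` seconds."""
    return sum(1 for d in durations_s if d <= limit_s) / len(durations_s)


def meets_targets(durations_s, targets=((2.0, 0.95), (5.0, 0.999))):
    """True only if every (time limit, required fraction) pair is satisfied."""
    return all(
        fraction_within(durations_s, limit_s) >= required
        for limit_s, required in targets
    )


# Made-up sample: 10,000 request durations in seconds.
sample = [0.5] * 9600 + [3.0] * 395 + [8.0] * 5
print(meets_targets(sample))
```

A check like this removes personal judgment from the decision: the same data always yields the same verdict.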
21. Strategy 3 – Use Measurement Tools
• Don’t diagnose problems based on “personal feelings”.
• Collect performance data for all users (for example, with New Relic or Datadog).
• Raise alerts when data goes beyond “good enough” – be proactive
22. Strategy 4 – Get Upper Management on Board
• There’s nothing worse than conflicting messages:
“You need to focus on performance of the product, but that customer feature is needed by Friday”
• Software developers are often conflicted and need firm, consistent leadership.
• Don’t pull them in multiple directions!
23. Strategy 5 – Identify Technical Champions
It’s easy to say “write performant and scalable code”, but HOW???
• Identify knowledgeable and passionate individual contributors.
• Make these the “go to” people for advising on and reviewing performance-critical code.
• Pay special attention to these people when they’re concerned about issues.
24. Strategy 6 - Always Test with Realistic Data
Developers often test using small amounts of data, typically on laptops.
• This won’t find performance or scalability issues!
• For example:
• Customers had 1000 network devices, but we tested with 6. Found an O(n^3) algorithm!
Instead, periodically spin up servers for scalability testing:
• Use production-sized servers.
• Use production-sized data and workload.
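To see why 6 devices hid the O(n^3) problem, the sketch below counts the steps of a triple nested loop. The function `cubic_steps` is a hypothetical stand-in; the talk does not describe the actual algorithm.

```python
def cubic_steps(n: int) -> int:
    """Count the iterations of a triple nested loop over n devices."""
    steps = 0
    for _a in range(n):
        for _b in range(n):
            for _c in range(n):
                steps += 1
    return steps


test_steps = cubic_steps(6)      # what the developers exercised
customer_steps = 1000 ** 3       # same loop shape at customer scale, computed directly
print(test_steps, customer_steps, customer_steps // test_steps)
```

With 6 devices the loop runs a few hundred times and finishes instantly. With 1000 devices it runs a billion times, several million times more work, which only a test with production-sized data would reveal.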
26. Take Away Message
Tomorrow, I hope you think differently about performance and scalability.
1) Do you believe that performance and scalability are important to think about?
2) What are your company’s quantifiable expectations on performance and scalability?
3) Are you doing a good enough job of measuring and identifying problems?
4) Are there any cultural issues in your organization preventing you from reaching your goals?