Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database
1. Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database
Tim Vaillancourt
Sr. Technical Operations Architect
@ Percona
2. Agenda
●Synchronous versus Asynchronous Applications
●Scaling a Synchronous/Latency-sensitive Application
●Scaling an Asynchronous Application
●Efficient Usage of Data at Scale
Secondary/Slave Hosts
Caching
Queuing
●Efficient Usage of Data at Scale
Moving Expensive Work
Caching Techniques
Counters and In-memory Stores
Connection Pooling
3. Agenda
●Scaling Out (Horizontal) Tricks
Pre-Sharding
Kill Switches
Limits and Graphs
●Scaling with Hardware (Vertical Scaling)
●Testing Performance and Capacity
●Knowing Your Application and Questions to Ask at Development Time
●Questions
4. About Me
●Started at Percona in January 2016
●Experience
Web Publishing
Big-scale LAMP-based Websites
Ecommerce
Large Inventory SaaS
Gaming
DevOps
50-100 Microservices
5-7+ x Massive Launches / Year
Design, launch and maintain apps
5. About Me
DBA at EA DICE
2 x New Titles
5+ x Legacy Titles
Technologies
MySQL
MongoDB
Cassandra
Redis and Memcached
RabbitMQ, Kafka and ActiveMQ
Solr and Elasticsearch
(Sort of) AWS, HDFS, HBase, Postgres, etc…
6. Services
Monolith
One application that does everything
Example: Chrome, MySQL, huge Python app
Microservice
Apps with different purposes, pain points, or SLAs are discrete services
Often easier to scale/troubleshoot
Reduces risk of outage
Example: frontend PHP app, messaging app, encoding app, etc
In Practice
Both can be scaled up and down with the right features
Microservices offer more flexibility
Monolith services bring problems at scale
7. Application Operations
Synchronous
Blocking operation until success or failure
Slower requests
Example: a file uploading app
Asynchronous
Request and response are separated
Fast response time back to user/application
Example: a social media site
Slow Operations
Can cause pileups in a tiered system
8. Applications
Synchronous
Pros: less code, always the right answer
Cons: blocking operations and poorer efficiency
Example: a file uploading app
Latency/Integrity Sensitive
Pros: always the right answer
Cons: fewer scalability tricks available
Example: a stock trading app that cannot accept “slave lag”
Asynchronous
Pros: light operations and more scalability
Cons: eventual consistency (and sometimes more code)
Example: a social media site
9. Types of Data Designs
Decentralised
Data is duplicated in several places
Pros: lighter to read, decreased locking, easy to shard
Cons: increased storage space, extra duplication effort
Centralised
Data is kept in one (or few) places and referenced
Pros: less storage, one source-of-truth
Cons: locking, inefficiencies, sharding issues
10. Balancing Request Impact
Read-focused Apps
Benefit from
Values pre-computed at write/change-time
Indices and/or few “scans” for data
No/few JOINs/operations to get result
Write-focused Apps
Benefit from
No pre-computing of values (compute at read-time)
No/few indices to update
Insert/Append > Update
Reads: compute read summaries on replicas, add indices to secondaries only, etc
11. Event Metadata
Example: “UserX has the new top score!”
Without Queue example
Update Top Score in Database(s)
Send Email to Friends
Post to Facebook Page
Update cache
...
With Queue example
Add event to queue ‘topscore’
Apps read queue
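The queue-based flow above can be sketched roughly as follows. This uses Python's standard `queue.Queue` as an in-process stand-in for a real broker such as RabbitMQ or Kafka. The names `publish_top_score`, `drain`, `email_friends` and `update_cache` are hypothetical and exist only for illustration.

```python
import queue

# In-process stand-in for a broker queue named 'topscore'.
topscore_queue = queue.Queue()

def publish_top_score(user, score):
    """Request path: do one cheap thing, enqueue the event."""
    topscore_queue.put({"user": user, "score": score})

def drain(handlers):
    """Worker side: hand every queued event to every handler."""
    results = []
    while True:
        try:
            event = topscore_queue.get_nowait()
        except queue.Empty:
            break
        for handler in handlers:
            results.append(handler(event))
    return results

# Hypothetical consumers; real ones would send email, update a cache, etc.
def email_friends(event):
    return f"email: {event['user']} scored {event['score']}"

def update_cache(event):
    return f"cache: top_score={event['score']}"
```

The user-facing request only pays for the enqueue; each slow side effect runs later in whichever app reads the queue.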
12. Queuing Updates
Update Buffering
Scenario: there is a high rate of updates to buffer
Queue-based example
App adds to update buffer (queue)
Worker app works from the bottom of buffer
Queue Operational Benefits
Spikes in traffic
Backend downtime
Communication bus
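A minimal sketch of update buffering, assuming a `collections.deque` as a stand-in for a durable queue and a plain dict as the backend store. The helper names `buffer_update` and `flush` are hypothetical.

```python
from collections import deque

update_buffer = deque()  # stand-in for a durable queue

def buffer_update(key, value):
    """App side: append the update and return immediately."""
    update_buffer.append((key, value))

def flush(store, max_batch=100):
    """Worker side: take up to max_batch of the oldest updates and
    apply them to the backend as one write."""
    batch = {}
    for _ in range(min(max_batch, len(update_buffer))):
        key, value = update_buffer.popleft()
        batch[key] = value   # a later update to the same key wins
    store.update(batch)      # one backend write per batch
    return len(batch)
```

If the backend is down or slow, updates simply accumulate in the buffer until the worker can flush again, which is the spike and downtime benefit listed above.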
13. Scaling Sync./Latency-Sensitive Apps
Rethink the Flow Using Async
Use lots of database RAM
Shard the database
Reduce impact of request flow
Apache Cassandra
Synchronous
Very write optimized
Percona XtraDB Cluster, NDB
Use memory-based storage
Queue persistence to database
14. Expensive DB Work
Focus on lightweight user-facing operations
Move aggregations/summaries/reporting to background
Use replicas for expensive jobs
Avoid or reduce (maybe cache) “JOINs”
Enable and monitor metrics
MySQL
log_queries_not_using_indexes
MongoDB
Enable operationProfiling
Review metrics and improve!
Percona Monitoring and Management
15. Efficient Usage of Data at Scale
Caching / In-Memory Stores
Alleviates load from database
Very fast lookups
Low connection overhead
MySQL connection buffers: ~1MB+
MongoDB connection buffers: ~1MB
Redis or Memcache connection buffers: 0-limit/infinity**
Server-Side
Hit/Miss Caching
If something is not in the cache: find + add it. TTL expiry
Inline/Preemptive Caching
Update/Delete cache data at change time/preemptively
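Both server-side styles can be sketched in one small class. This is an illustration only: `TTLCache`, its `loader` callback and the injectable `clock` are assumptions made for the example, and a real deployment would more likely use Memcached or Redis.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.data = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        """Hit/miss caching: on a miss or expired entry, load from the
        database (via loader) and store the result."""
        entry = self.data.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        value = loader(key)
        self.put(key, value)
        return value

    def put(self, key, value):
        """Inline/preemptive caching: call this at change time so the
        value is cached before anyone reads it."""
        self.data[key] = (value, self.clock() + self.ttl)
```

With hit/miss caching the first reader pays for the database lookup; with `put` called at write time, no reader does.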
16. Efficient Usage of Data at Scale
Caching / In-Memory Stores (continued)
Client-Side
Cache client data in the client app/browser/etc
In-memory Stores
Memcached
Redis
Percona Server for MongoDB with Memory Engine :)
Use TTLs to trim data
17. Efficient Usage of Data at Scale
Storing Numerical Counters and Stats
Offload to in-memory stores
Incremented/decremented counters
Aggregations, summaries, counts
Count-style Queries to Counters
Increment counter at request/change time
Read counter value at read-request time
Or, try to use an index
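As a rough illustration of replacing a count-style query with a counter, the sketch below keeps a list as a stand-in for a database table and a `collections.Counter` as a stand-in for an in-memory store such as Redis. All names are hypothetical.

```python
from collections import Counter

friend_rows = []           # stand-in for a database table
friend_counts = Counter()  # stand-in for an in-memory counter store

def add_friend(user, friend):
    friend_rows.append((user, friend))
    friend_counts[user] += 1   # increment at change time

def count_friends_scan(user):
    """Like a COUNT(*) without an index: cost grows with table size."""
    return sum(1 for u, _ in friend_rows if u == user)

def count_friends_counter(user):
    """Read the maintained counter: a single key lookup."""
    return friend_counts[user]
```

Both functions return the same number, but only the scan gets slower as the table grows.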
18. Efficient Usage of Data at Scale
Connection Pooling
Removes 3-way TCP “handshake” from request (more w/SSL)
Reduces threading overhead on databases
Proxies on App server localhost/loopback
Reduces 1 x TCP ‘hop’, ie: faster connect time
Can create a LOT of DB connections with many app servers
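A minimal sketch of an in-application connection pool. It uses `sqlite3` in-memory connections purely as a stand-in for MySQL or MongoDB connections; `ConnectionPool` and `make_pool` are hypothetical names, and production code would normally rely on the driver's or proxy's own pooling.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, size, connect):
        self.opened = 0
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # pay the connect cost once, up front
            self.opened += 1

    @contextmanager
    def connection(self):
        conn = self._idle.get()        # blocks if all connections are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)       # return it instead of closing it

def make_pool(size=2):
    # sqlite3 stands in for a real database server in this sketch.
    return ConnectionPool(size, lambda: sqlite3.connect(":memory:"))
```

The fixed `size` is also what keeps the total connection count bounded: with many app servers, the database sees at most `size` connections per server.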
19. Efficient Usage of Data at Scale
Connection Pooling (continued)
MySQL Proxies
ProxySQL
HAProxy
MaxScale
Others…
MongoDB Proxies
Mongos (sharding) process
Proxy-on-Localhost or direct is fastest
20. Virtualization
Pretends to be a real computer from BIOS up
OS + Software run under a hypervisor layer
Pros
Full hardware-level emulation, eg: CentOS, Red Hat, Win 10
Automation of platform (sometimes)
Cons
Emulation overhead
Slow boot-up time
Lots of OSs to update
21. Virtualization, Containers, etc
Containers (cgroups, jails)
Several can run inside a single operating system and kernel
Offers controls to limit resources like RAM, CPU time, etc
Pros
Low overhead
Container creation is very fast
22. Virtualization, Containers, etc
Mesos, Kubernetes, etc
Make a lot of servers distribute work, containers, etc
Apache Mesos: “Distributed systems kernel”
Agent on every host and manager servers give out work
Kubernetes
23. Virtualization, Containers, etc
Many Processes per Host
Run unrelated processes on hosts
Add/remove from load balancers
Not advised for disk-bound or high-bandwidth apps
24. Scaling Out Tricks
Sharding
Techniques
Modulus
Even distribution of keys
Hard to reshape data
Map-based
1-to-1 shard mapping using another table, config, etc
Easy to reshape data
Launch with many shards in advance
1-4 MySQL/MongoDB Instance/host
1 MySQL/MongoDB Instance/host, 4 x databases as shards
1 MySQL/MongoDB Instance/host, small hardware
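The two sharding techniques can be contrasted in a few lines. This is a sketch under assumptions: CRC32 is just one possible hash, and `shard_map` would be a table or config in practice. All function names are hypothetical.

```python
import zlib

def modulus_shard(key, shard_count):
    """Modulus sharding: hash the key and take the remainder."""
    return zlib.crc32(key.encode("utf-8")) % shard_count

# Map-based sharding: an explicit key-to-shard lookup.
shard_map = {"user:1": 0, "user:2": 1}

def map_shard(key):
    return shard_map[key]

def move_key(key, new_shard):
    """Reshaping is one map update (plus copying that key's data)."""
    shard_map[key] = new_shard
```

With the modulus approach, changing `shard_count` remaps most keys at once, which is why it is hard to reshape. Launching with more shards than hosts up front avoids having to change the count later.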
26. Scaling Out Tricks
Hardware
Have a strategy to add/remove capacity quickly
Cloud Instances
Mesos/Kubernetes
Automation
Use cheap application servers for in-memory stores and apps
Launch with lots of RAM, scale down post-launch
27. Scaling Out Tricks
Elasticity
Ensure there is a way to add/remove hosts, examples:
Load Balancers
Good health-checks are important
Application Configs
File
Database
Zookeeper
28. Scaling Out Tricks
At Launch...
Scale-out
Keep spare servers online, partially configured
Launch with extra database replicas (slave/secondary)
Monitor usage and remove extra hardware post-launch
Monitor and adjust capacity
Scale-up
Launch with lots of RAM
Traffic Control
Launch one region at a time
Launch with rate limits
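One common way to implement a launch-time rate limit is a token bucket. The slides do not name a specific algorithm, so this is one possible choice, and `TokenBucket` with its parameters is hypothetical.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = burst         # maximum stored tokens
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False to reject it."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Raising `rate_per_sec` gradually after launch lets traffic ramp up at a pace the databases can absorb.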
29. Scaling Out Tricks
Application “Kill switches”
A switch to disable certain app features/functions
Useful when there is:
Too much traffic/scale-up
DDoS
Maintenance
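A kill switch can be as simple as a flag checked before an optional feature runs. In this sketch the flag names and `render_product_page` are hypothetical; in practice the flags might live in a config file, a database or ZooKeeper so they can be flipped without a deploy.

```python
# Hypothetical switches: True means the feature is switched off.
kill_switches = {"recommendations": False, "comments": False}

def feature_enabled(name):
    return not kill_switches.get(name, False)

def render_product_page(product, load_recommendations):
    """Build the page; skip the expensive optional part when killed."""
    page = {"product": product}
    if feature_enabled("recommendations"):
        page["recommendations"] = load_recommendations(product)
    return page
```

When the switch is on, the expensive loader is never called, so the core page keeps working while the optional feature sheds its load.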
30. Scaling Out Tricks
Limiting Graph Structures
“Friends” / ”Followers” features are often graphs
If Katy Perry or Barack Obama used your “friends” feature…
Limit the size of graphs, or queue events for fan-out updating
31. Scaling Out Tricks
Batching and Parallel Work
Do large queries in parallel
Modern CPUs have many cores (2, 4, 8+)
1 connection = 1 thread = 1 CPU core
Batch inserts/updates
1 x update with 1000 items > 1000 x updates with 1 item
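The batching point can be illustrated with `sqlite3` from the Python standard library, used here only as a stand-in for MySQL. The table and function names are hypothetical.

```python
import sqlite3

INSERT_SQL = "INSERT INTO items (id, name) VALUES (?, ?)"

def new_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def insert_one_by_one(conn, rows):
    """1000 statements, 1000 commits."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()

def insert_batched(conn, rows):
    """One executemany call inside a single transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany(INSERT_SQL, rows)
```

Both functions leave the same rows in the table; the batched version pays the per-statement and per-commit overhead far fewer times, and against a networked database it also saves round trips.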
32. Scaling Up Tricks
Test provider turn-around time on hardware upgrading
Test application performance on improved hardware in advance
Scale up only resources needed
33. Databases
General
Monitoring/reviewing slow queries reduces most inefficiencies
More memory will reduce disk requests
SSDs will reduce disk request time
Proper database and kernel tunings will help further
Linux has very inefficient defaults!
Try to use real local-disks, not EBS, NFS, etc
Queries
Don’t try to make MySQL/MongoDB a queue or search engine!
Decentralizing data and pre-computing answers for reads will take you far
The best query is no query (cache)
34. Testing Performance and Capacity
General
Try to emulate the real user traffic
Add micro-pauses to simulate reality
Cloud-based providers are great for running load generation
Applications
Component testing
Test the max volume of each component on a single host
Test the max volume of each component on many hosts
Calculate host scalability, ie: “+1 host = +80% more traffic”
Feature capacity
Test the impact of each feature if not separate
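The host scalability figure can be turned into a rough capacity estimate. The linear model below is an assumption for illustration (every extra host adds a fixed fraction of one host's measured throughput); real systems often scale less evenly, so treat the result as a starting point for testing rather than a prediction.

```python
import math

def cluster_capacity(single_host_rps, hosts, efficiency=0.8):
    """Estimated capacity when each extra host adds `efficiency`
    times the throughput of a single host."""
    return single_host_rps * (1 + efficiency * (hosts - 1))

def hosts_needed(target_rps, single_host_rps, efficiency=0.8):
    """Smallest host count whose estimated capacity meets the target."""
    if target_rps <= single_host_rps:
        return 1
    extra = (target_rps - single_host_rps) / (single_host_rps * efficiency)
    return 1 + math.ceil(extra)
```

For example, with a measured 1000 requests/sec on one host and "+1 host = +80%", a 4500 requests/sec target needs 6 hosts under this model, since 5 hosts would reach only about 4200.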
35. Testing Performance and Capacity
Databases
Replay real user traffic on real backups
Load test tools: Linkbench, Sysbench, TPCC, JMeter, etc
Single feature/query testing
Understand host capacity per feature, eg: “2000 user login queries/sec per db replica”
Know your slowest query!
36. Development-time Questions
General
What does the app do?
If I break X, what happens?
Are connections to data stores “pooled”?
Replicas
Can the app use replicas (with possible lag)?
Tip: start early, deploy replication from the start
Can we Add/Remove replicas without disruption?
Sharding
Can the app understand shards/partitions?
How is data balanced post-sharding?
Are there cross-shard references?
37. Development-time Questions
Caching
What data can be cached?
Will a change be read immediately?
Can we pre-cache this change?
When should the cache delete an item?
Can we set TTLs on our keys?
How do we add/remove cache servers easily?
38. Knowing Your App
If you see…
The app is write heavy
Remove overhead from immediate write path
Batch writes if possible
The app is read heavy
Reduce scans/operations from the read path (index, etc)
Add as many replicas (slave/secondary) as needed
The app queries for counts often, ie: # of items, friends, etc
Move count-queries to incremented in-memory counters
Or, create an index for the count query
The app uses references or joins often
Consider decentralising the data (with fan-out updates)
39. Themes
Make all features, apps, databases elastic
Request Flow
Make the heavy workload easy / make the light workload hard
Move graph updates to background (queues, async, etc)
Move ‘counts’ to counters
Caching
Cheaper/faster to access than DB
Try to cache before anyone reads data
Queues
Great for replicating events while simplifying update
Great for batching changes
Monitor everything! Try Percona Monitoring and Management!
40. Join us at Percona Live Europe
When: October 3-5, 2016
Where: Amsterdam, Netherlands
The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.
Get briefed on the hottest topics
Learn about building and maintaining high-performing deployments
Listen to technical experts and top industry leaders
Use promo code “WebinarPLAM16” and receive €15 off the current registration price!
Sponsorship opportunities are available as well.