Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech Talks
1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Michael Labib, NoSQL Specialist SA
12/6/2017
Scaling Redis Workloads with
Amazon ElastiCache
2.
What to expect from this session
• Amazon ElastiCache Overview
• Scaling your cluster with Online Re-sharding
• Amazon ElastiCache Security & Encryption
• Common Usage Patterns
• Best Practices
3.
In-memory key-value store supporting
• Redis 3.2.10
• Memcached 1.4.34
High-performance
Fully managed; zero admin
Highly available and reliable
Hardened by Amazon
Amazon
ElastiCache
4.
Redis Overview
Powerful
• ~200 commands + Lua scripting
In-memory data structure server
• Utility data structures: strings, lists, hashes, sets, sorted sets, bitmaps & HyperLogLogs
Simple
Atomic operations
• Supports transactions
Ridiculously fast!
• <1 ms latency for most commands
Highly available
• Replication
Persistence
Open source
5.
REDIS:6379> SMEMBERS features
1) "Easy to deploy & monitor"
REDIS:6379> HGET feature:details "deploy-monitor"
[Diagram: Amazon ElastiCache with AWS Config, Amazon CloudWatch (alarm), AWS CloudTrail, AWS CloudFormation, AWS Management Console, AWS CLI and SDKs, Amazon SNS (email notification), and AWS Lambda]
6.
REDIS:6379> SMEMBERS features
2) "Enhanced Redis Engine"
REDIS:6379> HGET feature:details "enhancements"
Optimized swap memory
• Mitigates the risk of increased swap usage during syncs and snapshots
Dynamic write throttling
• Improved output buffer management when the node's memory is close to being exhausted
Smoother failovers
• Clusters recover faster because replicas avoid flushing their data to do a full re-sync with the primary
7.
Redis Topologies
Cluster Mode Disabled (vertically scaled)
• Keyspace: Slot 0, Slot 1 … Slot 16383 on a single shard
• 1 primary, 0-5 replicas
• Primary endpoint
• Max storage 407 GiB
Cluster Mode Enabled (horizontally scaled)
• Keyspace split across shards, e.g. slots 0-5461, 5462-10922, 10923-16383
• 1-15 primaries / shards, 0-5 replicas each
• Configuration endpoint
• Max storage 6+ TiB
8.
Redis Cluster mode enabled vs disabled
Feature | Enabled | Disabled
Failover | 15–30 sec (non-DNS) | ~1.5 min (DNS-based)
Failover risk | Writes affected on partial dataset (less risk with more partitions); reads available | Writes affected on entire dataset; reads available
Performance | Scales with cluster size (90 nodes: 15 primaries + 0–5 replicas per shard) | 6 nodes (1 primary + 0–5 replicas)
Max connections | Primaries: 65,000 x 15 = 975,000; replicas: 65,000 x 75 = 4,875,000 | Primary: 65,000; replicas: 65,000 x 5 = 325,000
Storage | 6+ TiB | 407 GB
Cost (example: assume the workload needs 175 GB) | Smaller nodes but more $$: 9 x cache.r3.xlarge ($0.455/hr) = $4.095/hr, 255.6 GB | Larger nodes, less $: 1 x cache.r3.8xlarge = $3.640/hr, 237 GB
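The cost row can be checked with a few lines of arithmetic. The sketch below only re-derives the slide's own numbers (2017 cache.r3 prices and capacities); the dictionary layout and function names are illustrative, and current pricing will differ.

```python
# Figures are the ones quoted on the slide; treat them as illustrative only.
OPTIONS = {
    "9 x cache.r3.xlarge": {"nodes": 9, "price_per_node_hr": 0.455, "capacity_gb": 255.6},
    "1 x cache.r3.8xlarge": {"nodes": 1, "price_per_node_hr": 3.640, "capacity_gb": 237.0},
}


def hourly_cost(option):
    """Total hourly price of the option, rounded to a tenth of a cent."""
    return round(option["nodes"] * option["price_per_node_hr"], 3)


def cost_per_gb_hour(option):
    """Hourly price divided by the total capacity of the option."""
    return hourly_cost(option) / option["capacity_gb"]


for name, option in OPTIONS.items():
    print(f"{name}: ${hourly_cost(option):.3f}/hr, "
          f"${cost_per_gb_hour(option):.4f} per GB-hour")
```

Both options cover the 175 GB requirement; the single large node is cheaper per hour, while the nine smaller nodes spread the data over more partitions.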
9.
Amazon
ElastiCache
Closer look at cluster-mode enabled
10.
Redis Cluster: automatic client-side sharding
• 16384 hash slots per cluster
• Slot for a key is CRC16(key) mod 16384
• Slots are distributed across the cluster into shards
• Developers must use a Redis Cluster-aware client
• Clients are redirected to the correct shard
• Smart clients store a map
Shard S1 = slots 0–3276
Shard S2 = slots 3277–6553
Shard S3 = slots 6554–9829
Shard S4 = slots 9830–13106
Shard S5 = slots 13107–16383
[Diagram: a client connected to shards S1 through S5]
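A minimal sketch of what a smart client does with that map, assuming the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) that Redis Cluster uses. Hash tags (the `{...}` key syntax) are deliberately left out, and the function names are illustrative rather than taken from any client library.

```python
HASH_SLOTS = 16384

# Slot ranges copied from the slide's five-shard example.
SHARD_RANGES = {
    "S1": (0, 3276),
    "S2": (3277, 6553),
    "S3": (6554, 9829),
    "S4": (9830, 13106),
    "S5": (13107, 16383),
}


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Slot for a key: CRC16(key) mod 16384 (hash tags not handled)."""
    return crc16_xmodem(key.encode("utf-8")) % HASH_SLOTS


def shard_for_slot(slot: int) -> str:
    """Look the slot up in the locally cached slot map."""
    for shard, (low, high) in SHARD_RANGES.items():
        if low <= slot <= high:
            return shard
    raise ValueError(f"slot {slot} is outside 0..{HASH_SLOTS - 1}")


slot = key_slot("foo")
print(slot, shard_for_slot(slot))
```

A real cluster-aware client refreshes this map when a node answers with a redirect, which is why resharding does not require application changes.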
11.
Redis Cluster: architecture (Multi-AZ)
A cluster consists of 1 to 15 shards
Example: 3-shard cluster, 2 read replicas
[Diagram: Redis Cluster across Availability Zones A, B, and C; each zone holds one node for slots 0–5454, 5455–10909, and 10910–16363]
12.
Redis Cluster: architecture
Each shard has a primary node and up to 5 replica nodes
[Diagram: three-AZ Redis Cluster, slots 0–5454 / 5455–10909 / 10910–16363 in each zone; one shard highlighted as Replica, Replica, Primary]
13.
Redis Cluster: architecture
Each shard has a primary node and up to 5 replica nodes
[Diagram: three-AZ Redis Cluster, slots 0–5454 / 5455–10909 / 10910–16363 in each zone; a second shard highlighted as Replica, Replica, Primary]
14.
Redis Cluster: architecture
Each shard has a primary node and up to 5 replica nodes
[Diagram: three-AZ Redis Cluster, slots 0–5454 / 5455–10909 / 10910–16363 in each zone; a third shard highlighted as Replica, Primary, Replica]
15.
Scenario 1: single primary failure
[Diagram: three-AZ Redis Cluster, slots 0–5454 / 5455–10909 / 10910–16363 in each zone]
16.
Scenario 1: single primary failure
Mitigation:
1. Automatic failure detection and replica promotion (~15–30 s)
2. Repair failed node
[Diagram: three-AZ Redis Cluster, slots 0–5454 / 5455–10909 / 10910–16363 in each zone]
17.
Scenario 2: majority of primaries fail
[Diagram: three-AZ Redis Cluster, slots 0–5454 / 5455–10909 / 10910–16363 in each zone]
18.
Scenario 2: majority of primaries fail
Mitigation: Redis enhancements on ElastiCache
• Automatic failure detection and replica promotion
• Repair failed nodes
[Diagram: three-AZ Redis Cluster, slots 0–5454 / 5455–10909 / 10910–16363 in each zone]
19.
Resizing via backup & restore (3 shards to 5 shards)
Step 1: aws elasticache create-snapshot --replication-group-id redisclusterID --snapshot-name sname
Step 2: aws elasticache copy-snapshot --source-snapshot-name sname --target-snapshot-name sname --target-bucket s3bucketname
Step 3: aws elasticache create-replication-group --replication-group-id NewRedisClusterID … --snapshot-arns arn:aws:s3:::bucketname/redisbackup-0001.rdb, etc.
Step 4: Once the new cluster is up, update your app with the new Amazon ElastiCache endpoint, then terminate the old cluster.
Downtime: new writes made after the snapshot are not in the .rdb file
Pro tip (DR strategy): enable CRR on the S3 bucket and have it trigger an AWS Lambda function to hydrate the destination cluster
20.
Zero-Downtime Online Re-sharding
Amazon
ElastiCache
21.
Online Re-Sharding: Zero Downtime
Simple API: scale in or out
aws elasticache modify-replication-group-shard-configuration --replication-group-id rep-group-id --apply-immediately --node-group-count 5
[Diagram: Shard 1 = slots 0-5461, Shard 2 = slots 5462-10922, Shard 3 = slots 10923-16383]
22.
Online Re-Sharding: Zero Downtime, Scale Out
• No application interruption
• Uniform slot distribution across shards
[Diagram: reads/writes continue while 3 shards become 5]
Before: Shard 1 = 0-5461, Shard 2 = 5462-10922, Shard 3 = 10923-16383
After:
Shard 1 = 0-2909, 5095-5461
Shard 2 = 5462-5783, 6876-9830
Shard 3 = 10923-14199
Shard 4 = 2910-5094, 9831-10922
Shard 5 = 5784-6875, 14200-16383
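The "uniform slot distribution" claim can be checked against the slot ranges shown on this slide. The sketch below assumes the range-to-shard assignment as read from the diagram (shards 1 to 3 keep part of their original ranges, shards 4 and 5 receive the rest); the helper names are illustrative.

```python
TOTAL_SLOTS = 16384

# Post-scale-out layout as read from the slide; the shard assignment is an assumption.
LAYOUT = {
    "Shard 1": [(0, 2909), (5095, 5461)],
    "Shard 2": [(5462, 5783), (6876, 9830)],
    "Shard 3": [(10923, 14199)],
    "Shard 4": [(2910, 5094), (9831, 10922)],
    "Shard 5": [(5784, 6875), (14200, 16383)],
}


def slots_per_shard(layout):
    """Number of slots each shard owns (ranges are inclusive)."""
    return {
        shard: sum(high - low + 1 for low, high in ranges)
        for shard, ranges in layout.items()
    }


def covers_every_slot_once(layout, total=TOTAL_SLOTS):
    """True if the ranges tile 0..total-1 with no gaps and no overlaps."""
    next_expected = 0
    for low, high in sorted(r for ranges in layout.values() for r in ranges):
        if low != next_expected:
            return False
        next_expected = high + 1
    return next_expected == total


print(slots_per_shard(LAYOUT))
print("full coverage:", covers_every_slot_once(LAYOUT))
```

Four shards end up with 3277 slots and one with 3276, which is as even as 16384 slots can be split five ways.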
23.
Online Re-Sharding: Zero Downtime, Scale In
• No application interruption
• Uniform slot distribution across shards
[Diagram: reads/writes continue while Shards 4 and 5 are removed, leaving Shard 1 = 0-5461, Shard 2 = 5462-10922, Shard 3 = 10923-16383]
24.
Online Re-Sharding: CloudWatch alarm triggered
[Diagram: Amazon CloudWatch alarm (memory high), AWS SNS, AWS Lambda; the cluster is resized from 3 shards to 5 shards]
AWS Lambda function body:
…
var params = {
  ApplyImmediately: true,
  NodeGroupCount: 5,
  ReplicationGroupId: 'rep-group-id',
  … }
elasticache.modifyReplicationGroupShardConfiguration(params, function(err, data) {
  if (err) console.log(err, err.stack); // log the failure and its stack trace
  else console.log(data);               // log the API response on success
}); …
25.
[Diagram: clients across AZ1 and AZ2 send reads/writes and reads to the cache cluster, search, and relational data tiers. State: healthy]
26.
[Diagram: same architecture. State: heavy pressure]
27.
[Diagram: same architecture. State: healthy, auto scaled out]
28.
Amazon
ElastiCache
Security Overview
29.
REDIS:6379> HGET feature:details "ref-arch"
[Diagram, built up over slides 29 to 33: an ElastiCache Redis cluster spread across Availability Zones A, B, and C; nodes sit in private subnets behind security groups; public subnets in each zone have their own security groups; encryption in transit and Redis AUTH (Redis 3.2.6); Redis RDB snapshots go to an Amazon S3 bucket with encryption at rest]
34.
Amazon ElastiCache Encryption and Compliance
Encryption
• In-transit: encrypt all communications between clients and the Redis server as well as between nodes
• At-rest: encrypt backups on disk and in Amazon S3
• Fully managed: set up via API or console, automatic issuance and renewal
Compliance
• HIPAA eligibility for ElastiCache for Redis
• Included in AWS Business Associate Addendum
• Redis 3.2.6
35.
Amazon
ElastiCache
Common Usage Patterns
36.
Usage patterns
• Session management
• Database caching
• APIs (HTTP responses)
• IoT
• Streaming data analytics (filtering/aggregation)
• Pub/sub
• Social media (sentiment analysis)
• Standalone database (metadata store)
• Leaderboards
37.
Caching
[Diagram components: clients; Elastic Load Balancing; Amazon EC2 (reads/writes); Amazon ElastiCache Redis (write-through); Amazon DynamoDB (unstructured data, DDB streams); Amazon RDS (relational data, mysql.lambda_async); Amazon S3 (object data)]
38.
Caching NoSQL
[Diagram components: clients; Amazon EC2 (reads/writes, reads); MongoDB cluster; Cassandra cluster; Elasticsearch cluster]
• Smaller NoSQL DB clusters needed = lower costs
• Faster data retrieval = better performance
39.
Caching NoSQL databases with Amazon ElastiCache
MongoDB cluster (Amazon EC2 sends reads/writes to MongoDB and reads to Amazon ElastiCache Redis)
DBObject doc = collection.findOne();
• Cache serialized DBObject in Redis (good)
• Cache rows in Redis hash (faster/more efficient)
Cassandra cluster (Amazon EC2 sends reads/writes to Cassandra and reads to Amazon ElastiCache Redis)
ResultSet rs = session.execute(stmt);
• Cache serialized ResultSet in Redis (good)
• Cache rows in Redis hash (faster/more efficient)
Smaller NoSQL DB clusters needed = lower costs
Faster data retrieval = better performance
40.
Streaming data enrichment/processing
[Diagram components: data sources; Amazon Kinesis Streams (raw stream); AWS Lambda function 1 (continual data filtering/enrichment); Amazon ElastiCache (Redis); Amazon Kinesis Streams (cleansed stream); Amazon Kinesis Analytics; AWS Lambda function 2 (real-time pub/sub); subscribers]
41.
Big data architectures using Redis
[Diagram: pipeline stages Collect, Store, Process, Analyze. Components shown: data sources, Amazon EC2, AWS IoT, Amazon Kinesis, Apache Kafka, Amazon Kinesis app, AWS Lambda, Apache Storm on EMR, Spark Streaming on Amazon EMR, Spark on Amazon EMR, custom app, Amazon ElastiCache, Amazon S3]
42.
IoT powered by ElastiCache
[Diagram components: AWS IoT devices; AWS IoT rules engine with direct integration to Lambda, SNS, SQS, S3, Kinesis, and DDB; AWS Lambda; Amazon ElastiCache Redis as the sensor store]
43.
Mobile apps powered by ElastiCache
• Search points of interest: GEORADIUS
• Update points of interest: GEOADD
[Diagram components: Amazon API Gateway; AWS Lambda; Amazon ElastiCache Redis; Amazon DynamoDB with DDB streams; Amazon EC2]
https://aws.amazon.com/blogs/database/amazon-elasticache-utilizing-redis-geospatial-capabilities/
44.
Ad tech powered by ElastiCache
1. User visits page
2. Publisher places ad slot for auction
3. Ad network calls for bids
4. Bidders respond with bids
5. Winning bid's ad is displayed
[Diagram components: consumer; publishers (websites/apps) with ad slots and ad placement; ad network; advertisers; clients; clickstream (shopping events); Amazon ElastiCache Redis; <40 ms]
https://aws.amazon.com/caching/database-caching/
45.
Chat apps powered by ElastiCache
[Diagram components: chat app clients; Application Load Balancer; WebSockets (persistent connections); Elastic Beanstalk; Amazon ElastiCache Redis as the pub/sub server]
SUBSCRIBE chat_channel:114
PUBLISH chat_channel:114 "Hello all"
>> ["message", "chat_channel:114", "Hello all"]
UNSUBSCRIBE chat_channel:114
https://aws.amazon.com/blogs/database/amazon-elasticache-utilizing-redis-geospatial-capabilities/
46.
Gaming: real-time leaderboards
• Very popular for gaming apps that need uniqueness and ordering
• Easy with Redis sorted sets
ZADD "leaderboard" 1201 "Gollum"
ZADD "leaderboard" 963 "Sauron"
ZADD "leaderboard" 1092 "Bilbo"
ZADD "leaderboard" 1383 "Frodo"
ZREVRANGE "leaderboard" 0 -1
1) "Frodo"
2) "Gollum"
3) "Bilbo"
4) "Sauron"
ZREVRANK "leaderboard" "Sauron"
(integer) 3
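To make the ordering rules concrete, here is a small in-process model of the two read commands used on the slide. It is not a Redis client: `zrevrange` and `zrevrank` are plain Python stand-ins written for this illustration, and they only cover non-negative start indexes.

```python
def _ranked(scores):
    """Members from highest to lowest score; ties fall back to reverse name order."""
    return sorted(scores, key=lambda member: (scores[member], member), reverse=True)


def zrevrange(scores, start, stop):
    """Members between two inclusive ranks, highest score first (-1 means last)."""
    ranked = _ranked(scores)
    if stop < 0:
        stop += len(ranked)
    return ranked[start:stop + 1]


def zrevrank(scores, member):
    """Zero-based rank of a member counting from the highest score, or None."""
    ranked = _ranked(scores)
    return ranked.index(member) if member in scores else None


# Same members and scores as the ZADD calls on the slide.
leaderboard = {"Gollum": 1201, "Sauron": 963, "Bilbo": 1092, "Frodo": 1383}

print(zrevrange(leaderboard, 0, -1))
print(zrevrank(leaderboard, "Sauron"))
```

Because a sorted set keeps one score per member, re-adding a player with a new score updates the ranking instead of creating a duplicate, which is the "uniqueness" the slide refers to.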
47.
Rate limiting
Ex: throttling requests to an API using Redis counters
[Diagram: ELB in front of an externally facing API]
Reference: http://redis.io/commands/INCR
FUNCTION LIMIT_API_CALL(APIaccesskey)
limit = HGET(APIaccesskey, "limit")
time = CURRENT_UNIX_TIME()
keyname = APIaccesskey + ":" + time
count = GET(keyname)
IF count != NULL && count > limit THEN
ERROR "API request limit exceeded"
ELSE
MULTI
INCR(keyname)
EXPIRE(keyname,10)
EXEC
PERFORM_API_CALL()
END
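The counter pattern above can be sketched without a Redis server by modelling the counter store as a dictionary. Everything here is an assumption made for illustration: the class name, the injectable clock, and the choice of `>=` so that exactly `limit` calls are allowed per one-second key (the pseudocode's `>` admits one more).

```python
import time


class FixedWindowLimiter:
    """In-memory stand-in for the GET / INCR / EXPIRE counter pattern."""

    def __init__(self, limit, expire_seconds=10, clock=time.time):
        self.limit = limit
        self.expire_seconds = expire_seconds
        self.clock = clock
        self.counters = {}  # keyname -> (count, expires_at)

    def allow(self, api_access_key):
        now = self.clock()
        # Emulate EXPIRE: forget counters whose lifetime has passed.
        self.counters = {k: v for k, v in self.counters.items() if v[1] > now}
        # One counter per access key per second, as in the pseudocode.
        keyname = f"{api_access_key}:{int(now)}"
        count, _ = self.counters.get(keyname, (0, 0.0))
        if count >= self.limit:
            return False  # API request limit exceeded
        # Emulate MULTI / INCR / EXPIRE / EXEC as a single step.
        self.counters[keyname] = (count + 1, now + self.expire_seconds)
        return True


fake_now = [1000.0]
limiter = FixedWindowLimiter(limit=2, clock=lambda: fake_now[0])
print([limiter.allow("key-123") for _ in range(3)])
fake_now[0] = 1001.0  # the next second starts a fresh counter
print(limiter.allow("key-123"))
```

In Redis the increment and the expiry are wrapped in MULTI/EXEC so that a counter can never be left without a TTL; the dictionary update plays that role here.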
48.
Amazon
ElastiCache
Best Practices
49.
Cluster sizing best practices
• Storage: clusters should have adequate memory
  • Recommended: memory needed + 25% reserved memory (for Redis) + some room for growth (optional 10%)
  • Optimize using eviction policies and TTLs
  • Scale up or out before reaching max-memory, using CloudWatch alarms
  • Use memory-optimized nodes for cost effectiveness (R4 support)
• Performance: performance should not be compromised
  • Benchmark operations using the Redis Benchmark tool
  • For more read IOPS: add replicas
  • For more write IOPS: add shards (scale out)
  • For more network I/O: use network-optimized instances and scale out
  • Use pipelining for bulk reads/writes
  • Consider Big-O time complexity for data structure commands
• Cluster isolation (apps sharing key space): choose a strategy that works for your workload
  • Identify what kind of isolation is needed based on the workload and environment
  • Isolation: no isolation $ | isolation by purpose $$ | full isolation $$$
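The storage recommendation can be turned into a quick estimate. This sketch assumes the simplest reading of the rule, where both percentages are taken of the data size and added to it; the function names are illustrative, and the per-node capacities are derived from the figures in the cluster mode comparison earlier in the deck (255.6 GB across 9 cache.r3.xlarge nodes, 237 GB for one cache.r3.8xlarge).

```python
import math


def recommended_memory_gb(data_gb, reserved_fraction=0.25, growth_fraction=0.10):
    """Data size plus 25% reserved memory plus an optional 10% for growth."""
    return data_gb * (1 + reserved_fraction + growth_fraction)


def nodes_needed(data_gb, node_capacity_gb):
    """Smallest number of equally sized nodes that holds the recommended memory."""
    return math.ceil(recommended_memory_gb(data_gb) / node_capacity_gb)


data_gb = 175  # the workload size used in the cost example
print(f"recommended memory: {recommended_memory_gb(data_gb):.2f} GB")
print("cache.r3.xlarge nodes (28.4 GB each):", nodes_needed(data_gb, 28.4))
print("cache.r3.8xlarge nodes (237 GB each):", nodes_needed(data_gb, 237))
```

Under this reading, a 175 GB dataset calls for about 236 GB of memory, which lines up with the 9 small nodes or 1 large node compared in the cost example.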
50.
Redis Benchmark tool
Open source utility to benchmark performance
Example: src/redis-benchmark -h r3-xlarge-perf.foio87.0001.use1.cache.amazonaws.com -p 6379 -n 150000 -d 100
Syntax:
redis-benchmark -h <host> -p <port> -c 50 -n 1000 -d 500 -q
-c <clients>: number of parallel connections (default 50).
-n <requests>: total number of requests (default 100000).
-d <size>: data size of GET and SET values in bytes.
-t <test1,test2>: comma-separated list of tests to perform.
-q: quiet operation, displays only the result.
51.
Redis max-memory policies
Select a max-memory policy based on your workload needs
• noeviction: return errors when the memory limit has been reached and the client is trying to execute commands that could use more memory.
• allkeys-lru: evict the least recently used (LRU) keys first.
• volatile-lru: evict the least recently used (LRU) keys first, but only among keys that have an expire set.
• allkeys-random: evict random keys to make space for the new data added.
• volatile-random: evict random keys to make space for the new data added, but only among keys that have an expire set.
• volatile-ttl: evict only keys with an expire set, and try to evict keys with a shorter time to live (TTL) first.
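The difference between the `allkeys-*` and `volatile-*` families is easiest to see on a toy cache. The model below is a deliberate simplification and not how Redis is implemented: it limits the number of keys rather than bytes, uses exact LRU order instead of Redis's sampled approximation, and treats "has an expire set" as a boolean flag without real expiry. `TinyCache` is a name invented for this sketch.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache supporting noeviction, allkeys-lru, and volatile-lru."""

    def __init__(self, max_keys, policy="allkeys-lru"):
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()  # ordered from least to most recently used
        self.has_expire = set()    # keys that were stored with a TTL

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # reading a key marks it as recently used
        return self.data[key]

    def set(self, key, value, ttl=False):
        if key not in self.data and len(self.data) >= self.max_keys:
            self._evict_one()
        self.data[key] = value
        self.data.move_to_end(key)
        if ttl:
            self.has_expire.add(key)
        else:
            self.has_expire.discard(key)

    def _evict_one(self):
        if self.policy == "noeviction":
            raise MemoryError("memory limit reached and policy is noeviction")
        if self.policy == "allkeys-lru":
            candidates = list(self.data)
        else:  # volatile-lru: only keys with an expire set may be evicted
            candidates = [k for k in self.data if k in self.has_expire]
        if not candidates:
            raise MemoryError("no evictable keys under the current policy")
        victim = candidates[0]  # least recently used candidate
        del self.data[victim]
        self.has_expire.discard(victim)


cache = TinyCache(max_keys=2, policy="volatile-lru")
cache.set("config", "keep-me")            # no TTL, never evicted by volatile-lru
cache.set("session:1", "alice", ttl=True)
cache.set("session:2", "bob", ttl=True)   # forces an eviction
print(list(cache.data))
```

With `volatile-lru`, keys written without a TTL behave like pinned data, so a workload that never sets TTLs gets write errors once memory is full, just as it would under `noeviction`.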
52.
Key ElastiCache CloudWatch Metrics
• CPUUtilization
• Memcached – up to 90% ok
• Redis – divide by cores (ex: 90% / 4 = 22.5%)
• SwapUsage low
• CacheMisses / CacheHits Ratio low / stable
• Evictions near zero
• Exception: Russian doll caching
• CurrConnections stable
• Set up alarms with CloudWatch metrics
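The "divide by cores" rule follows from Redis executing commands on a single thread: one fully busy core on a 4-core node shows up as roughly 25% host CPU. A minimal sketch of the arithmetic, with illustrative function names and a hit-rate helper added as a convenience (the sample counts are made up):

```python
def redis_cpu_alarm_threshold(target_percent, cores):
    """Host-level CPUUtilization that corresponds to target_percent of one core."""
    return target_percent / cores


def cache_hit_rate(cache_hits, cache_misses):
    """Share of lookups served from the cache; 0.0 when there was no traffic."""
    total = cache_hits + cache_misses
    return cache_hits / total if total else 0.0


print(redis_cpu_alarm_threshold(90, 4))      # the slide's example: 90% / 4
print(f"{cache_hit_rate(9_500, 500):.0%}")   # hypothetical sample counts
```

The threshold value is what you would plug into a CloudWatch alarm on CPUUtilization for a Redis node with that many cores.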
53.
ElastiCache Modifiable Parameters
• Maxclients: 65000 (unchangeable)
  • Use connection pooling
  • timeout: closes a connection after it has been idle for a given interval
  • tcp-keepalive: detects dead peers given an interval
• Databases: 16 (default) for non-clustered mode
  • Logical partition
• Reserved-memory: 25% (default)
  • Recommended: 50% of maxmemory before 2.8.22; 25% from 2.8.22 on (ElastiCache)
• Maxmemory-policy:
  • The eviction policy for keys when maximum memory usage is reached
  • Possible values: volatile-lru, allkeys-lru, volatile-random, allkeys-random, volatile-ttl, noeviction
54.
Caching tips
• Understand the frequency of change of underlying data
• Set appropriate TTLs on keys that match that frequency
• Choose appropriate eviction policies that are aligned with application requirements
• Isolate your cluster by purpose (e.g. cache cluster, queue, standalone database)
• Maintain cache freshness with write-throughs
• Performance test and size your cluster appropriately
• Monitor cache hit/miss ratio and alarm on poor performance
• Use the Failover API to test application resiliency
55.
Thank You!
https://aws.amazon.com/elasticache/
Amazon ElastiCache