EVCache at Netflix

How Caching in AWS Cloud works

Published in: Internet

  1. Caching @ Netflix
  2. Agenda: Caching at Netflix; What is EVCache?; Additional Features; Code & Internals; Architecture Lessons
  3. Caching at Netflix
  4. How we view caches: a globally available, eventually consistent, ephemeral storage mechanism with tunable replication, used either as an optimization for online services or as primary storage for bulk computation (recommendations, predictions, etc.)
  5. EVCache Use @ Netflix: 70+ distinct EVCache clusters; used by nearly 200 applications; data replicated over 3 AWS regions; over 1 million replications per second; 65+ billion objects; 30+ million ops/second (1.8 trillion+ per day); 160+ terabytes of data stored; clusters from 3 to hundreds of instances; 12,000+ memcached instances of varying size
  6. Typical Request
  7. What is EVCache?
  8. Ephemeral Volatile memCache (EVCache): clustered memcached optimized for AWS and tuned for Netflix use cases.
  9. [Diagram labels: EVCache Server (Memcached, Prana sidecar, monitoring & other processes); Eureka; Client Application (Client Library, EVCache Client)]
  10. Why Optimize for AWS: instances disappear; zones disappear; regions can disappear (Chaos Kong); these do happen (and we test all the time); the network can be lossy (throttling, dropped packets); customer requests move between regions
  11. How we Optimized for AWS: multiple copies of data per region; clients are local-replica aware; the client writes to all local replicas; reads are local and retry on other copies; replication across regions uses a custom replication system
  12. Reading [diagram: a Client Application (Client Library, EVCache Client) in each of Zones A, B, and C]
  13. Writing [diagram: a Client Application (Client Library, EVCache Client) in each of Zones A, B, and C]
  14. Use Case: Fronting Services [diagram: Client Application (Client Library, EVCache Client, Service Client) with nodes labeled S and C]
  15. Use Case: As the Data Store [diagram: Offline / Nearline Computation; Online Client Application (Client Library, EVCache Client); Online Services; Offline Services]
  16. Use Case: Transient Data Store [diagram: several Online Client Applications, each with Client Library and EVCache Client]
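
Slide 11 says the client writes to all local replicas and that reads are local and retry on other copies. A minimal sketch of that read/write pattern follows. The class names `ZoneReplica` and `ZoneAwareClient` are hypothetical, the replicas are plain in-memory maps rather than memcached, and none of this is the EVCache client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one zone's copy of the cache (not the EVCache API).
class ZoneReplica {
    final String zone;
    boolean available = true; // flip to false to simulate a lost zone
    private final Map<String, String> data = new HashMap<>();

    ZoneReplica(String zone) {
        this.zone = zone;
    }

    void set(String key, String value) {
        if (!available) throw new IllegalStateException(zone + " unavailable");
        data.put(key, value);
    }

    String get(String key) {
        if (!available) throw new IllegalStateException(zone + " unavailable");
        return data.get(key);
    }
}

// Hypothetical client that knows which replica is in its own zone.
class ZoneAwareClient {
    private final String localZone;
    private final List<ZoneReplica> replicas;

    ZoneAwareClient(String localZone, List<ZoneReplica> replicas) {
        this.localZone = localZone;
        this.replicas = replicas;
    }

    // Write to every replica in the region; returns how many writes succeeded.
    int set(String key, String value) {
        int succeeded = 0;
        for (ZoneReplica replica : replicas) {
            try {
                replica.set(key, value);
                succeeded++;
            } catch (IllegalStateException e) {
                // Skip the failed replica; the other copies still get the write.
            }
        }
        return succeeded;
    }

    // Read from the local zone first, then retry on the other copies.
    String get(String key) {
        List<ZoneReplica> order = new ArrayList<>();
        for (ZoneReplica r : replicas) if (r.zone.equals(localZone)) order.add(r);
        for (ZoneReplica r : replicas) if (!r.zone.equals(localZone)) order.add(r);
        for (ZoneReplica replica : order) {
            try {
                String value = replica.get(key);
                if (value != null) return value;
            } catch (IllegalStateException e) {
                // Fall through to the next copy.
            }
        }
        return null; // miss on every copy
    }
}
```

With three replicas and the local one marked unavailable, `get` in this sketch still returns the value from another zone, which is the behavior the Reading and Writing slides illustrate.
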
  17. Additional Features
  18. Additional Features: global cross-region replication; secondary indexing; cache warming; consistency checking. All powered by metadata flowing through Kafka.
  19. Cross-Region Replication. Why replicate? To maintain duplicate caches in each region and to invalidate stale cache entries in other regions' caches. What do we replicate? set and delete. Where do we replicate? To one or more other regions, depending on application requirements.
  20. Cross-Region Replication. Replicate delete, and invalidate on set: usually used for caches with a persistent store; the entry is fetched from the persistent store on the next cache miss (demand-fill). Replicate set: used for offline/nearline computation; commonly no persistent store; duplicates the cache in multiple regions.
  21. Cross-Region Replication [diagram: Regions A and B, each with Application Client, EVCache, Kafka, Replication, and Repl Writer; labeled steps: 1 set or delete, 2 send metadata, 3 poll msg, 6 set or delete, 7 read]
  22. Cross-Region Replication (ping-pong) [diagram: App, EVCache, and Replication in Regions A and B; labeled steps: 2 set, 3 send metadata, 4 replicate, 5 set, 7 get]
  23. Cross-Region Replication: choices for the underlying message system. AWS SQS: message queueing service; reliable and fast (but with occasional spikes in latency); no guaranteed ordering of messages; messages are processed at least once and removed from the queue; forward a message to multiple queues to process it multiple times; cost based on messages, bandwidth, etc. Apache Kafka: open-source publish-subscribe system; reliable and fast enough; messages are ordered within a partition; allows processing of the same message by different applications.
  24. Secondary Indexing. Why index? memcached does not provide a usable index; debugging; warming up lost instances; data insight. Indexing is provided by Elasticsearch.
  25. Cache Warming (Deployments) [diagram: Zone A Client Application (Client Library, EVCache Client), Cache Warmer, Kafka]
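
Slides 19 to 21 describe two replication modes driven by metadata messages: invalidate the remote entry on set, or replicate the set itself. The sketch below illustrates the difference under loud assumptions: an in-memory queue stands in for Kafka, two maps stand in for the regional caches, and in replicate-set mode the value is assumed to be read back from the local cache when the message is processed. The class `ReplicationSketch` and its members are hypothetical, not part of EVCache.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of metadata-driven cross-region replication.
class ReplicationSketch {
    enum Op { SET, DELETE }
    enum Mode { INVALIDATE_ON_SET, REPLICATE_SET }

    // Metadata only: the operation and the key, never the value.
    static final class Event {
        final Op op;
        final String key;

        Event(Op op, String key) {
            this.op = op;
            this.key = key;
        }
    }

    final Map<String, String> localCache = new HashMap<>();
    final Map<String, String> remoteCache = new HashMap<>();
    final Queue<Event> queue = new ArrayDeque<>(); // stand-in for a Kafka topic
    final Mode mode;

    ReplicationSketch(Mode mode) {
        this.mode = mode;
    }

    // Steps 1 and 2 on the slide: apply locally, then send metadata.
    void set(String key, String value) {
        localCache.put(key, value);
        queue.add(new Event(Op.SET, key));
    }

    void delete(String key) {
        localCache.remove(key);
        queue.add(new Event(Op.DELETE, key));
    }

    // Poll pending messages and apply them to the remote region.
    void replicatePending() {
        Event event;
        while ((event = queue.poll()) != null) {
            if (event.op == Op.DELETE || mode == Mode.INVALIDATE_ON_SET) {
                // Remote readers demand-fill from the persistent store on the next miss.
                remoteCache.remove(event.key);
            } else {
                // Assumption: fetch the current value locally and copy it across.
                String value = localCache.get(event.key);
                if (value != null) remoteCache.put(event.key, value);
            }
        }
    }
}
```

In invalidate mode a stale remote entry disappears after a local set; in replicate-set mode the remote cache ends up holding the same value, which suits data that has no persistent store behind it.
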
  26. Code & Internals
  27. Minimal Code Example. Create EVCache object: EVCache evCache = new EVCache.Builder().setAppName("EVCACHE_TEST").setCachePrefix("pre").setDefaultTTL(900).build(); Write data: evCache.set("key", "value"); Read data: evCache.get("key"); Delete data: evCache.delete("key");
  28. Client-side Hashing: Ketama consistent hashing algorithm. If one server is replaced, few keys are shuffled.
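
Slide 28 names Ketama consistent hashing. The sketch below shows the general idea with a simplified hash ring: each server is placed at many points on a ring, and a key belongs to the first server point at or after the key's hash. It is an illustration only. The class `HashRing`, the point-naming scheme, and the 160 points per server are assumptions, and the result is not byte-compatible with libketama or with the hashing used by the EVCache client.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified consistent-hash ring (Ketama-style, not Ketama-compatible).
class HashRing {
    private static final int POINTS_PER_SERVER = 160; // assumed value
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(Collection<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        for (String server : servers) {
            // Many points per server spread its share of keys around the ring.
            for (int i = 0; i < POINTS_PER_SERVER; i++) {
                ring.put(hash(server + "-" + i), server);
            }
        }
    }

    // The owner is the first point clockwise from the key's hash, wrapping around.
    String serverFor(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        if (entry == null) entry = ring.firstEntry();
        return entry.getValue();
    }

    // First four bytes of the MD5 digest as an unsigned 32-bit value.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24)
                    | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8)
                    | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

In this sketch, removing one server from the ring remaps only the keys that server owned; every other key keeps its owner. With naive `hash(key) % serverCount` placement, changing the server count would remap most keys instead.
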
  29. Architecture Lessons (from outages)
  30. Failure Scenarios: load spikes on the service; dropped packets (and virtual NIC limits); write-back cascading failure
  31. Load Spikes (Personalized Fallbacks) [diagram: nodes labeled S, Z, Cassandra, U, U, L]
  32. Load Spikes (Personalized Fallbacks) [diagram: nodes labeled S, Z, Cassandra, U, U, L]
  33. Dropped Packets [diagram: Client Application (Client Library, EVCache Client)]
  34. Write-back cascading failure [diagram: nodes labeled A, B, C, D, S, Cassandra]
  35. Write-back cascading failure [diagram: nodes labeled A, B, C, D, S, Cassandra, Cassandra]
  36. Client failure resilience: operations fast fail (no servers in Eureka; connection reset); exponential backoff; read/write queues (when full, fast fail); replication write failure (secondary path through SQS as a backup)
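
Slide 36 lists bounded read/write queues that fail fast when full, plus exponential backoff. A minimal sketch of both ideas follows. The class `FastFailWriteQueue`, its method names, and the specific delay values are hypothetical and are not taken from the EVCache client.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bounded queue that rejects work instead of blocking the caller.
class FastFailWriteQueue {
    private final BlockingQueue<String> pending;

    FastFailWriteQueue(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    // offer() returns false immediately when the queue is full, so callers fail fast.
    boolean enqueue(String operation) {
        return pending.offer(operation);
    }

    // Returns the next queued operation, or null if none is waiting.
    String next() {
        return pending.poll();
    }

    // Delay doubles with each attempt and is capped at maxMillis.
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        int shift = Math.max(0, Math.min(attempt, 20)); // bound the shift to avoid overflow
        return Math.min(baseMillis << shift, maxMillis);
    }
}
```

Failing fast keeps a slow or unreachable cache server from tying up application threads: the caller sees the rejection at once and can treat it as a cache miss, while the capped backoff spaces out reconnect attempts.
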
  37. EVCache Open Source: github.com/netflix/evcache
  38. Dependencies. Server: memcached (cache process), Prana (sidecar), Servo client (metrics), Eureka client (instance discovery). Client: Servo client (metrics), Eureka client (instance discovery). External: Atlas (metrics ingestion & reporting), Eureka service.
  39. Questions?
  40. Dropped Packets (EC2 Classic)
  41. Dropped Packets (EC2 VPC)
  42. Consistency Checking [diagram: Client Applications (Client Library, EVCache Client) in Zones A and B; Kafka; Consistency Checker]
  43. (Netflix) Multi-region Architecture [diagram: zones A, B, and C in each of US West 2, US East 1, and EU West 1]
  44. When to Use Caches: predictable response time with varying loads; improve throughput; reduce server costs; store results of idempotent computations; fallbacks when a service is not responding; sharing data across multiple disparate services
  45. Know Your Limits: there's probably a limitation in your infrastructure that you don't know about; CPU & memory are easy, network is hard; cascading failures
