
Scaling Push Messaging for Millions of Devices @Netflix


Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2oA2uI5.

Susheel Aroskar talks about Zuul Push, a scalable push notification service that handles millions of "always-on" persistent connections from all running Netflix apps. He covers the design of the Zuul Push server and reviews the design details of the back-end message routing infrastructure that lets any Netflix microservice push notifications to any connected client. Filmed at qconnewyork.com.

Susheel Aroskar works as a software engineer on the Cloud Gateway team at Netflix, which develops and operates Zuul, an API gateway that fronts all of the Netflix cloud traffic and handles more than 100 billion requests/day. Prior to Zuul, he worked on Netflix CDN's control plane in the cloud, which is responsible for steering more than a third of all North American peak evening internet traffic.

Published in: Technology


  1. 1. How we scaled push messaging for millions of Netflix devices. Susheel Aroskar, Cloud Gateway
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/neflix-push-messaging-scale
  3. 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. Why do we need push?
  5. 5. How I spend my time in Netflix application...
  6. 6. ● What is push?
  7. 7. ● What is push? ● How you can build it
  8. 8. ● What is push? ● How you can build it ● How you can operate it
  9. 9. ● What is push? ● How you can build it ● How you can operate it ● What can you do with it
  10. 10. Susheel Aroskar Senior Software Engineer Cloud Gateway saroskar@netflix.com github.com/raksoras @susheelaroskar
  11. 11. PERSIST UNTIL SOMETHING HAPPENS
  12. 12. PERSIST UNTIL SOMETHING HAPPENS
  13. 13. Zuul Push Architecture
  14. 14. Zuul Push Servers
  15. 15. Zuul Push Servers WebSockets / SSE
  16. 16. Push Registry Zuul Push Servers Register User WebSockets / SSE
  17. 17. Push Registry Zuul Push Servers Register User WebSockets / SSE
  18. 18. Push Library Push Registry Zuul Push Servers Register User WebSockets / SSE
  19. 19. Push Library Push Message Queue Push Registry Zuul Push Servers Register User WebSockets / SSE
  20. 20. Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User WebSockets / SSE
  21. 21. Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User WebSockets / SSE
  22. 22. Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User Lookup server WebSockets / SSE
  23. 23. Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE
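The flow built up on the architecture slides (register user, queue message, look up server, deliver message) can be sketched with an in-memory model. This is a minimal illustration under stated assumptions, not Zuul Push code: `PushFlowSketch`, `PushServer`, `PushMessage`, and every method name are hypothetical, a map stands in for the push registry, and a plain queue stands in for the push message queue.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory model of the flow on the slides; not Zuul Push code.
class PushFlowSketch {
    /** Stand-in for one Zuul Push server holding live client connections. */
    static final class PushServer {
        final String name;
        final Map<String, List<String>> delivered = new ConcurrentHashMap<>();
        PushServer(String name) { this.name = name; }
        void deliver(String userId, String payload) {
            delivered.computeIfAbsent(userId, k -> new CopyOnWriteArrayList<>()).add(payload);
        }
    }

    /** A queued message addressed to one user. */
    static final class PushMessage {
        final String userId;
        final String payload;
        PushMessage(String userId, String payload) { this.userId = userId; this.payload = payload; }
    }

    // Push registry: which server currently holds each user's connection.
    private final Map<String, PushServer> registry = new ConcurrentHashMap<>();
    // Push message queue: decouples senders from the message processor.
    private final Queue<PushMessage> queue = new ConcurrentLinkedQueue<>();

    /** Called when a client connects to a push server ("Register User"). */
    void register(String userId, PushServer server) { registry.put(userId, server); }

    /** Called by a sender through the push library: enqueue and return. */
    void send(String userId, String payload) { queue.add(new PushMessage(userId, payload)); }

    /** Message processor: look up the server for each message and deliver it. */
    int processAll() {
        int count = 0;
        PushMessage m;
        while ((m = queue.poll()) != null) {
            PushServer server = registry.get(m.userId);   // "Lookup server"
            if (server == null) continue;                 // user not connected: drop
            server.deliver(m.userId, m.payload);          // "Deliver message"
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        PushFlowSketch flow = new PushFlowSketch();
        PushServer server = new PushServer("push-1");
        flow.register("alice", server);
        flow.send("alice", "new episode available");
        flow.send("bob", "nobody is connected as bob");
        System.out.println("delivered: " + flow.processAll());
    }
}
```

Dropping messages for users with no registry entry is a choice made for this sketch; the slides do not say how undeliverable messages are handled.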
  24. 24. Zuul Push server: Handling millions of persistent connections
  25. 25. C10K challenge
  26. 26. Thread per Connection: Socket ↔ Thread-1 (Read, Write); Socket ↔ Thread-2 (Read, Write)
  27. 27. Thread per Connection: Socket ↔ Thread-1 (Read, Write); Socket ↔ Thread-2 (Read, Write). Async I/O: a Single Thread serves every Socket through a read callback and a write callback
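The async I/O model on this slide, one thread serving many sockets through readiness callbacks, can be illustrated with the JDK's own `java.nio` selector API. The sketch below is a small echo server, not the Zuul Push server (which is built on Netty); the class name `SingleThreadEchoServer` and its methods are made up for the example, and writes are done inline rather than through a write callback to keep it short.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical example: one thread multiplexes every connection via a Selector.
class SingleThreadEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SingleThreadEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // port 0: OS picks one
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) onAccept();
                    else if (key.isReadable()) onRead((SocketChannel) key.channel(), buf);
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector was closed or failed: the event loop ends.
        }
    }

    // "Callback" for a new connection: register it for read readiness.
    private void onAccept() throws IOException {
        SocketChannel client = server.accept();
        if (client == null) return;
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_READ);
    }

    // "Callback" for readable data: echo it back on the same thread.
    private void onRead(SocketChannel client, ByteBuffer buf) {
        try {
            buf.clear();
            if (client.read(buf) < 0) { client.close(); return; } // peer closed
            buf.flip();
            while (buf.hasRemaining()) client.write(buf); // sketch: no OP_WRITE handling
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }

    public static void main(String[] args) throws Exception {
        try (SingleThreadEchoServer srv = new SingleThreadEchoServer()) {
            Thread loop = new Thread(srv, "event-loop");
            loop.setDaemon(true);
            loop.start();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), srv.port())) {
                s.getOutputStream().write("ping".getBytes(StandardCharsets.UTF_8));
                byte[] reply = s.getInputStream().readNBytes(4);
                System.out.println(new String(reply, StandardCharsets.UTF_8));
            }
        }
    }
}
```

An idle connection costs this design a registered key and a socket buffer, where the thread-per-connection model would hold a whole thread for it.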
  28. 28. Netty: Socket ↔ Channel Pipeline (Head to Tail) made up of Channel Inbound Handlers and Channel Outbound Handlers
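  29. 29. protected void addPushHandlers(ChannelPipeline pl) {
              // HTTP decoding first: a WebSocket connection starts as an HTTP request
              pl.addLast(new HttpServerCodec());
              pl.addLast(new HttpObjectAggregator(64 * 1024)); // max request size; value is illustrative
              pl.addLast(getPushAuthHandler());
              pl.addLast(new WebSocketServerCompressionHandler());
              pl.addLast(new WebSocketServerProtocolHandler("/ws")); // WebSocket path; value is illustrative
              pl.addLast(getPushRegistrationHandler());
          }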
  30. 30. Authenticate by Cookies, JWT or any other custom scheme. Plug in your custom authentication policy
  31. 31. Push Registry: Tracking clients’ connection metadata in real time
  32. 32. public class MyRegistration extends PushRegistrationHandler {
              @Override
              protected void registerClient(ChannelHandlerContext ctx, PushUserAuth auth,
                                            PushConnection conn, PushConnectionRegistry registry) {
                  super.registerClient(ctx, auth, conn, registry);
                  // Also record this user's connection in the push registry (Redis here)
                  ctx.executor().submit(() -> storeInRedis(auth));
              }
          }
  33. 33. Push registry features checklist
  34. 34. ● Low read latency Push registry features checklist
  35. 35. ● Low read latency ● Record expiry Push registry features checklist
  36. 36. ● Low read latency ● Record expiry ● Sharding Push registry features checklist
  37. 37. ● Low read latency ● Record expiry ● Sharding ● Replication Push registry features checklist
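Two items on this checklist, low read latency and record expiry, can be illustrated with a small in-memory registry whose entries expire after a time-to-live. This is only a sketch under assumptions: `ExpiringPushRegistry` and its methods are hypothetical names, a single-process map stands in for a real data store, and sharding and replication are out of scope. Expiry presumably matters because a client that disappears without a clean disconnect would otherwise leave a stale record behind.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory registry with per-record expiry; not the Zuul Push registry.
class ExpiringPushRegistry {
    private static final class Entry {
        final String serverAddress;
        final long expiresAtMillis;
        Entry(String serverAddress, long expiresAtMillis) {
            this.serverAddress = serverAddress;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    ExpiringPushRegistry(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Record (or refresh) which push server holds this user's connection. */
    void register(String userId, String serverAddress) {
        entries.put(userId, new Entry(serverAddress, clock.getAsLong() + ttlMillis));
    }

    /** Constant-time lookup; expired records are treated as absent and removed. */
    Optional<String> lookup(String userId) {
        Entry e = entries.get(userId);
        if (e == null) return Optional.empty();
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(userId, e); // remove only if it is still the same stale entry
            return Optional.empty();
        }
        return Optional.of(e.serverAddress);
    }

    public static void main(String[] args) {
        ExpiringPushRegistry registry = new ExpiringPushRegistry(30_000, System::currentTimeMillis);
        registry.register("alice", "push-server-17");
        System.out.println(registry.lookup("alice").orElse("not connected"));
    }
}
```

In a store such as Redis the same effect is usually obtained by setting a time-to-live on the key itself, so the application code does not have to check timestamps.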
  38. 38. What we use: Dynomite (https://github.com/Netflix/dynomite) = Redis + Auto-sharding + Read/Write quorum + Cross-region replication
  39. 39. Message Processing: Queue, Route, Deliver
  40. 40. We use Kafka message queues to decouple message senders from receivers
  41. 41. Fire and Forget
  42. 42. Cross-region Replication
  43. 43. Different queues for different priorities
  44. 44. We run multiple message processor instances in parallel to scale our message processing throughput.
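The two ideas on the last slides, separate queues per priority and several processors working in parallel, can be sketched with in-memory queues. This is an illustration under assumptions, not the Kafka-based setup from the talk: `PriorityPushQueues`, its `Priority` levels, and the worker count are hypothetical, and a worker thread stands in for a message processor instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for per-priority message queues.
class PriorityPushQueues {
    enum Priority { HIGH, NORMAL } // declaration order is the polling order

    private final Map<Priority, Queue<String>> queues = new EnumMap<>(Priority.class);

    PriorityPushQueues() {
        for (Priority p : Priority.values()) queues.put(p, new ConcurrentLinkedQueue<>());
    }

    void enqueue(Priority priority, String message) { queues.get(priority).add(message); }

    /** Next message, taking HIGH before NORMAL; null when every queue is empty. */
    String poll() {
        for (Priority p : Priority.values()) {
            String m = queues.get(p).poll();
            if (m != null) return m;
        }
        return null;
    }

    /** Drains all queues using several parallel workers; returns what was processed. */
    List<String> drainInParallel(int workers) throws InterruptedException {
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String m;
                while ((m = poll()) != null) processed.add(m); // real code would route and deliver
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        PriorityPushQueues q = new PriorityPushQueues();
        q.enqueue(Priority.NORMAL, "recommendations refreshed");
        q.enqueue(Priority.HIGH, "playback error, please retry");
        System.out.println(q.poll()); // the HIGH message is taken first
        System.out.println(q.drainInParallel(4).size() + " more processed");
    }
}
```

Keeping priorities in separate queues means a backlog of low-priority messages cannot delay urgent ones, and adding workers raises throughput without changing the senders.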
  45. 45. Operating Zuul Push: Different than REST of them
  46. 46. Persistent connections make Zuul Push server stateful Long lived stable connections
  47. 47. Persistent connections make Zuul Push server stateful Long lived stable connections ○ Great for client efficiency
  48. 48. Persistent connections make Zuul Push server stateful Long lived stable connections ○ Great for client efficiency ○ Terrible for quick deploy/rollback
  49. 49. If you love your clients set them free... Tear down connections periodically
  50. 50. Randomize each connection’s lifetime
  51. 51. [Chart: effect of randomizing connection lifetime on reconnect peaks; x-axis: Time, y-axis: #reconnects]
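Randomizing each connection's lifetime can be as simple as adding random jitter to a base lifetime when the connection is accepted. The sketch below assumes a uniform jitter; the class name `ConnectionLifetime` and the 25-minute base and 5-minute jitter are illustrative values, not figures from the talk (the slides only say "tens of minutes").

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: picks how long a connection may live before it is recycled.
class ConnectionLifetime {
    /**
     * Returns base plus a uniformly random jitter in [0, maxJitter], so that
     * connections opened at the same moment do not all expire at the same moment.
     */
    static Duration randomized(Duration base, Duration maxJitter) {
        long jitterMillis = ThreadLocalRandom.current().nextLong(maxJitter.toMillis() + 1);
        return base.plusMillis(jitterMillis);
    }

    public static void main(String[] args) {
        Duration base = Duration.ofMinutes(25);     // illustrative value
        Duration maxJitter = Duration.ofMinutes(5); // illustrative value
        for (int i = 0; i < 3; i++) {
            System.out.println("close after " + randomized(base, maxJitter).toSeconds() + " s");
        }
    }
}
```

Without the jitter, clients that all reconnect together after a deploy or a network blip would keep expiring and reconnecting together, which produces the recurring reconnect peaks the chart refers to.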
  52. 52. Ask client to close its connection.
  53. 53. How to optimize push server: Most connections are idle!
  54. 54. BIG Server, tons of connections: ulimit -n 262144; net.ipv4.tcp_rmem="4096 87380 16777216"; net.ipv4.tcp_wmem="4096 87380 16777216"
  55. 55. Goldilocks strategy
  56. 56. Optimize for cost, NOT instance count ✓ $$ $$ ❌
  57. 57. How to auto-scale?
  58. 58. How to auto-scale? RPS? CPU??
  59. 59. How to auto-scale? RPS? CPU?? Open Connections
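Scaling on open connections reduces to simple arithmetic: divide the number of open connections by the number one server should hold, and round up. The sketch below shows that calculation only; `ConnectionAutoScaler`, the per-instance target, and the minimum instance count are hypothetical, and a real deployment would express the same rule as a cloud auto-scaling policy on a custom metric rather than in application code.

```java
// Hypothetical helper showing the arithmetic behind scaling on open connections.
class ConnectionAutoScaler {
    /**
     * Number of instances needed so that no instance holds more than
     * targetPerInstance connections, never going below minInstances.
     */
    static int desiredInstances(long openConnections, long targetPerInstance, int minInstances) {
        if (openConnections < 0 || targetPerInstance <= 0 || minInstances < 0) {
            throw new IllegalArgumentException("arguments must be non-negative, target must be positive");
        }
        // Ceiling division without floating point.
        long needed = (openConnections + targetPerInstance - 1) / targetPerInstance;
        return Math.toIntExact(Math.max(minInstances, needed));
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 1,000,000 connections at 30,000 per instance.
        System.out.println(desiredInstances(1_000_000, 30_000, 2)); // prints 34
    }
}
```

Requests per second and CPU are poor signals here because most push connections are idle; the connection count is the resource that actually fills up a server.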
  60. 60. Amazon Elastic Load Balancers (Classic ELB with HTTP listeners) cannot proxy WebSockets.
  61. 61. Solution - Run ELB as a TCP load balancer. OSI 7 network layers (conceptual): 7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data link, 1 Physical. HTTP over TCP/IP: HTTP, TCP, IP, Ethernet. Layer 7: HTTP (WebSocket Upgrade Request). Layer 4: TCP.
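The WebSocket Upgrade Request mentioned on this slide is an ordinary HTTP request that asks the server to switch protocols; after the server agrees, the same TCP connection carries WebSocket frames instead of HTTP. As a small illustration of that layer 7 handshake, the sketch below computes the `Sec-WebSocket-Accept` value a server returns, as defined in RFC 6455. The class name `WebSocketHandshake` is made up for the example; in Zuul Push this step is handled by Netty, not by hand-written code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper showing the server side of the WebSocket opening handshake.
class WebSocketHandshake {
    // Fixed GUID defined by RFC 6455 for the opening handshake.
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Base64(SHA-1(Sec-WebSocket-Key + GUID)), sent back as Sec-WebSocket-Accept. */
    static String acceptKey(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Sample key from RFC 6455, section 1.3.
        System.out.println(acceptKey("dGhlIHNhbXBsZSBub25jZQ==")); // prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    }
}
```

A load balancer running at layer 4 never looks at this exchange: it forwards the TCP bytes unchanged, which is why the upgrade and the long-lived connection that follows pass through it.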
  62. 62. Managing push cluster - a quick recap ● Recycle connections after tens of minutes
  63. 63. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize each connection’s lifetime
  64. 64. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize each connection’s lifetime ● Many smaller servers >> few BIG servers
  65. 65. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize each connection’s lifetime ● Many smaller servers >> few BIG servers ● Auto-scale on number of open connections per box
  66. 66. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize each connection’s lifetime ● Many smaller servers >> few BIG servers ● Auto-scale on number of open connections per box ● WebSocket aware vs TCP load balancer
  67. 67. If you build it, They will push
  68. 68. On-demand diagnostics
  69. 69. Remote recovery
  70. 70. User messaging
  71. 71. WHAT WILL YOU USE IT FOR?
  72. 72. Call to action
  73. 73. PULL!
  74. 74. PULL! https://github.com/Netflix/zuul
  75. 75. In conclusion, push can make you
  76. 76. In conclusion, push can make you rich (in functionality),
  77. 77. In conclusion, push can make you rich (in functionality), thin (by getting rid of polling)
  78. 78. In conclusion, push can make you rich (in functionality), thin (by getting rid of polling) and happy!
  79. 79. Thank you.
  80. 80. Questions? Susheel Aroskar Senior Software Engineer Cloud Gateway saroskar@netflix.com github.com/raksoras @susheelaroskar
  81. 81. Zuul Push: Rich, exciting Apps; More efficient systems; Easy to customize; Easy to operate; Battle tested
