
Why reinvent the wheel at Criteo?

Growing toward Criteo scale means that sometimes we do need to reinvent the wheel.
We will share real-life examples of what we have done in our C# stack to reach that scale.
For instance, what happens when traditional load balancing becomes too costly to scale?
How do you migrate a monolithic application to a service-oriented one when you have spaghetti code?
You should monitor everything, right? But can you, and should you, measure and monitor your code down to the task level on your production machines?
Apache Kafka looks good and you want to use it, but is there a good enough implementation in C#?

Published in: Engineering

  1. Cedrick MONTOUT, June 8th, 2016. A story about C# @ Criteo: why do we reinvent the wheel?
  2. Who am I? Copyright © 2016 Criteo • Criteo: global leader in retargeting • Scalability: one of many groups in the Criteo R&D department • WebScale: we write code to help scale up the real-time Criteo platform • https://www.linkedin.com/in/kerdrek • https://github.com/kerdrek • @kerdrek
  3. Client Side Load Balancing
  4. The situation then … 1/5. Diagram labels: App A Pool, App B Pool, HAProxy, Service Pool
  5. The situation then … 2/5. Added diagram labels: Random DC, Service Pool • Traffic: 185k QPS • Input size: ~5K bytes • Output size: ~3.5K bytes • Network traffic: ~12 Gbit/s
  6. The situation then … 3/5 • Compression: 30% gain
  7. The situation then … 4/5 • Bonding 2 ports: combine two physical data links into one logical link by connecting 2 ports of the switch to 2 network interfaces of the HAProxy
  8. The situation then … 5/5 • 4 pairs
  9. The new wheel: Client Side Load Balancing • Bypass HAProxy • Implemented inside Twitter/Finagle • Reuse the existing health check • Re-implement monitoring. Diagram labels: App Pool A, App Pool B, Service Pool
  10. DevHost
  11. The situation then … • Harder and harder to release • No available memory for new features • Fragmented features in production • Tightly coupled with the HTTP stack
  12. The new wheel: Component • One Input • One Output • One Process
  13. The new wheel: A Host with Services • A collection of services
  14. The new wheel: DevHost • Several Components • Several Services • Asynchronous Process • Transport agnostic • Not front facing
  15. The new wheel: DevHost • Still used in production • ~45% of the Windows production machines • Next iteration will use .NET Core (WiP)
  16. Monitoring @ Task level
  17. The situation then … • Synchronous processing everywhere • Timeouts on several pipelines (losing money) • No clear diagnostic on the execution path
  18. The new wheel: Asynchronous Token Framework • TPL was not a solution at that time • Asynchronous Completion Token • Delegate based • Execution is time boxed • Task underneath • Timing for everything • Metrics available on the machine • Metrics available aggregated on Graphite
  19. Apache Kafka Driver in C#
  20. The situation then … • Syslog • Text based • Fire and forget • Single messages • No built-in resiliency • No API for consuming
  21. The situation then … (Syslog compared with Apache Kafka) • Apache Kafka • Binary • Acknowledged messages • Batched messages • Partitioning and replication • Consuming support
  22. Apache Kafka, where is your C# driver? We looked at several drivers.
  23. Goldilocks conundrum all over again: the first driver had never been used in production.
  24. Goldilocks conundrum all over again: the second driver was impossible to unit test.
  25. Goldilocks conundrum all over again: the third driver was not recently maintained.
  26. The new wheel: kafka-sharp • Yet another C# driver • Highly tunable • Written with performance and scale in mind • Battle-tested in production • Available here: https://github.com/criteo/kafka-sharp
  27. The wheel list • Distributed load balancer between clients • Lightweight hosting server • Low-level asynchronous execution framework • Yet another C# driver for Apache Kafka
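
The traffic figures on the "situation then" slides (185k QPS, ~5K bytes in, ~3.5K bytes out, ~12 Gbit/s) can be cross-checked with simple arithmetic. The sketch below is one plausible reading, not something the slides spell out: it assumes "K" means 1,000 bytes and that each request and its response both cross the proxy. The function name is made up for illustration.

```python
# Figures taken from the slides; reading "K" as 1,000 bytes is an assumption.
QPS = 185_000
INPUT_BYTES = 5_000
OUTPUT_BYTES = 3_500


def proxy_traffic_gbits(qps, input_bytes, output_bytes):
    """Return the traffic crossing the proxy in Gbit/s.

    Assumes every request and its response both pass through the proxy.
    """
    bits_per_second = qps * (input_bytes + output_bytes) * 8
    return bits_per_second / 1e9


total = proxy_traffic_gbits(QPS, INPUT_BYTES, OUTPUT_BYTES)
print(f"{total:.2f} Gbit/s")  # prints "12.58 Gbit/s"
```

Under these assumptions the result lands close to the ~12 Gbit/s quoted on the slides, which is why a single proxy link becomes the bottleneck before the service pool does.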

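The "Client Side Load Balancing" slides describe bypassing the proxy, reusing an existing health check, and re-implementing monitoring on the client. The sketch below illustrates that general idea only; it is not Criteo's implementation. The class name, the random selection policy, the endpoint names, and the counter-based "monitoring" are all assumptions made for the example.

```python
import random


class ClientSideBalancer:
    """Hypothetical client-side balancer: the caller picks the endpoint itself."""

    def __init__(self, endpoints, is_healthy, rng=None):
        self._endpoints = list(endpoints)
        self._is_healthy = is_healthy          # reused, externally supplied health check
        self._rng = rng or random.Random()
        # Per-endpoint pick counters stand in for client-side monitoring.
        self.picks = {endpoint: 0 for endpoint in self._endpoints}

    def pick(self):
        """Return a random healthy endpoint, or raise if none is available."""
        healthy = [e for e in self._endpoints if self._is_healthy(e)]
        if not healthy:
            raise RuntimeError("no healthy endpoint available")
        choice = self._rng.choice(healthy)
        self.picks[choice] += 1
        return choice


# Usage: svc-2 is reported unhealthy, so it should never be chosen.
down = {"svc-2:8080"}
balancer = ClientSideBalancer(
    ["svc-1:8080", "svc-2:8080", "svc-3:8080"],
    is_healthy=lambda endpoint: endpoint not in down,
)
chosen = [balancer.pick() for _ in range(100)]
```

Because each client talks to the service pool directly, no single machine has to carry the aggregate traffic; the trade-off is that health checking and metrics now live in every client.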
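
The "Asynchronous Token Framework" slide lists two properties that are easy to show in miniature: execution is time boxed, and everything is timed. The sketch below uses Python's `asyncio` as a stand-in; it is not the framework from the talk, and `run_time_boxed`, the `timings` dictionary, and the sample steps are hypothetical names chosen for illustration.

```python
import asyncio
import time

# Step name -> list of elapsed seconds; a stand-in for a real metrics sink.
timings = {}


async def run_time_boxed(name, make_coro, budget_seconds):
    """Run a step within a time budget and always record how long it took."""
    start = time.perf_counter()
    try:
        return await asyncio.wait_for(make_coro(), timeout=budget_seconds)
    except asyncio.TimeoutError:
        return None  # the step exceeded its budget and was cancelled
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)


async def fast():
    await asyncio.sleep(0.01)
    return "done"


async def slow():
    await asyncio.sleep(1.0)
    return "too late"


async def main():
    return (
        await run_time_boxed("fast", fast, budget_seconds=0.2),
        await run_time_boxed("slow", slow, budget_seconds=0.05),
    )


results = asyncio.run(main())
```

Recording the elapsed time in the `finally` clause means a timing is captured whether the step finishes, times out, or fails, which is the kind of per-task visibility the monitoring slides ask for.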