Apache Storm In Retail Context

Catalog data processing using Kafka, Storm and Microservices.

Published in: Technology

  1. CONFIDENTIAL – Do Not Distribute | Retail Core Technology | Storm in Retail Context: Catalog data processing using Kafka, Storm & Micro-services. Karthik Deivasigamani, @WalmartLabs
  2. Retail Brick & Mortar
  3. Product Catalog • Normalization • Taxonomy • Product Matching • Shelving • Attributes • Grouping
  4. Product Catalog: Normalization • Attribute normalization: clothing_size, clothing_size_type, shoe_size, rug_size, shirt_size, baby_clothing_size, ring_size, bed_size, pet_size, pant_size, sock_size, eyewear_frame_size, serving_size, table_size, waist_size, … => size • Value normalization: e.l.f. cosmetics, e.l.f. Cosmetics, e.l.f, elf cosmetics, E.L.F. cosmetics, ELF Cosmetics => elf Cosmetics
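
The two normalization steps on slide 4 can be illustrated with a minimal Python sketch. The lookup tables and the function names (`normalize_attribute`, `normalize_brand`) are hypothetical and cover only a subset of the slide's examples; the deck does not say how the production service implements this.

```python
import re

# Hypothetical alias table: a subset of the source-specific attribute
# names from the slide that should collapse to the canonical "size".
SIZE_ALIASES = {
    "clothing_size", "shoe_size", "rug_size", "shirt_size", "ring_size",
    "bed_size", "pant_size", "sock_size", "waist_size",
}

def normalize_attribute(name: str) -> str:
    """Map a source attribute name to its canonical name."""
    key = name.strip().lower()
    return "size" if key in SIZE_ALIASES else key

def _brand_key(value: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())

# Hypothetical alias table keyed by the simplified brand string.
BRAND_ALIASES = {
    "elf": "elf Cosmetics",
    "elfcosmetics": "elf Cosmetics",
}

def normalize_brand(value: str) -> str:
    """Map a raw brand value to its canonical spelling if one is known."""
    return BRAND_ALIASES.get(_brand_key(value), value.strip())
```

Stripping punctuation and case before the lookup keeps the alias table small: all six spellings on the slide reduce to just two keys.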
  5. Product Catalog: Taxonomy • Classification => Product Type • Category => Shelves
  6. Product Catalog: Attributes • Product Title • Description • Brand • Color • Manufacturer • Model Number • Dimensions
  7. Product Catalog: Product Matching • UPC, GTIN, PLU, ISBN • Algorithms
  8. Product Catalog: Grouping • Variants • Bundles
  9. Sources for catalog • Marketplace sellers • Content providers • Suppliers • Merchants • Legacy catalogs => Product Catalog
  10. Characteristics of ingestion pipeline • Zero message loss • Fault tolerance • Source-based priority queue • Scale to millions of product updates per hour • Product updates in near real time (NRT) • Checkpoints at various stages
  11. Processing source data
  12. Processing source data • Choice of language • Teams operate independently • Platform with pluggable services • Bolt
  13. Source Pipeline • Stages: Kafka Spout, Validate, Persist, Normalization, Classification, Attribute Extraction, Matching, Source Variant Grouping, Validate, Persist, Publish
  14. Product Pipeline • Stages: Kafka Spout, Validate, Merge, Shelve, Attribute Extraction, Product Variant Grouping, Validate, Persist, Publish
  15. Micro-batched Grouping Pipeline • Components: Kafka Spout, Router Bolt, Product Group Emitter Bolt, Validate, Persist, Publish, Micro-Batching Bolt • Field Grouping • Kafka payload sample: { "variant_product_id": "1234", "product_group_id": "ABC" }
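
Slide 15 combines field grouping with micro-batching. The sketch below shows the idea in plain Python rather than Storm's API: messages with the same `product_group_id` are always routed to the same task, and each task buffers variants per group until a batch fills up. The names `task_for` and `MicroBatcher`, and the size-only flush policy, are assumptions for illustration; a real bolt would likely also flush on a timer.

```python
import zlib
from collections import defaultdict

def task_for(group_id: str, num_tasks: int) -> int:
    """Field grouping: a stable hash sends one group to one task."""
    return zlib.crc32(group_id.encode("utf-8")) % num_tasks

class MicroBatcher:
    """Buffers variant ids per product group and flushes by size."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.pending = defaultdict(list)  # product_group_id -> variant ids
        self.buffered = 0

    def add(self, message: dict):
        """Buffer one message; return the batch once it is full, else None."""
        group = message["product_group_id"]
        self.pending[group].append(message["variant_product_id"])
        self.buffered += 1
        if self.buffered >= self.max_batch_size:
            return self.flush()
        return None

    def flush(self) -> dict:
        """Hand back everything buffered so far and reset the buffer."""
        batch = dict(self.pending)
        self.pending.clear()
        self.buffered = 0
        return batch
```

Because all variants of a group land on the same task, the downstream emitter can process one group update per batch instead of one per variant message.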
  16. Back Pressure • Message loss • Spout stops emitting. Knobs • Spout parallelism • Kafka message fetch size • max.spout.pending = maximum number of tuples that can be unacked at any given time • Worker parallelism • Bolt parallelism
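
The effect of the max.spout.pending knob (in Storm the full configuration key is `topology.max.spout.pending`) can be shown with a toy model. This is not Storm code: `ThrottledSpout` is a hypothetical class that only mimics the rule that a spout stops emitting once the number of unacked tuples reaches the cap.

```python
class ThrottledSpout:
    """Toy spout that honours a cap on in-flight (unacked) tuples."""

    def __init__(self, messages, max_pending: int):
        self._messages = iter(messages)
        self.max_pending = max_pending
        self.pending = {}   # tuple id -> message awaiting ack
        self._next_id = 0

    def next_tuple(self):
        """Emit the next message, or None if throttled or exhausted."""
        if len(self.pending) >= self.max_pending:
            return None     # back pressure: wait for acks first
        try:
            message = next(self._messages)
        except StopIteration:
            return None
        tuple_id = self._next_id
        self._next_id += 1
        self.pending[tuple_id] = message
        return tuple_id, message

    def ack(self, tuple_id: int) -> None:
        """Mark a tuple as fully processed, freeing one pending slot."""
        self.pending.pop(tuple_id, None)
```

A low cap protects slow bolts from being flooded, while a cap that is too low leaves the topology idle waiting for acks, which is why it is tuned together with the parallelism settings on the slide.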
  17. Failures • Data errors • Service timeouts • Service outages • Fatal errors • Validations at various stages • Async IO using RxJava, Hystrix, retries • Hystrix circuit breaker • Failing tuples
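
The circuit-breaker pattern named on slide 17 can be sketched as follows. This is a simplified, hypothetical Python class, not Hystrix's API: after `failure_threshold` consecutive failures the breaker opens and fails fast, and once `reset_after_s` seconds have passed it lets one trial call through.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int, reset_after_s: float,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0           # success closes the circuit fully
        return result
```

In a bolt, one plausible use is to fail the tuple whenever `CircuitOpenError` is raised, so the message is replayed later instead of waiting on a service that is known to be down.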
  18. Characteristics of ingestion pipeline • Zero message loss: anchoring and failing tuples, maxOffsetBehind = Long.MAX_VALUE • Product updates in NRT • Priority queue: partition-based and topic-based • Scale to millions of product updates per hour • Fault tolerance: worker and node failures are handled by Storm; Nimbus and Supervisors are stateless and fail-fast • Checkpoints at various stages
  19. What we monitor • Kafka lag • Bolt capacity • JVM: heap, threads • Service SLA • Acked and failed tuples • Data errors and system errors • OS metrics
  20. Tools for Monitoring • Kafkamon: monitor lag in the pipeline • Guano: dump and restore ZK state • Storm UI • Elastic & Kibana: async logging using log4j2, scribe • Grafana to monitor service latency • Druid for tracking and analytics • FIT: Fault Injection Tool
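
Kafka lag, the first metric on slides 19 and 20, is the gap between the newest offset in each partition and the offset the consumer has committed. The sketch below computes it from two plain dictionaries; how the offsets are fetched is left out, and the function names are hypothetical rather than taken from Kafkamon.

```python
def partition_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag: log end offset minus committed offset."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }

def total_lag(end_offsets: dict, committed_offsets: dict) -> int:
    """Total number of messages the consumer still has to process."""
    return sum(partition_lag(end_offsets, committed_offsets).values())
```

A partition with no committed offset is treated here as fully unread, which is one possible convention; a monitoring tool could equally report it as unknown.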
  21. Storm Cluster: Product Catalog • 2 Nimbus • 7 Supervisors • 320 cores • 2 TB memory • 35 slots • 14 topologies • 150M Kafka messages • 6481 executors • 360M Network IO Microservice
  22. Storm Cluster: Audit / Tracking • 1 Nimbus • 5 Supervisors • 160 cores • 1 TB memory • 155 slots • 94 topologies • 1B+ Kafka messages • 1396 executors
  23. Holiday Season • A few thousand sellers • 100M+ seller SKUs • 6x traffic. How we prepared • Upgraded to 1.0.2: HA Nimbus, improved performance, improved backpressure handling • Change detection • Improved our monitoring, periodic fault injection • Fast track / priority queue for top items
  24. Lessons learnt • Things will fail • Monitor everything • Automation • Scale is not a feature • Storm works well with large payloads • Logs don’t lie • Micro-services come at a cost
  25. Path ahead • Stateful stream processing • Storm 1.1.0: streaming SQL, Druid integration, PMML (Predictive Model Markup Language) support
  26. Team • Yes, we are hiring! http://www.walmartlabs.com/jobs/
