
Real-Time Dynamic Data Export Using the Kafka Ecosystem

confluent
22 Oct 2018


  1. Real-Time Dynamic Data Export Using the Kafka Ecosystem
  2. Today • Product Overview - What were we trying to build? • Architecture - How did we build it? • What did we learn building it?
  3. About Me • Preston Thompson • Senior Software Engineer • 4 years at Braze • Backend Application Developer • Data Infrastructure • Application Infrastructure
  4. More than 1 Billion MAU on Six Continents
  5. Scale • 1.5B+ MAU • 30B+ Events Per Day
  6. Product Overview
  7. Currents • Real time data export • Customers can create one or more exports to seven different partner destinations • Data Warehouse - AWS S3, Azure Blob, Google Cloud Storage • Behavioral Analytics - Amplitude, Mixpanel • Customer Data Platform - mParticle, Segment • 30 different event types • Message Engagement Events • Customer Behavior Events • 200+ active integrations • ~1B events exported per day
  8. Architecture + Lessons Learned
  9. Events • Ruby applications producing to Kafka using ruby-kafka • All events are actions related to a specific end-user • Push Notification Send • Push Notification Open • Campaign Conversion • Purchase • Custom Event • 30 different event types, one topic each • Events for all Braze customers are mixed together within a partition • Use user ID as the key to distribute events evenly across partitions
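
The producers on this slide are Ruby applications using ruby-kafka; purely as an illustrative sketch (kept in Java to match the Streams examples below), producing an event keyed by user ID so that all of a user's events hash to the same partition looks roughly like this. The topic name, user ID, and payload are hypothetical.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One topic per event type; the user ID is the record key, so a
                // given user's events always land in the same partition.
                producer.send(new ProducerRecord<>(
                    "push_notification_open",                    // hypothetical topic name
                    "user-123",                                  // record key = user ID
                    "{\"event\":\"push_notification_open\"}"));  // hypothetical payload
            }
        }
    }
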
  10. Filter and Transform • Requirements • Event types are configurable per integration • 7 different destinations • REST API • Object storage • Solution • Kafka Streams application • Input topics = event topics • Output topics = integration-specific topics • Configuration file storing integration settings (e.g. which events to send, anonymous users)
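
A minimal sketch of the kind of Kafka Streams topology this slide describes, assuming hypothetical topic names and an in-memory stand-in for the integration configuration file: read the event topics, keep only the event types a given integration has enabled, and write the result to that integration's own topic.

    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class FilterTransformApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "currents-filter-transform"); // hypothetical
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Stand-in for the integration settings normally read from the config file.
            Set<String> enabledEventTypes = Set.of("push_notification_open", "purchase");

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> events =
                builder.stream(Set.of("push_notification_open", "push_notification_send", "purchase"));

            // Drop event types the integration has not enabled, then route the rest
            // to the integration-specific topic consumed by Kafka Connect.
            events.filter((userId, json) -> enabledEventTypes.contains(extractEventType(json)))
                  .to("integration-42-events");                  // hypothetical output topic

            new KafkaStreams(builder.build(), props).start();
        }

        // Placeholder: in practice the event type comes from the payload or source topic.
        private static String extractEventType(String json) {
            return json.contains("purchase") ? "purchase" : "push_notification_open";
        }
    }
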
  11. Denormalization • Events have lots of IDs to reference other items • Campaigns • Message Variations • Apps • Names can be nice in some of the destinations • Example: Amplitude dashboard when selecting campaign • Transform • New topic with database changes • Global State Store
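
One way to sketch the denormalization step, assuming a hypothetical topic of database changes keyed by campaign ID: Kafka Streams can materialize that topic as a global table and join each event against it to attach the campaign name.

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.KStream;

    public class DenormalizeTopology {
        // Enriches events with campaign names; topic names and the ID extraction
        // are hypothetical placeholders.
        public static void build(StreamsBuilder builder) {
            // Change feed of campaigns from the database, keyed by campaign ID, value = name.
            GlobalKTable<String, String> campaignNames = builder.globalTable("campaign-changes");

            KStream<String, String> events = builder.stream("integration-42-events");

            // Look up each event's campaign ID in the global state store and append
            // the human-readable name, which destinations like Amplitude can display.
            events.join(campaignNames,
                        (userId, eventJson) -> extractCampaignId(eventJson),
                        (eventJson, campaignName) ->
                            eventJson.replace("}", ",\"campaign_name\":\"" + campaignName + "\"}"))
                  .to("integration-42-enriched");
        }

        private static String extractCampaignId(String eventJson) {
            return "campaign-1"; // placeholder for real JSON parsing
        }
    }
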
  12. Connect • One topic per integration • Independent processing • Invalid credentials • Partner downtime • Rate limiting • Difficult because we must limit number of partitions per connector • Rebalance loops • Increase number of hosts • Future - maybe split integrations across many connect clusters • REST API • Automatically restart failed tasks • Manage active connectors - new and recently updated • Scale number of active tasks when needed
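
The REST API automation mentioned here can be sketched with the standard Kafka Connect endpoints: fetch a connector's status and restart failed tasks. The worker URL and connector name are assumptions; a real manager would parse the status JSON and restart each failed task by its ID.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RestartFailedTasks {
        public static void main(String[] args) throws Exception {
            String connectUrl = "http://connect:8083";           // assumed Connect worker URL
            String connector = "integration-42-http-sink";       // hypothetical connector name
            HttpClient client = HttpClient.newHttpClient();

            // GET /connectors/{name}/status lists the state of the connector and each task.
            HttpResponse<String> status = client.send(
                HttpRequest.newBuilder(
                    URI.create(connectUrl + "/connectors/" + connector + "/status")).GET().build(),
                HttpResponse.BodyHandlers.ofString());

            // Crude check: if any task reports FAILED, restart task 0 as an example.
            if (status.body().contains("\"FAILED\"")) {
                client.send(
                    HttpRequest.newBuilder(
                        URI.create(connectUrl + "/connectors/" + connector + "/tasks/0/restart"))
                        .POST(HttpRequest.BodyPublishers.noBody()).build(),
                    HttpResponse.BodyHandlers.ofString());
            }
        }
    }
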
  13. Custom Connectors • HTTP Connectors • Batch requests as well as we can • Retries are difficult • Retry immediately a few times • If that fails, throw RetriableException • Exponential retry is not built in • Object storage connectors • S3, Azure Blob, GCS • Built on top of the Confluent S3 Connector using pluggable classes • Inner Avro format • Different credentials per connector
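
The retry behavior described on this slide can be sketched as a Connect SinkTask: try the HTTP call a few times immediately, and if the batch still fails, throw RetriableException so Connect redelivers the same records. The retry budget and the postBatch call are assumptions.

    import java.util.Collection;
    import org.apache.kafka.connect.errors.RetriableException;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    public abstract class HttpSinkTask extends SinkTask {
        private static final int IMMEDIATE_RETRIES = 3; // assumed retry budget

        @Override
        public void put(Collection<SinkRecord> records) {
            if (records.isEmpty()) return;
            // Send the whole poll as one request where the partner API allows batching.
            for (int attempt = 1; attempt <= IMMEDIATE_RETRIES; attempt++) {
                try {
                    postBatch(records);   // hypothetical HTTP call to the partner API
                    return;
                } catch (Exception e) {
                    if (attempt == IMMEDIATE_RETRIES) {
                        // Connect pauses briefly and redelivers the same records;
                        // exponential backoff is not built in, so this is a flat retry.
                        throw new RetriableException("partner API unavailable, retrying batch", e);
                    }
                }
            }
        }

        // Left abstract: sends the batch to the destination's REST API.
        protected abstract void postBatch(Collection<SinkRecord> records) throws Exception;
    }
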
  14. Volume Metrics • Counts the number of events exported • Kafka Streams application • Consumes integration-specific topics • Uses a simple aggregator • Requires state, so a bit more complicated • Interactive Queries
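
A sketch of the counting side under assumed names: a Streams aggregation over an integration-specific topic materializes a running count per integration into a state store, which an interactive query can then read directly.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class VolumeMetrics {
        // Counts exported events per integration and materializes the totals in a
        // state store; topic, key, and store names are hypothetical.
        public static void build(StreamsBuilder builder) {
            builder.<String, String>stream("integration-42-events")
                   .groupBy((userId, json) -> "integration-42",
                            Grouped.with(Serdes.String(), Serdes.String()))
                   .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("export-counts"));
        }

        // Interactive query: read the current total straight out of the local store.
        public static Long exportedSoFar(KafkaStreams streams) {
            ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType("export-counts",
                                                     QueryableStoreTypes.keyValueStore()));
            return store.get("integration-42");
        }
    }
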
  15. Misato • Manages applications to be in the desired state of the system • Creates a configuration file for Streams • Restarts Streams to pick up changes to that file • Creates new topics for integrations • Creates new connectors to read from those topics • Updates configurations for connectors • Manages topics read by Volume Metrics app
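
Misato itself is internal to Braze, but the actions the slide lists map onto standard APIs; as an illustrative sketch with hypothetical names, creating an integration topic uses the Kafka AdminClient and registering a connector that reads from it uses the Connect REST API.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class ProvisionIntegration {
        public static void main(String[] args) throws Exception {
            String topic = "integration-42-events";              // hypothetical integration topic

            // 1. Create the integration-specific topic the Streams app will write to.
            Properties adminProps = new Properties();
            adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(adminProps)) {
                admin.createTopics(List.of(new NewTopic(topic, 6, (short) 3))).all().get();
            }

            // 2. Register a connector that reads from that topic via the Connect REST API.
            String connectorJson = "{\"name\":\"integration-42-http-sink\",\"config\":{"
                + "\"connector.class\":\"com.example.HttpSinkConnector\","   // hypothetical class
                + "\"topics\":\"" + topic + "\",\"tasks.max\":\"2\"}}";
            HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://connect:8083/connectors"))
                           .header("Content-Type", "application/json")
                           .POST(HttpRequest.BodyPublishers.ofString(connectorJson)).build(),
                HttpResponse.BodyHandlers.ofString());
        }
    }
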
  16. Quick Recap
  17. Thanks! • Team at Braze • We’re hiring! • braze.com/careers • Confluent • Arcadia Data