Learnings from our journey to become an event-driven CDP
1. Learnings from our journey to become an event-driven Customer Data Platform
www.optimove.com | info@optimove.com
Adam Abrams | Kafka Summit 2020
2. Optimove: The Science-First Relationship Marketing Hub
Driving measurable growth by orchestrating, measuring and optimizing personalized marketing, at scale
Named a Multichannel Marketing Hubs Challenger in Gartner’s Magic Quadrant
Named a Cross-Channel Campaign Management Strong Performer in Forrester’s Wave
Named a Customer Data Platform Market Leader by G2Crowd
Adam Abrams, R&D Director of Event Streaming and Realtime
Trusted by 500+ brands
3. Optimove in a Martech Ecosystem
[Architecture diagram: data sources (POS, Commerce Platform, SQL Server, Promotion System, Loyalty System, Surveys, App, Web, Support Cloud) feed Optimove (DWH, Data Lake, Business Unit DB) via a daily data feed and real-time events. Optimove (Data and Segmentation | Smart Orchestration | Analytics & BI | Optimization) exchanges execution details, campaign metrics and assignment details with execution channels: Call Center, SMS, Direct Mail, Ad Networks, Email, In-App, Web Pop-Up, Push and Promotion System.]
5. Getting data in and out of Kafka
[Pipeline diagram: data ingress via a Custom Event Streamer and Optimove API Connect; data egress via Optimove Engager, BigQuery, BigTable, Optimove API and Optimove UI connectors. The middle stages (self-service real-time Customer Profile, advanced use cases) are filled in on the following slides.]
6. Self-service real-time Customer360
[Pipeline diagram: the same ingress and egress connectors as the previous slide, now with Event Aggregations (ksqlDB) and Customer360 (ksqlDB) in the middle, producing the self-service real-time Customer Profile.]
7. Advanced Use Cases
[Pipeline diagram: the full architecture, adding Identity Graph (KStreams) and Real-time SOJ (KStreams) alongside Event Aggregations (ksqlDB) and Customer360 (ksqlDB) between data ingress and data egress.]
8. Takeaways
• Align on compressed AVRO to strengthen inter-service communications and save costs.
• ksqlDB is a great solution for 80% of use cases and gets you to results fast.
• For the remaining 20%, KStreams is very powerful. However, it requires full-on software development.
• Leverage SMTs in Connect and custom event streaming to prevent data duplication and increased latency.
• Consider working with deltas and not full snapshots where it makes sense.
My name is Adam Abrams, R&D Director of Event Streaming & Realtime @ Optimove
I joined Optimove, a Relationship Marketing Hub powered by AI, from Axonite (a startup I co-founded, which was acquired by Optimove). At Axonite, I was building a real-time customer data platform based on Confluent Cloud.
At Optimove, this platform is being used to power Realtime Customer360, 3rd-party system data sync, external event-based triggers, and real-time self-optimizing journeys
This talk will focus on three challenges we faced while becoming event-driven and how we solved them – providing concrete, actionable takeaways that you may leverage on your own journey
To give a little context for the following slides, I’d like to briefly describe the technology ecosystem in which Optimove operates:
Optimove is a SaaS marketing solution used to discover customer insights and orchestrate marketing campaigns.
It sits between data sources, seen at the bottom of the slide, and execution channels for customer engagement at the top.
Optimove is unique in that it combines batch data and real-time event processing, which leads to some of the challenges we’ll discuss now
In the following slides, I’ll go over how we built our solution, focusing on the three main challenges we faced:
Getting data in and out of Kafka cost-effectively. We serve over 500 brands in a multi-tenant setup, and keeping down the cost of data ingress and egress as well as storage was a challenge
We wanted to give our marketing users at the different brands the ability to dynamically customize the Customer360 real-time profile & support custom events and rule-based logic
We wanted to provide advanced real-time features such as machine learning inference, an identity resolution graph, and self-optimizing journeys that dynamically build a personalized journey for each customer based on real-time events as well as full historic context, and do it all with a small team of developers.
The first challenge was getting data in and out of Kafka cost-effectively
Periodically pull in dimension data for hundreds of millions of end-customers across all tenants – the Optimove API provides batch information from multiple sources
Receive tens of thousands of events per second in real time. Sources for events include the Optimove SDK and Optimove Webhook, among others
Push data out to multiple systems in different formats and encodings, such as BigQuery, BigTable and Optimove’s own UI, API and execution channels (SMS, push notifications, etc.)
How we solved it, while reducing cost:
Standardized on compressed AVRO (using Schema Registry) to optimize CPU processing, bandwidth and storage; we use two different compression algorithms, Snappy and Zstandard, which present different tradeoffs
Built a custom event streamer that handles authentication and authorization, transforms messages to AVRO, compresses them, and directs them to the correct topics per tenant
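To make this concrete, here is a minimal sketch of the streamer’s producer side. The `tenant.<name>.events` topic naming and the two-field event schema are invented for the example, and authentication and per-event-type schema resolution are omitted; the Schema Registry serializer and zstd compression are standard Confluent/Kafka configuration:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventStreamerSketch {
    // Hypothetical event schema; the real streamer would resolve schemas per event type.
    private static final Schema EVENT_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"customerId\",\"type\":\"string\"},"
      + "{\"name\":\"eventType\",\"type\":\"string\"}]}");

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Compressed AVRO: Schema Registry-backed serialization plus zstd batch compression.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // After authenticating the caller (omitted), transform and route per tenant.
            GenericRecord event = new GenericData.Record(EVENT_SCHEMA);
            event.put("customerId", "c-42");
            event.put("eventType", "page_view");
            String tenantTopic = "tenant.acme.events"; // hypothetical naming scheme
            // Keying by customer ID keeps each customer's events ordered within a partition.
            producer.send(new ProducerRecord<>(tenantTopic, "c-42", event));
        }
    }
}
```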
Used a low partition count per tenant topic as a baseline, while measuring and adding partitions as required
Used SMTs in Kafka Connect to extract relevant data and keys; this is important for reducing data duplication inside Kafka (e.g. repartitioning by key to allow joins)
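Connect ships built-in SMTs such as `ValueToKey` and `ExtractField` for exactly this kind of re-keying; purely as a hedged illustration of the mechanism, here is the shape a custom SMT takes (the class and the `field` config name are invented, and the field is assumed to be a string):

```java
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

/** Re-keys each record by a field of its value so downstream joins co-partition. */
public class KeyByField<R extends ConnectRecord<R>> implements Transformation<R> {
    private String fieldName;

    @Override
    public void configure(Map<String, ?> configs) {
        fieldName = (String) configs.get("field");
    }

    @Override
    public R apply(R record) {
        // Promote the chosen value field to the record key (assumed to be a string).
        Struct value = (Struct) record.value();
        Object newKey = value.get(fieldName);
        return record.newRecord(record.topic(), record.kafkaPartition(),
                Schema.STRING_SCHEMA, newKey,
                record.valueSchema(), record.value(), record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef().define("field", ConfigDef.Type.STRING,
                ConfigDef.Importance.HIGH, "Value field to promote to the key");
    }

    @Override
    public void close() {}
}
```

Doing this inside Connect means the re-keyed data lands in Kafka once, correctly, instead of being duplicated into a second repartition topic later.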
Optimized data size and message count by using deltas instead of snapshots: on ingress, extracting events by comparing old and new snapshots; on egress, sending only changes to 3rd-party systems
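As an illustration of the delta approach on ingress, a hedged sketch that diffs two snapshots of a customer record and keeps only the changed fields (the field names are invented):

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotDiff {
    /**
     * Compare old and new snapshots of a customer record and return only
     * the fields that changed, i.e. the delta to publish as an event.
     */
    static Map<String, Object> delta(Map<String, Object> oldSnap, Map<String, Object> newSnap) {
        Map<String, Object> changes = new HashMap<>();
        for (Map.Entry<String, Object> e : newSnap.entrySet()) {
            Object before = oldSnap.get(e.getKey());
            if (before == null ? e.getValue() != null : !before.equals(e.getValue())) {
                changes.put(e.getKey(), e.getValue());
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Object> yesterday = Map.of("tier", "silver", "ltv", 120);
        Map<String, Object> today = Map.of("tier", "gold", "ltv", 120);
        // Prints {tier=gold}: only the changed field travels through Kafka.
        System.out.println(delta(yesterday, today));
    }
}
```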
Result
90% data streaming and storage cost reduction
With over 500 tenants with separate setups, it was easy to go overboard with costs.
Start small and have a plan for specific tenants that need increased capacity.
Separate topics into 2 groups:
1. Topics that just require balanced partitioning and aren't used directly in joins and aggregations may have their partition count increased very simply (see the sketch below).
2. Topics where the partitioning scheme and co-partitioning are critical may instead be replaced by new, more capacious topics, with ksqlDB used for one-time transfers of data.
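For group 1, growing a topic is a single call to Kafka’s Admin API; a minimal sketch (topic name and partition count are illustrative):

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class GrowTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Safe only for group-1 topics: existing keys will hash to different
            // partitions afterwards, so do this only where co-partitioning doesn't matter.
            admin.createPartitions(Map.of("tenant.acme.events", NewPartitions.increaseTo(6)))
                 .all().get();
        }
    }
}
```

For group 2, the one-time transfer can be as simple as a ksqlDB `CREATE STREAM ... WITH (PARTITIONS=...) AS SELECT` into the new topic, which preserves the keying the joins depend on.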
Once we had that setup, with data available – We set out to create the real-time Customer360 profile
Having an up-to-date Customer360 profile is a cornerstone of every customer data platform, and it enables the more advanced use cases discussed later.
We wanted to allow non-technical users to define how batch & streaming processes build the Customer360 profile
Lastly, we wanted to support custom events and rule-based logic for aggregating the events into the profile
How we solved it
A graphical UI allows Lego-like construction of Customer360 attributes via visual business rules
An engine converts this graphical description into a series of ksqlDB queries and updates the queries in the running clusters
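The talk doesn’t show the rule model or the generated SQL, so purely as a hedged illustration: the engine’s core job can be pictured as rendering rule parameters into a ksqlDB persistent-query statement, roughly like this (the rule shape, stream and column names are all invented):

```java
public class RuleToKsql {
    // Hypothetical rule: aggregate an event field per customer into a profile attribute.
    record AggregationRule(String attribute, String function, String field,
                           String sourceStream, String eventType) {}

    /** Render a rule into a ksqlDB persistent-query statement. */
    static String toKsql(AggregationRule rule) {
        return String.format(
            "CREATE TABLE %s AS "
          + "SELECT customerId, %s(%s) AS %s "
          + "FROM %s WHERE eventType = '%s' "
          + "GROUP BY customerId EMIT CHANGES;",
            rule.attribute() + "_tbl", rule.function(), rule.field(),
            rule.attribute(), rule.sourceStream(), rule.eventType());
    }

    public static void main(String[] args) {
        AggregationRule rule = new AggregationRule(
            "total_deposits", "SUM", "amount", "EVENTS_STREAM", "deposit");
        // The engine would submit this to ksqlDB's REST API and track the query id.
        System.out.println(toKsql(rule));
    }
}
```

The real engine also has to version queries and migrate running state when a rule changes; that orchestration is out of scope here.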
Result
A completely self-service Customer360 solution
Once we had our data, as well as the real-time Customer360 profile, we started tackling more advanced use cases such as:
An identity graph for resolving customer identity across devices and preventing duplicates
ML inference for re-evaluating decisions taken earlier, based on recent events received, leveraging the rich feature vector maintained by the Customer360 profile
Self-Optimizing Journey to autonomously determine and serve the next-best-action for each customer
Implementing these use cases presents a challenge as they are not a good fit for the SQL paradigm (supported by ksqlDB)
In addition, specific optimizations and low-level control were necessary
To overcome these challenges, we:
Used KStreams with a focus on the Processor API, which gives us maximum flexibility while still allowing developers to work at a high level of abstraction
We again aligned on AVRO communication between services to ease integration with Kafka Connect and ksqlDB
We built “use-case specific” micro-services, to keep code size low, and allow quick iteration on each of the services
And eliminated reliance on external databases by using KStreams state stores, increasing the performance, availability and independence of each service
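To make the Processor API and state-store points concrete, here is a minimal hedged sketch, not Optimove’s actual service: a deduplicating processor that keeps last-seen event keys in a local, changelog-backed store instead of an external database (topic and store names are invented):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class DedupTopology {
    /** Drops events whose key was already seen, using a local fault-tolerant store. */
    static class DedupProcessor implements Processor<String, String, String, String> {
        private KeyValueStore<String, Long> seen;
        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
            // The store is backed by a changelog topic, so no external DB is needed.
            this.seen = context.getStateStore("seen-store");
        }

        @Override
        public void process(Record<String, String> record) {
            if (seen.get(record.key()) == null) {
                seen.put(record.key(), record.timestamp());
                context.forward(record); // first occurrence: pass it downstream
            }
        }
    }

    public static Topology build() {
        StoreBuilder<KeyValueStore<String, Long>> store =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("seen-store"),
                                        Serdes.String(), Serdes.Long());
        Topology t = new Topology();
        t.addSource("events", "raw-events");              // illustrative topic names
        t.addProcessor("dedup", DedupProcessor::new, "events");
        t.addStateStore(store, "dedup");
        t.addSink("out", "clean-events", "dedup");
        return t;
    }
}
```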
As a result, we now have a group of simple, reliable and efficient micro-services, glued together by ksqlDB and Connect over Kafka.
Now, with the full architecture in view, you can see how the pieces fit together. Hopefully, sharing our learnings and the decision process behind the scenes will help you solve your own challenges effectively.
Before we open for questions, I’d like to sum up the main takeaways and suggestions:
Align on compressed AVRO (and use Schema Registry) to strengthen inter-service communications and save costs.
ksqlDB is a great solution for the 80% of use cases that match the relational SQL model.
For more advanced use cases, KStreams is very powerful. Be sure to study the Processor API to get the most out of it.
SMTs in Connect and custom event streaming are a good way to prevent data duplication and increased latency inside Kafka (by doing it right the first time).
Lastly, consider working with deltas/events rather than full snapshots as much as possible, to save on code complexity, processing, bandwidth and storage.
I know this was a lot to take in; we will be sharing the full presentation with additional information for your future reference.
Thanks for your time, I hope it was helpful. If there are any questions, I’d gladly answer them now…