How Braze uses the MongoDB Aggregation
Pipeline for Lean, Mean, and Efficient
Scaling
Presenting Today
Zach McCormick
Senior Software Engineer
Braze
@zachmccormick
Quick Intro to Braze
Braze empowers you to humanize your brand-customer
relationships at scale.
Tens of Billions of Messages Sent Monthly
Global Customer Presence
More than 1 Billion MAU
On Six Continents
How Does It All Work?
•Push, email, in-app messaging, and more for
our customers
•Integration via an SDK and REST API
•Real-time audience segmentation
•Batch, event-driven, and transactional API
messaging
What does this look like at scale?
•Nearly 11 billion user profiles
•Our customers’ end users
•Over 8 billion Sidekiq jobs per day
•Segmentation, messaging, analytics, data
processing
•Over 6 billion API calls per day
•User activity, transactional messaging
•Over 350k MongoDB IOPS across clusters
•Powered by over 1,200 MongoDB shards, 65
different MongoDB clusters
TOC
Frequency Capping
What is it? How does it work at Braze?
The Original Design
How did it originally work? What were the issues?
Redesign using the Aggregation Pipeline
What does the new solution look like? Why is it
better?
Looking at the Results
Did it really improve performance? What’s next?
Jobs with Frequency Capping Enabled
Frequency Capping
Why Use Frequency Capping?
Let’s look at our dashboard…
Where does it fit in the process?
Message
Any piece of content sent to a user, often
highly customized by combining Liquid logic
with user profiles.
Channel
A conduit for messages, such as push, email,
in-app message, news feed card, etc.
Campaign
A single-step message send including
segmentation, messages for various
channels, delivery time options, etc.
Canvas
A multi-step “journey” or “workflow”
including entry conditions, segmentation at
various branches, multiple messages, delays,
etc.
Message Sending Pipeline at Braze
• Lots of steps
• Business logic at every step
• High level of parallelization
Audience
Segmentation
Variant Selection
FREQUENCY
CAPPING
Volume Limiting
Subscription
Preferences
Channel Selection
Send Time
Optimization
Enqueueing
Audience
Segmentation
Variant Selection
FREQUENCY
CAPPING
Volume Limiting
Subscription
Preferences
Channel Selection
Send Time
Optimization
Enqueueing
Render Payloads Send Messages Write Analytics
Audience
Segmentation
The Original Design
User Collection Example
{
_id: 123,
first_name: "Zach",
last_name: "McCormick",
email: "zach.mccormick@braze.com",
custom: {
twitter_handle: "zachmccormick",
favorite_food: "Greek",
loves_coffee: true
},
campaign_summaries: {
"Coffee Addict Promo": {
last_received: Date('2019-06-01T12:00:03Z'),
last_opened_email: Date('2019-06-01T12:03:19Z')
}
}
}
Frequency Capping Algorithm
MongoDB Sidekiq Worker
Eligible Users
MongoDB Query
On “Users”
Frequency Capping Algorithm
MongoDB Sidekiq Worker
Data Transfer
(for each user)
Eligible Users
(in batches)
Remove Ineligible
Campaigns
MongoDB Query
On “Users”
Frequency Capping Algorithm
MongoDB Sidekiq Worker
Data Transfer
Count Campaigns
and Check Rule
For Each Rule
(for each user)
Eligible Users
(in batches)
Remove Ineligible
Campaigns
MongoDB Query
On “Users”
Frequency Capping Algorithm
MongoDB Sidekiq Worker
Data Transfer
Count Campaigns
and Check Rule
For Each Rule
(for each user)
Eligible Users
(in batches)
Non-Frequency
Capped Users
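The v1 flow above can be sketched in plain Ruby over the `campaign_summaries` hash pulled back with each user: the worker counts recent sends per rule and drops users over the cap. The rule shape here is illustrative; note that `last_received` timestamps alone cannot distinguish two sends of the same campaign, a limitation covered below.

```ruby
# Sketch of the original in-worker check: fetch campaign_summaries along
# with each user, then count sends inside the rule's window in Ruby.
# This is exactly the work the redesign later pushes down into MongoDB.
def frequency_capped?(user, rule, now: Time.now)
  window_start = now - rule[:window_seconds]
  recent_sends = user[:campaign_summaries].values.count do |summary|
    summary[:last_received] >= window_start
  end
  recent_sends >= rule[:max_messages]
end

rule = { window_seconds: 86_400, max_messages: 2 }  # at most 2 per day
user = {
  campaign_summaries: {
    'Coffee Addict Promo' => { last_received: Time.now - 3_600 },
    'Weekend Sale'        => { last_received: Time.now - 7_200 }
  }
}
frequency_capped?(user, rule)  # => true: two sends in the last 24 hours
```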
What are some potential problems?
Frequency Capping Problems
• User profiles can be HUGE
• 16 MB max doc size + batch processing
• Network IO & RAM usage
• Not particularly fast…
Frequency Capping in a flame graph of the Sidekiq job
(mostly spent waiting on queries!)
Frequency Capping Problems
• User profiles can be HUGE
• 16 MB max doc size + batch processing
• Network IO & RAM usage
• Not particularly fast…
Frequency Capping Problems
• User profiles can be HUGE
• 16 MB max doc size + batch processing
• Network IO & RAM usage
• Not particularly fast…
• What about the same campaign sent twice?
• “Last received” timestamps alone aren’t
enough data
campaign_summaries: {
"Coffee Addict Promo": {
last_received: Date('2019-06-01T12:00:03Z'),
last_opened_email: Date('2019-06-01T12:03:19Z')
}
}
Maybe we can make it smarter?
Micro-optimizations
• What if we limit what parts of the user
profile document we bring back?
• We have aggregate stats, so we know
when certain campaigns were sent
Optimization Attempt #1
Micro-optimizations
• What if we limit what parts of the user
profile document we bring back?
• We have aggregate stats, so we know
when certain campaigns were sent
• However…
• What if the frequency capping window is
fairly large?
• What if the customer has hundreds of
millions of users?
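The micro-optimization above amounts to building a field projection from the campaigns that aggregate stats say actually sent during the window, so only those `campaign_summaries` entries come back instead of whole user documents. A sketch, with illustrative campaign names and driver call:

```ruby
# Build a projection that pulls back only _id plus the campaign_summaries
# entries for campaigns known (from aggregate stats) to have sent inside
# the capping window, instead of transferring full user profiles.
def summary_projection(recently_sent_campaigns)
  recently_sent_campaigns.each_with_object('_id' => 1) do |campaign, proj|
    proj["campaign_summaries.#{campaign}"] = 1
  end
end

projection = summary_projection(['Coffee Addict Promo', 'Weekend Sale'])
# With the MongoDB Ruby driver, roughly:
#   users.find(_id: { '$in' => user_ids }).projection(projection)
```

As the slides note, this helps less when the window is large or the campaign list is long, because the projection itself grows.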
The solution was good,
but it was not enough…
Redesign using the
Aggregation Pipeline
Aggregations at Braze Today
• Documents representing daily aggregate
statistics per-campaign
• Messages sent, messages opened,
conversion events, etc.
• Aggregation Pipeline queries for graphs,
charts, and reports
What are the goals?
Redesign Goals
• Less network IO
• Expensive!
• Less RAM usage
• For huge campaigns, occasional OOM
errors
OOMs in server logs
Redesign Goals
• Less network IO
• Expensive!
• Less RAM usage
• For huge campaigns, occasional OOM
errors
• Much faster execution
• Micro-optimizations are only going to
go so far
Can we still use only User documents?
User Collection Example
{
_id: 123,
first_name: "Zach",
last_name: "McCormick",
email: "zach.mccormick@braze.com",
custom: {
twitter_handle: "zachmccormick",
favorite_food: "Greek",
loves_coffee: true
},
campaign_summaries: {
"Coffee Addict Promo": {
last_received: Date('2019-06-01T12:00:03Z'),
last_opened_email: Date('2019-06-01T12:03:19Z')
}
}
}
Campaign Summaries use a hash, not an array
What about a new supplementary document?
• We don’t want to store more data on User profiles – already too big in some cases
What about a new supplementary document?
• We don’t want to store more data on User profiles – already too big in some cases
• What if this new collection holds arrays of received campaigns
• We can use $slice to keep the arrays reasonably sized
• We can use the same IDs as User profiles to shard efficiently
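Keeping the new arrays "reasonably sized" with `$slice` can be sketched as an update built on every send: push the interaction tuple with `$each`/`$sort`/`$slice` so MongoDB retains only the newest entries. Field names follow the slides; `MAX_ENTRIES` is an assumed limit, not Braze's actual value.

```ruby
# Sketch: $push with $each/$sort/$slice caps the array at MAX_ENTRIES,
# dropping the oldest tuples on every write.
MAX_ENTRIES = 100

def record_send_update(campaign, dispatch_id, now: Time.now.utc)
  { '$push' => {
      'emails_received' => {
        '$each'  => [{ 'date' => now, 'campaign' => campaign,
                       'dispatch_id' => dispatch_id }],
        '$sort'  => { 'date' => -1 },  # newest first
        '$slice' => MAX_ENTRIES        # keep only the newest MAX_ENTRIES
      }
    } }
end

# With the Ruby driver this would be applied roughly as:
#   interactions.update_one({ _id: user_id },
#                           record_send_update('CampaignB', 'identifier-for-send'))
```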
What about a new supplementary document?
• We don’t want to store more data on User profiles – already too big in some cases
• What if this new collection holds arrays of received campaigns
• We can use $slice to keep the arrays reasonably sized
• We can use the same IDs as User profiles to shard efficiently
• What would that look like?
UserCampaignInteractionData Collection Example
{
_id: 123,
emails_received: [
{
date: Date('2019-06-01T12:00:03Z'),
campaign: "CampaignB",
dispatch_id: "identifier-for-send"
},
{
date: Date('2019-05-29T13:37:00Z'),
campaign: "CampaignA",
dispatch_id: "identifier-for-send"
},
…
],
…
}
UserCampaignInteractionData Collection Example
{
_id: 123,
emails_received: […],
android_push_received: […],
ios_push_received: […],
webhooks_received: […],
sms_received: […],
…
}
UserCampaignInteractionData Collection Example
{
_id: 123,
emails_received: [
{
date: Date('2019-06-01T12:00:03Z'),
campaign: "CampaignB",
dispatch_id: "identifier-for-send"
},
{
date: Date('2019-05-29T13:37:00Z'),
campaign: "CampaignA",
dispatch_id: "identifier-for-send"
},
…
],
…
}
{
_id: 123,
first_name: "Zach",
last_name: "McCormick",
email: "zach.mccormick@braze.com",
custom: {
twitter_handle: "zachmccormick",
favorite_food: "Greek",
loves_coffee: true
},
campaign_summaries: {
"Coffee Addict Promo": {
last_received: Date(
'2019-06-01T12:00:03Z'),
last_opened_email: Date(
'2019-06-01T12:03:19Z')
}
}}
NEW Frequency Capping Algorithm
1. Match stage
2. First projection using $filter
1. Only look at the relevant time window
2. Don’t include the current dispatch (for
multi-channel sends)
3. Exclude campaigns that don’t count
toward frequency capping
Resulting document:
{
"Zach": {
"email_86400": [
{
"dispatch_id": …,
"date": …,
"campaign": …
},
…
],
}
}
NEW Frequency Capping Algorithm
1. Match stage
2. First projection using $filter
1. Only look at the relevant time window
2. Don’t include the current dispatch (for
multi-channel sends)
3. Exclude campaigns that don’t count
toward frequency capping
3. Second projection
1. Only bring back dispatch IDs
Resulting document:
{
"Zach": {
"email_86400": [
"campaign-a-dispatch-id",
"campaign-b-dispatch-id",
],
}
}
UserCampaignInteractionData Query Example
first_projection["email_86400"] = {
  :$filter => {
    :input => "$emails_received",
    :cond => {
      :$and => [
        # first make sure the tuple we care about is within rule's time window
        {:$gte => [
          "$$this.date", Time.utc(2019, 6, 9, 12, 0, 0)
        ]},
        # next make sure we don't include transactional messages
        {:$not => [{:$in => [
          "$$this.campaign", ["Txn Message One", "Txn Message Two"]
        ]}]}
      ]
    }
  }
}
UserCampaignInteractionData Query Example
second_projection["email_86400"] = "$email_86400.dispatch_id"
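Taken together, the match stage and the two projections form a single aggregate call against the interaction collection. A minimal sketch, assuming illustrative parameters for the user batch, window start, and excluded transactional campaigns:

```ruby
# Assemble the three stages: match the batch of users, $filter each
# channel array down to the rule's window (excluding campaigns that
# don't count), then project only the dispatch IDs back to the worker.
def frequency_capping_pipeline(user_ids, window_start, excluded_campaigns)
  first_projection = {
    'email_86400' => {
      '$filter' => {
        'input' => '$emails_received',
        'cond'  => {
          '$and' => [
            { '$gte' => ['$$this.date', window_start] },
            { '$not' => [{ '$in' => ['$$this.campaign', excluded_campaigns] }] }
          ]
        }
      }
    }
  }
  second_projection = { 'email_86400' => '$email_86400.dispatch_id' }
  [
    { '$match'   => { '_id' => { '$in' => user_ids } } },
    { '$project' => first_projection },
    { '$project' => second_projection }
  ]
end

# With the Ruby driver, roughly:
#   interactions.aggregate(
#     frequency_capping_pipeline(ids, Time.utc(2019, 6, 9, 12), ['Txn Message One']))
```

The filtering and counting now happen on the shards, so only small arrays of dispatch IDs cross the network.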
Looking at the Results
Frequency Capping – Network Bandwidth
Frequency Capping Version 1: MongoDB → Sidekiq, transferring full user profiles
VS.
Frequency Capping Version 2: MongoDB → Sidekiq, transferring only dispatch IDs
Frequency Capping v1 vs. v2 Max Duration
Frequency Capping v1 vs. v2 Median Duration
How did this get deployed?
Deployment Strategies
• All functionality behind a feature flipper
• Easy to turn on/off by customer
Deployment Strategies
• All functionality behind a feature flipper
• Easy to turn on/off by customer
• Lots of excess code
• Feature flipper logic is simple – use class X or class Y
Deployment Strategies
• All functionality behind a feature flipper
• Easy to turn on/off by customer
• Lots of excess code
• Feature flipper logic is simple – use class X or class Y
• Feature flipped on slowly
• Hourly and daily check-ins on Datadog
• Minimize impact if something goes wrong
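The "use class X or class Y" flipper logic described above can be sketched as a per-customer dispatch between the two implementations, so a bad rollout can be reversed instantly. All names here are illustrative, not Braze's actual classes or flag store.

```ruby
# Sketch of the feature-flipper pattern: pick the legacy (v1) or
# aggregation-pipeline (v2) frequency capper per customer.
class LegacyFrequencyCapper        # v1: filters users in the Sidekiq worker
  def version; 'v1'; end
end

class AggregationFrequencyCapper   # v2: filters inside MongoDB
  def version; 'v2'; end
end

FLIPPED_ON = ['customer_a'].freeze # stand-in for a real feature-flag store

def frequency_capper_for(customer_id)
  if FLIPPED_ON.include?(customer_id)
    AggregationFrequencyCapper.new
  else
    LegacyFrequencyCapper.new
  end
end
```

Keeping both classes behind one interface is what makes the excess code tolerable: the call site never changes, only the flipper's answer does.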
What’s Next?
Frequency Capping by Tag
first_projection["email_marketing_86400"] = {
  :$filter => {
    :input => "$emails_received",
    :cond => {
      :$and => [ …,
        # only include campaigns tagged "marketing"
        {:$in => [
          "$$this.campaign", ["July 4 Promo", "Memorial Day Sale", …]
        ]}
      ]
    }
  }
}
What else?
• Set the foundation for future expectations
• Customers are always going to want to send messages
• Faster and faster
• With more detailed segmentation
• With more complex inclusion/exclusion rules
Thank you! We are hiring!
braze.com/careers
MongoDB World 2019: How Braze uses the MongoDB Aggregation Pipeline for Lean, Mean, and Efficient Scaling

Contenu connexe

Tendances

Tendances (20)

Introduction to AWS Step Functions
Introduction to AWS Step FunctionsIntroduction to AWS Step Functions
Introduction to AWS Step Functions
 
MongoDB Fundamentals
MongoDB FundamentalsMongoDB Fundamentals
MongoDB Fundamentals
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lake
 
Webinar | Introduction to Amazon DynamoDB
Webinar | Introduction to Amazon DynamoDBWebinar | Introduction to Amazon DynamoDB
Webinar | Introduction to Amazon DynamoDB
 
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
AWS Cloud 환경으로​ DB Migration 전략 수립하기
AWS Cloud 환경으로​ DB Migration 전략 수립하기AWS Cloud 환경으로​ DB Migration 전략 수립하기
AWS Cloud 환경으로​ DB Migration 전략 수립하기
 
Amazon Redshift의 이해와 활용 (김용우) - AWS DB Day
Amazon Redshift의 이해와 활용 (김용우) - AWS DB DayAmazon Redshift의 이해와 활용 (김용우) - AWS DB Day
Amazon Redshift의 이해와 활용 (김용우) - AWS DB Day
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB Atlas
 
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Scaling Data Quality @ Netflix
Scaling Data Quality @ NetflixScaling Data Quality @ Netflix
Scaling Data Quality @ Netflix
 
Amazon Dynamo DB 활용하기 - 강민석 :: AWS Database Modernization Day 온라인
Amazon Dynamo DB 활용하기 - 강민석 :: AWS Database Modernization Day 온라인Amazon Dynamo DB 활용하기 - 강민석 :: AWS Database Modernization Day 온라인
Amazon Dynamo DB 활용하기 - 강민석 :: AWS Database Modernization Day 온라인
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentation
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimization
 
Secure Virtual Private Cloud(VPC)를 활용한 보안성 강화와 비용절감 - 안경진 부장, 포티넷 코리아 :: AWS ...
Secure Virtual Private Cloud(VPC)를 활용한 보안성 강화와 비용절감 - 안경진 부장, 포티넷 코리아 :: AWS ...Secure Virtual Private Cloud(VPC)를 활용한 보안성 강화와 비용절감 - 안경진 부장, 포티넷 코리아 :: AWS ...
Secure Virtual Private Cloud(VPC)를 활용한 보안성 강화와 비용절감 - 안경진 부장, 포티넷 코리아 :: AWS ...
 
AWSKRUG-33번째-세션1.pdf
AWSKRUG-33번째-세션1.pdfAWSKRUG-33번째-세션1.pdf
AWSKRUG-33번째-세션1.pdf
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 

Similaire à MongoDB World 2019: How Braze uses the MongoDB Aggregation Pipeline for Lean, Mean, and Efficient Scaling

Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...
Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...
Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...
NoSQLmatters
 
SplunkLive! San Francisco Dec 2012 - Socialize
SplunkLive! San Francisco Dec 2012 - SocializeSplunkLive! San Francisco Dec 2012 - Socialize
SplunkLive! San Francisco Dec 2012 - Socialize
Splunk
 
Isaac Mosquera, Socialize CTO SplunkLive! presentation
Isaac Mosquera, Socialize CTO SplunkLive! presentationIsaac Mosquera, Socialize CTO SplunkLive! presentation
Isaac Mosquera, Socialize CTO SplunkLive! presentation
getsocialize
 

Similaire à MongoDB World 2019: How Braze uses the MongoDB Aggregation Pipeline for Lean, Mean, and Efficient Scaling (20)

ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
 
Easy Demographics and Campaign Management in a Mobile World Using Amazon Pinp...
Easy Demographics and Campaign Management in a Mobile World Using Amazon Pinp...Easy Demographics and Campaign Management in a Mobile World Using Amazon Pinp...
Easy Demographics and Campaign Management in a Mobile World Using Amazon Pinp...
 
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
 
Operations: Cost Optimization - Don't Overspend on Infrastructure
Operations: Cost Optimization - Don't Overspend on Infrastructure Operations: Cost Optimization - Don't Overspend on Infrastructure
Operations: Cost Optimization - Don't Overspend on Infrastructure
 
Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, A...
Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, A...Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, A...
Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, A...
 
Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...
Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...
Ameya Kanitkar – Scaling Real Time Analytics with Storm & HBase - NoSQL matte...
 
What's new in MongoDB 3.6?
What's new in MongoDB 3.6?What's new in MongoDB 3.6?
What's new in MongoDB 3.6?
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Solving churn challenge in Big Data environment - Jelena Pekez
Solving churn challenge in Big Data environment  - Jelena PekezSolving churn challenge in Big Data environment  - Jelena Pekez
Solving churn challenge in Big Data environment - Jelena Pekez
 
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For EcommerceDeep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
 
2016 State of Predictive Marketing
2016 State of Predictive Marketing2016 State of Predictive Marketing
2016 State of Predictive Marketing
 
SplunkLive! San Francisco Dec 2012 - Socialize
SplunkLive! San Francisco Dec 2012 - SocializeSplunkLive! San Francisco Dec 2012 - Socialize
SplunkLive! San Francisco Dec 2012 - Socialize
 
Isaac Mosquera, Socialize CTO SplunkLive! presentation
Isaac Mosquera, Socialize CTO SplunkLive! presentationIsaac Mosquera, Socialize CTO SplunkLive! presentation
Isaac Mosquera, Socialize CTO SplunkLive! presentation
 
Workshop: Make the Most of Customer Data Platforms - David Raab
Workshop: Make the Most of Customer Data Platforms - David RaabWorkshop: Make the Most of Customer Data Platforms - David Raab
Workshop: Make the Most of Customer Data Platforms - David Raab
 
MongoDB World 2019: Don't Break the Camel's Back: Running MongoDB as Hard as ...
MongoDB World 2019: Don't Break the Camel's Back: Running MongoDB as Hard as ...MongoDB World 2019: Don't Break the Camel's Back: Running MongoDB as Hard as ...
MongoDB World 2019: Don't Break the Camel's Back: Running MongoDB as Hard as ...
 
Enable Your Marketing Teams to Engage Users with Relevant & Personalized Cont...
Enable Your Marketing Teams to Engage Users with Relevant & Personalized Cont...Enable Your Marketing Teams to Engage Users with Relevant & Personalized Cont...
Enable Your Marketing Teams to Engage Users with Relevant & Personalized Cont...
 
Amazon DynamoDB - Auto Scaling Webinar - v3.pptx
Amazon DynamoDB - Auto Scaling Webinar - v3.pptxAmazon DynamoDB - Auto Scaling Webinar - v3.pptx
Amazon DynamoDB - Auto Scaling Webinar - v3.pptx
 

Plus de MongoDB

Plus de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

MongoDB World 2019: How Braze uses the MongoDB Aggregation Pipeline for Lean, Mean, and Efficient Scaling

  • 1. How Braze uses the MongoDB Aggregation Pipeline for Lean, Mean, and Efficient Scaling
  • 2. Presenting Today Zach McCormick Senior Software Engineer Braze @zachmccormick
  • 4. Braze empowers you to humanize your brand – customer relationships at scale. Tensof BillionsofMessages Sent Monthly Global Customer Presence Morethan 1Billion MAU ON SIX CONTINENTS
  • 5.
  • 6.
  • 7. How Does It All Work? •Push, email, in-app messaging, and more for our customers •Integration via an SDK and REST API •Real-time audience segmentation •Batch, event-driven, and transactional API messaging
  • 8. What does this look like at scale? •Nearly 11 billion user profiles •Our customers’ end users •Over 8 billion Sidekiq jobs per day •Segmentation, messaging, analytics, data processing •Over 6 billion API calls per day •User activity, transactional messaging •Over 350k MongoDB IOPS across clusters •Powered by over 1,200 MongoDB shards, 65 different MongoDB clusters
  • 9. 9 TOC Frequency Capping What is it? How does it work at Braze? The Original Design How did it originally work? What were the issues? Redesign using the Aggregation Pipeline What does the new solution look like? Why is it better? Looking at the Results Did it really improve performance? What’s next? Today
  • 10. Jobs with Frequency Capping Enabled
  • 12. Why Use Frequency Capping?
  • 13. Let’s look at our dashboard…
  • 14.
  • 15. Where does it fit in the process?
  • 16. Message Any piece of content sent to a user, often highly customized by combining Liquid logic with user profiles.
  • 17. Channel A conduit for messages, such as push, email, in-app message, news feed card, etc.
  • 18. Campaign A single-step message send including segmentation, messages for various channels, delivery time options, etc.
  • 19. Canvas A multi-step “journey” or “workflow” including entry conditions, segmentation at various branches, multiple messages, delays, etc.
  • 20. Message Sending Pipeline at Braze • Lots of steps • Business logic at every step • High level of parallelization
  • 22. Variant Selection FREQUENCY CAPPING Volume Limiting Subscription Preferences Channel Selection Send Time Optimization Enqueueing Audience Segmentation
  • 23. Variant Selection FREQUENCY CAPPING Volume Limiting Subscription Preferences Channel Selection Send Time Optimization Enqueueing Render Payloads Send Messages Write Analytics Audience Segmentation
  • 24.
  • 25.
  • 27. User Collection Example
    {
      _id: 123,
      first_name: "Zach",
      last_name: "McCormick",
      email: "zach.mccormick@braze.com",
      custom: {
        twitter_handle: "zachmccormick",
        favorite_food: "Greek",
        loves_coffee: true
      },
      campaign_summaries: {
        "Coffee Addict Promo": {
          last_received: Date('2019-06-01T12:00:03Z'),
          last_opened_email: Date('2019-06-01T12:03:19Z')
        }
      }
    }
  • 28. Frequency Capping Algorithm MongoDB Sidekiq Worker Eligible Users
  • 29. MongoDB Query On “Users” Frequency Capping Algorithm MongoDB Sidekiq Worker Data Transfer (for each user) Eligible Users (in batches)
  • 30. Remove Ineligible Campaigns MongoDB Query On “Users” Frequency Capping Algorithm MongoDB Sidekiq Worker Data Transfer Count Campaigns and Check Rule For Each Rule (for each user) Eligible Users (in batches)
  • 31. Remove Ineligible Campaigns MongoDB Query On “Users” Frequency Capping Algorithm MongoDB Sidekiq Worker Data Transfer Count Campaigns and Check Rule For Each Rule (for each user) Eligible Users (in batches) Non-Frequency Capped Users
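The original flow above (query users in batches, then count campaigns and check each rule in the worker) can be sketched in pure Ruby. This is a hedged, simplified sketch, not Braze's actual implementation: the `FrequencyCapRule` struct and method names are assumptions, while the `campaign_summaries`/`last_received` fields follow the User document example on the earlier slide.

```ruby
# Hypothetical sketch of the v1 in-worker frequency-capping check.
FrequencyCapRule = Struct.new(:channel, :max_sends, :window_seconds)

# Returns the subset of users still eligible under every rule, mirroring
# "count campaigns and check rule, for each rule, for each user".
def filter_frequency_capped(users, rules, now: Time.now)
  users.select do |user|
    summaries = user[:campaign_summaries] || {}
    rules.all? do |rule|
      window_start = now - rule.window_seconds
      # Count campaigns whose last send falls inside this rule's window.
      # (Note the v1 flaw the talk calls out: only one timestamp per
      # campaign, so the same campaign sent twice counts once.)
      recent = summaries.count { |_name, s| s[:last_received] >= window_start }
      recent < rule.max_sends
    end
  end
end
```

The important point is that this logic runs in the Sidekiq worker, which is why full user profiles have to cross the network first.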
  • 32. What are some potential problems?
  • 33. Frequency Capping Problems • User profiles can be HUGE • 16 MB max doc size + batch processing • Network IO & RAM usage • Not particularly fast… Frequency Capping in a flame graph of the Sidekiq job (mostly spent waiting on queries!)
  • 34. Frequency Capping Problems • User profiles can be HUGE • 16 MB max doc size + batch processing • Network IO & RAM usage • Not particularly fast…
  • 35. Frequency Capping Problems • User profiles can be HUGE • 16 MB max doc size + batch processing • Network IO & RAM usage • Not particularly fast… • What about the same campaign sent twice? • “Last received” timestamps alone aren’t enough data
    campaign_summaries: {
      "Coffee Addict Promo": {
        last_received: Date('2019-06-01T12:00:03Z'),
        last_opened_email: Date('2019-06-01T12:03:19Z')
      }
    }
  • 36. Maybe we can make it smarter?
  • 37. Micro-optimizations • What if we limit what parts of the user profile document we bring back? • We have aggregate stats, so we know when certain campaigns were sent Optimization Attempt #1
  • 38. Micro-optimizations • What if we limit what parts of the user profile document we bring back? • We have aggregate stats, so we know when certain campaigns were sent • However… • What if the frequency capping window is fairly large? • What if the customer has hundreds of millions of users?
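Optimization Attempt #1 above limits which parts of the user profile come back. A hedged sketch of what such a field projection could look like, assuming (hypothetically) a helper that takes the campaigns our aggregate stats say were sent recently; the method name is an invention for illustration:

```ruby
# Build a Mongo field projection that pulls back only the
# campaign_summaries entries we actually need, instead of whole profiles.
def capped_fields_projection(recently_sent_campaigns)
  projection = { "_id" => 1 }
  recently_sent_campaigns.each do |name|
    # A dotted path projects a single key out of the campaign_summaries hash.
    projection["campaign_summaries.#{name}"] = 1
  end
  projection
end
```

As the next slide notes, this helps less when the capping window is large or the customer has hundreds of millions of users, because the list of recently sent campaigns (and the number of documents scanned) stays big.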
  • 39. The solution was good, but it was not enough…
  • 41. Aggregations at Braze Today • Documents representing daily aggregate statistics per-campaign • Messages sent, messages opened, conversion events, etc. • Aggregation Pipeline queries for graphs, charts, and reports
  • 42. What are the goals?
  • 43. Redesign Goals • Less network IO • Expensive! • Less RAM usage • For huge campaigns, occasional OOM errors OOMs in server logs
  • 44. Redesign Goals • Less network IO • Expensive! • Less RAM usage • For huge campaigns, occasional OOM errors • Much faster execution • Micro-optimizations are only going to go so far
  • 45. Can we still use only User documents?
  • 46. User Collection Example — Campaign Summaries use a hash, not an array
    {
      _id: 123,
      first_name: "Zach",
      last_name: "McCormick",
      email: "zach.mccormick@braze.com",
      custom: {
        twitter_handle: "zachmccormick",
        favorite_food: "Greek",
        loves_coffee: true
      },
      campaign_summaries: {
        "Coffee Addict Promo": {
          last_received: Date('2019-06-01T12:00:03Z'),
          last_opened_email: Date('2019-06-01T12:03:19Z')
        }
      }
    }
  • 47. What about a new supplementary document? • We don’t want to store more data on User profiles – already too big in some cases
  • 48. What about a new supplementary document? • We don’t want to store more data on User profiles – already too big in some cases • What if this new collection holds arrays of received campaigns • We can use $slice to keep the arrays reasonably sized • We can use the same IDs as User profiles to shard efficiently
  • 49. What about a new supplementary document? • We don’t want to store more data on User profiles – already too big in some cases • What if this new collection holds arrays of received campaigns • We can use $slice to keep the arrays reasonably sized • We can use the same IDs as User profiles to shard efficiently • What would that look like?
  • 50. UserCampaignInteractionData Collection Example
    {
      _id: 123,
      emails_received: [
        {
          date: Date('2019-06-01T12:00:03Z'),
          campaign: "CampaignB",
          dispatch_id: "identifier-for-send"
        },
        {
          date: Date('2019-05-29T13:37:00Z'),
          campaign: "CampaignA",
          dispatch_id: "identifier-for-send"
        },
        …
      ],
      …
    }
  • 51. UserCampaignInteractionData Collection Example
    {
      _id: 123,
      emails_received: […],
      android_push_received: […],
      ios_push_received: […],
      webhooks_received: […],
      sms_received: […],
      …
    }
  • 52. UserCampaignInteractionData Collection Example (shown alongside the existing User document)
    New supplementary document:
    {
      _id: 123,
      emails_received: [
        { date: Date('2019-06-01T12:00:03Z'), campaign: "CampaignB", dispatch_id: "identifier-for-send" },
        { date: Date('2019-05-29T13:37:00Z'), campaign: "CampaignA", dispatch_id: "identifier-for-send" },
        …
      ],
      …
    }
    Existing User document:
    {
      _id: 123,
      first_name: "Zach",
      last_name: "McCormick",
      email: "zach.mccormick@braze.com",
      custom: { twitter_handle: "zachmccormick", favorite_food: "Greek", loves_coffee: true },
      campaign_summaries: {
        "Coffee Addict Promo": {
          last_received: Date('2019-06-01T12:00:03Z'),
          last_opened_email: Date('2019-06-01T12:03:19Z')
        }
      }
    }
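The slides mention using $slice to keep these arrays reasonably sized. A hedged sketch of what the append-with-cap write could look like (the `MAX_ENTRIES` value and helper name are assumptions, not Braze's actual code):

```ruby
# Append a send to a channel array on the supplementary document while
# bounding its length with $slice.
MAX_ENTRIES = 100

def record_send_update(channel_field, campaign, dispatch_id, date)
  {
    "$push" => {
      channel_field => {
        # $each is required to combine $push with $slice; a negative
        # $slice keeps only the newest MAX_ENTRIES elements.
        "$each"  => [{ "date" => date, "campaign" => campaign,
                       "dispatch_id" => dispatch_id }],
        "$slice" => -MAX_ENTRIES
      }
    }
  }
end
```

Because the collection shares _ids with the User collection, these writes shard the same way user writes do.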
  • 53. NEW Frequency Capping Algorithm
    1. Match stage
    2. First projection using $filter
       1. Only look at the relevant time window
       2. Don’t include the current dispatch (for multi-channel sends)
       3. Exclude campaigns that don’t count toward frequency capping
    Resulting document:
    { "Zach": { "email_86400": [ { "dispatch_id": …, "date": …, "campaign": … }, … ] } }
  • 54. NEW Frequency Capping Algorithm
    1. Match stage
    2. First projection using $filter
       1. Only look at the relevant time window
       2. Don’t include the current dispatch (for multi-channel sends)
       3. Exclude campaigns that don’t count toward frequency capping
    3. Second projection
       1. Only bring back dispatch IDs
    Resulting document:
    { "Zach": { "email_86400": [ "campaign-a-dispatch-id", "campaign-b-dispatch-id" ] } }
  • 55. UserCampaignInteractionData Query Example
    first_projection["email_86400"] = {
      :$filter => {
        :input => "$emails_received",
        :cond => {
          :$and => [
            # first make sure the tuple we care about is within the rule's time window
            { :$gte => ["$$this.date", Date.new(2019, 6, 9, 12, 0, 0)] },
            # next make sure we don't include transactional messages
            { :$not => { :$in => ["$$this.campaign", ["Txn Message One", "Txn Message Two"]] } }
          ]
        }
      }
    }
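Putting the stages together, the three-step pipeline described on the earlier slides (match, $filter projection, dispatch-id projection) could be assembled roughly like this. The stage shapes follow the slide's query example, but the method name and parameters are assumptions, and Braze's real pipeline handles multiple channels and rules at once:

```ruby
# Build a three-stage aggregation pipeline for one email rule.
def frequency_cap_pipeline(user_ids, window_start, excluded_campaigns)
  filter = {
    "$filter" => {
      "input" => "$emails_received",
      "cond"  => {
        "$and" => [
          # only entries inside the rule's time window
          { "$gte" => ["$$this.date", window_start] },
          # drop campaigns that don't count toward frequency capping
          { "$not" => { "$in" => ["$$this.campaign", excluded_campaigns] } }
        ]
      }
    }
  }
  [
    { "$match"   => { "_id" => { "$in" => user_ids } } },
    # first projection: filtered array of {date, campaign, dispatch_id}
    { "$project" => { "email_86400" => filter } },
    # second projection: the dotted path collapses each element down to
    # just its dispatch_id, so only IDs cross the network
    { "$project" => { "email_86400" => "$email_86400.dispatch_id" } }
  ]
end
```

The worker then only has to count dispatch IDs per rule, instead of deserializing full profiles.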
  • 57. Looking at the Results
  • 58. Frequency Capping – Network Bandwidth: Version 1 (MongoDB → Sidekiq, transferring full user profiles) VS. Version 2 (MongoDB → Sidekiq, transferring only Dispatch IDs)
  • 59. Frequency Capping v1 vs. v2 Max Duration
  • 60. Frequency Capping v1 vs. v2 Median Duration
  • 61. How did this get deployed?
  • 62. Deployment Strategies • All functionality behind a feature flipper • Easy to turn on/off by customer
  • 63. Deployment Strategies • All functionality behind a feature flipper • Easy to turn on/off by customer • Lots of excess code • Feature flipper logic is simple – use class X or class Y
  • 64. Deployment Strategies • All functionality behind a feature flipper • Easy to turn on/off by customer • Lots of excess code • Feature flipper logic is simple – use class X or class Y • Feature flipped on slowly • Hourly and daily check-ins on Datadog • Minimize impact if something goes wrong
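The "feature flipper logic is simple – use class X or class Y" pattern above can be sketched as follows. All names here (`LegacyFrequencyCapper`, `AggregationFrequencyCapper`, the flag symbol, the flipper interface) are hypothetical stand-ins, and the two strategies are stubbed:

```ruby
# v1 strategy: full-profile scan (stubbed)
class LegacyFrequencyCapper
  def eligible_users(users); users; end
end

# v2 strategy: aggregation pipeline (stubbed)
class AggregationFrequencyCapper
  def eligible_users(users); users; end
end

# Pick the implementation per customer based on a feature flag, so the
# rollout can be widened slowly and reverted instantly.
def frequency_capper_for(customer_id, flipper)
  if flipper.enabled?(:aggregation_frequency_capping, customer_id)
    AggregationFrequencyCapper.new
  else
    LegacyFrequencyCapper.new
  end
end
```

Keeping both classes behind one seam is what makes the "lots of excess code" temporary: once the rollout is complete, the legacy class and the flag check can be deleted together.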
  • 66. Frequency Capping by Tag
    first_projection["email_marketing_86400"] = {
      :$filter => {
        :input => "$emails_received",
        :cond => {
          :$and => [
            …,
            # only include campaigns tagged "marketing"
            { :$in => ["$$this.campaign", ["July 4 Promo", "Memorial Day Sale", …]] }
          ]
        }
      }
    }
  • 67. What else? • Set the foundation for future expectations • Customers are always going to want to send messages • Faster and faster • With more detailed segmentation • With more complex inclusion/exclusion rules
  • 68. Thank you! We are hiring! braze.com/careers