1. MODERN DATA PIPELINES WITH
MONGODB AND CONFLUENT
Easily build robust, reactive data pipelines that stream
events between applications and services in real time
Guru Sattanathan | guru@confluent.io
Sam Harley | sam.harley@mongodb.com
2. Agenda
● Event-driven architectures
● Kafka and Confluent
● Customer challenges
● Modern architecture with MongoDB
● MongoDB Connector for Apache Kafka
● Fleet management demo
4. Business Challenges
Difficulty consuming and reacting in real time to new, fast-moving
data sources, which affects:
● developer productivity
● the ability to detect or prevent fraud
● the ability to stay competitive
5. Solution that enables you to...
● Build sophisticated data-driven and event-driven applications
● Modernize your application architecture
● Uncover new sources of data
● Derive insights that give your business a competitive advantage
● Choose on-premises or cloud deployments
7. Centene - Healthcare Provider
Centene is one of the largest Medicare and Medicaid managed-care
providers.
Challenge
Centene’s core challenge is growth driven by mergers and
acquisitions, which forced it to reevaluate its enterprise
data integration and data migration strategies. The company wanted
better scalability, higher availability, and faster ETL.
8. Financial Services - a large global bank
A leading global financial services firm and a major provider of investment
banking, retail banking, and related services.
Challenges
● Need for centralized incident and event management solution
● Modernize legacy applications
● Provide complete health dashboards across various systems
● Streamline payment processing, customer activity tracking and real-time data
integration across the bank
9. IoT / Automotive
Collecting data from cars or field
information is a common use case.
Consider the information you can
get off a car.
Challenges
● Need to centralize telemetry data
● Process and analyze the data in real time
● Provide data back for real-time reaction
10. Common Themes
How to:
● Consume and exploit data in real time
● Build fast moving apps enriched with historical context
● Run your business in real-time
A new generation of technologies is needed
This is what MongoDB and Confluent offer, together
12. Cloud to Edge, Any Workload
Why MongoDB
Document Model
and MQL:
The fastest way
to innovate
Multi-Cloud, Global
Database:
Freedom & flexibility
MongoDB
Data Platform:
Unified experience for
modern apps
13. The
fastest way
to innovate
Document Model and MQL
Intuitive & Flexible
- Maps to the way developers think & code
- Adapt schema at any time
Powerful: Serve Any Workload
- Comprehensive and expressive MongoDB Query Language
- Strong consistency & ACID transactional guarantees
Universal
- JSON/documents are pervasive in modern application stacks
- Superset of all data models, consistent developer experience
14. The Evolution of MongoDB
3.4
Linearizable reads
Intra-cluster compression
Read only views
Log Redaction
Graph Processing
Decimal
Collations
Faceted Navigation
Aggregation ++
ARM, Power, zSeries
BI & Spark Connectors ++
Compass ++
LDAP Authorization
Encrypted Backups
3.6
Change Streams
Retryable Writes
Expressive Array Updates
Causal Consistency
Consistent Sharded Sec. Reads
Schema Validation
End to End Compression
IP Whitelisting
Default Bind to Localhost
Sessions
WiredTiger 1m+ Collections
Expressive $lookup
R Driver
Atlas Cross Region Replication
Atlas Auto Storage Scaling
4.0
Replica Set Transactions
Atlas Global Clusters
Atlas HIPAA
Atlas LDAP
Atlas Audit
Atlas Enc. Storage Engine
Atlas Backup Snapshots
Type Conversions
40% Faster Shard Migrations
Snapshot Reads
Non-Blocking Sec. Reads
SHA-2 & TLS 1.1+
Compass Agg Pipeline Builder
Compass Export to Code
Free Monitoring Cloud Service
Ops Manager K8s Beta
MongoDB Stitch GA
4.2
Distributed Transactions
Global PIT Reads
Large Transactions
Mutable Shard Key Values
Atlas Data Lake (Beta)
Atlas Auto Scaling (Beta)
Atlas Search (Beta)
Atlas Service Broker & K8s
Field Level Encryption
Multi-CAs
Materialized Views
Wildcard Indexes
Expressive Updates
Apache Kafka Connector
MongoDB Charts GA
Retryable Reads & Writes
New Index Builds
10x Faster stepDown
Storage Node Watchdog
Zstandard Compression
4.4
Union
Custom Agg Expressions
Refinable Shard Keys
Compound Hashed Shard Keys
Hedged Reads
Mirrored Reads
Resumable Initial Sync
Time-Based Oplog Retention
Connection Monitoring/Pooling
Streamed Topology Changes
Simultaneous Indexing
Hidden Indexes
Streaming Replication
Global Read/Write Concerns
Rust & Swift Drivers GA
TLS 1.3 & Faster Client Auth
OCSP Stapling
Kerberos Utility
Atlas Online Archive
Auto-Scaling
Schema Recommendations
AWS IAM Auth & Atlas x509
Federated Queries
Ops Manager 4.4
17. Go from this…
[Diagram: application objects (Customer, Opportunity, Contact, Lead) pass through an Object Relational Mapping layer and are split across many tables (Name, Phone, Address, ARR, Open Activities, Contact Roles, Activity History, Summary, Customer Detail, Opportunity Team, etc.)]
18. To this: store objects directly…
[Diagram: the same objects (Customer, Opportunity, Contact, Lead) are stored directly in the database as documents]
19. Intuitive: Contrasting data models
Tabular (Relational) Data Model
Related data split across multiple records and tables
Document Data Model
Related data contained in a single, rich document
{
  "_id" : ObjectId("5ad88534e3632e1a35a58d00"),
  "name" : { "first" : "John", "last" : "Doe" },
  "address" : [
    { "location" : "work",
      "address" : {
        "street" : "16 Hatfields",
        "city" : "London",
        "postal_code" : "SE1 8DJ" },
      "geo" : { "type" : "Point", "coord" : [ 51.5065752, -0.109081 ] } },
    { ... }
  ],
  "dob" : ISODate("1977-04-01T05:00:00Z"),
  "retirement_fund" : NumberDecimal("1292815.75")
}
20. Intuitive: Document data model
• Naturally maps to objects in code
– Eliminates requirements to use ORMs
– Breaks down complex interdependencies
between developer and DBAs teams
• Represent data of any structure
– Polymorphic: each document can contain
different fields
– Modify the schema at any time
• Strongly typed for ease of processing
– Over 20 binary encoded JSON data types
• Access via idiomatic drivers in all major
programming languages
{
  "_id" : ObjectId("5ad88534e3632e1a35a58d00"),
  "name" : { "first" : "John", "last" : "Doe" },
  "address" : [
    { "location" : "work",
      "address" : {
        "street" : "16 Hatfields",
        "city" : "London",
        "postal_code" : "SE1 8DJ" },
      "geo" : { "type" : "Point", "coord" : [ 51.5065752, -0.109081 ] } },
    { ... }
  ],
  "dob" : ISODate("1977-04-01T05:00:00Z"),
  "retirement_fund" : NumberDecimal("1292815.75")
}
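The document above maps one-to-one onto native language data structures. A minimal Python sketch of that mapping (plain dicts, no driver required; the field values follow the slide's example):

```python
from datetime import datetime
from decimal import Decimal

# The slide's customer document, expressed as a plain Python dict.
# (In real code, a MongoDB driver would return a structure like this.)
customer = {
    "_id": "5ad88534e3632e1a35a58d00",
    "name": {"first": "John", "last": "Doe"},
    "address": [
        {
            "location": "work",
            "address": {
                "street": "16 Hatfields",
                "city": "London",
                "postal_code": "SE1 8DJ",
            },
            "geo": {"type": "Point", "coord": [51.5065752, -0.109081]},
        }
    ],
    "dob": datetime(1977, 4, 1, 5, 0, 0),
    "retirement_fund": Decimal("1292815.75"),
}

# Related data lives in one place: no ORM layer, no JOINs across tables.
full_name = f"{customer['name']['first']} {customer['name']['last']}"
work_city = customer["address"][0]["address"]["city"]

print(full_name)  # John Doe
print(work_city)  # London
```

Because the document already has the shape the application needs, reading a customer's name and work address is plain attribute navigation rather than a multi-table query.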
22. Flexible Schema: Unlocking developer velocity
Avoids need to update ORM
class mappings and
recompile programming
language classes
Schema changes don’t lock
the database, or cause
performance degradation
while tables are altered
Breaks down complex
inter-group dependencies
and expensive coordination
before new code is
released
23. Data Governance
JSON Schema
Enforces strict schema structure over a complete collection for data governance & quality
• Builds on document validation by restricting the content that can be added to a document
• Enforces presence, type, and values for document content, including nested arrays
• Simplifies application logic
Tunable: enforce document structure, log warnings, or allow complete schema flexibility
Queryable: identify all existing documents that do not comply
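In MongoDB these governance rules take the form of a `$jsonSchema` validator attached to a collection. A sketch of such a validator, built here as a plain Python dict (the collection and field names are illustrative, not from the slide):

```python
# Illustrative $jsonSchema validator: enforces presence, type, and
# values for document content, including a nested array.
customer_schema = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "address"],        # enforce presence
        "properties": {
            "name": {
                "bsonType": "object",           # enforce type
                "required": ["first", "last"],
                "properties": {
                    "first": {"bsonType": "string"},
                    "last": {"bsonType": "string"},
                },
            },
            "address": {
                "bsonType": "array",            # nested array content
                "items": {
                    "bsonType": "object",
                    "properties": {
                        # enforce allowed values
                        "location": {"enum": ["home", "work"]},
                    },
                },
            },
        },
    }
}

# With a driver such as pymongo this would be applied roughly as:
#   db.create_collection("customers", validator=customer_schema)
# The "tunable" behavior on the slide corresponds to the collection's
# validationLevel / validationAction settings (reject vs. log warnings).
```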
24. Intuitive and fast
Compared to storing data
across multiple tables, a single
document data structure:
• Presents a single place for the
database to read and write data
• Denormalized data eliminates JOINs
for most operational queries
• Simplifies query development and
optimization
_id: 12345678
> name: Object
> address: Array
> phone: Array
email: "john.doe@mongodb.com"
dob: 1966-07-30 01:00:00.000
˅ interests: Array
0: "Cycling"
1: "IoT"
25. Transactional Data Guarantees
_id: 12345678
> name: Object
> address: Array
> phone: Array
email: "john.doe@mongodb.com"
dob: 1966-07-30 01:00:00.000
˅ interests: Array
0: "Cycling"
1: "IoT"
For many apps,
single document
transactions meet the
majority of needs
Related data modeled in a single, rich document against
which ACID guarantees are applied
26. MongoDB Multi-Document ACID Transactions
Just like relational transactions
• Multi-statement, familiar relational syntax
• Easy to add to any application
• Multiple documents in one or many collections and databases,
across replica sets and sharded clusters
ACID guarantees
• Snapshot isolation, all or nothing execution
• No performance impact for non-transactional operations
27. Documents are Universal
JSON Documents are the modern standard in today’s application stacks
Model and Query Data Any Way You Need
Point | Range | Geospatial | Rich Search | Aggregations | JOINs & UNIONs | Graph Traversals
All wrapped in a single API, giving a consistent experience for any workload
JSON Documents: Tabular | Key-Value | Text | Graph | Geospatial | File Storage (.pdf, .mp3, .mov) | Events
28. MongoDB Atlas: Global cloud database
Self-service & elastic
Deploy, modify, and upgrade clusters with
best-in-class operational automation
Scale up, out, or down in a few clicks or API calls
Automated database maintenance
Database and infrastructure resources as code
Global & cloud-agnostic
Available in 70+ regions across Google Cloud,
Azure, & AWS
Global clusters for read/write anywhere
deployments and multi-region fault tolerance
Easy migrations with a consistent experience
across cloud providers
Enterprise-grade
security & SLAs
Network isolation, VPC peering, end-to-end
encryption, and role-based access controls
Encryption key management, LDAP integration,
granular database auditing
ISO 27001 · SOC 2 · PCI-DSS · HIPAA
Guaranteed reliability with SLAs
Comprehensive monitoring
Deep visibility into 100+ metrics with
proactive alerting
Real-time performance tracking and
automated suggestions
APIs to integrate with monitoring tools
Managed backup
Flexible backup policies
Point-in-time data recovery
Consistent snapshots of sharded deployments
Cloud data mobility
Application development services
Simple, serverless functions for backend logic,
service integrations, and APIs
Database access from your frontend secured by
straightforward, field-level access rules
Database and authentication triggers to react to
changes in real time
31. Event Streaming Enables New Outcomes
Auto / Transport
● Without event streaming: call for driver availability; no knowledge of driver arrival; no data on feature usage
● With event streaming: real-time driver-rider match; real-time ETA; real-time sensor diagnostics
Banking
● Without event streaming: nightly updated account balance; batch fraud checks; batch regulatory reporting
● With event streaming: real-time account updates; real-time credit card fraud alerts; real-time regulatory reporting
Retail
● Without event streaming: post-order “out of stock” emails; no upsell through personalization; batch point-of-sale reports
● With event streaming: real-time inventory; real-time recommendations; real-time sales reporting
36. What is Apache Kafka?
• A modern, distributed platform for data streams
• Decouples data producers and consumers
• Standardized and flexible data communication in heterogeneous
environments
• Serves a wide range of use cases, including:
• Messaging (à la RabbitMQ or ActiveMQ)
• Eventing (à la logging)
• ETL
• Stream processing
37. What is Apache Kafka?
Cluster
• One or more servers (brokers) that run
Kafka
Topic
• Category/feed name to which messages
are stored and published
Message
• Byte arrays that can store any object in
any format
Producer
• Writes messages to Kafka topic(s)
Consumer
• Reads messages from Kafka topic(s)
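The roles above can be illustrated with a deliberately tiny in-memory model. This is not a real Kafka client (those exist as confluent-kafka, kafka-python, etc.); it only mimics how producers append byte messages to a topic's log and how decoupled consumers read from an offset:

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a Kafka cluster: topics are append-only logs."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of messages

    def produce(self, topic, message: bytes):
        """Producer role: writes messages to a Kafka topic."""
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        """Consumer role: reads messages from a topic starting at an offset."""
        return self.topics[topic][offset:]

broker = ToyBroker()

# Messages are byte arrays that can store any object in any format.
broker.produce("telemetry", b'{"truck": 7, "speed_kmh": 82}')
broker.produce("telemetry", b'{"truck": 7, "speed_kmh": 85}')

# Independent consumers read the same topic without any coupling to the
# producer -- the broker decouples data producers and consumers.
all_events = broker.consume("telemetry")
late_joiner = broker.consume("telemetry", offset=1)
print(len(all_events), len(late_joiner))  # 2 1
```

The key property the toy model captures is that the producer never addresses a consumer directly; both sides only know the topic name.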
40. Confluent Platform
Apache Kafka, plus:
Unrestricted Developer Productivity: Non-Java clients | REST Proxy | Connectors | Hub | Schema Registry | ksqlDB (KSQL)
Efficient Operations at Scale: Control Center | Operator | Ansible | Auto Data Balancer | Tiered Storage
Production-stage Prerequisites: RBAC | Secrets | Audit logs | Schema Registry | Schema Validation | Multi-Region Clusters | Replicator
Self-Managed Software | Freedom of Choice | Fully Managed Cloud Service
Enterprise Support | Professional Services | Committer-driven Expertise | Training | Partners
Open Source | Community licensed
41. Apache Kafka Distribution Hardened and
Tested for Enterprise-level Production
Steps we take:
• Bundled for easy
script-driven installation
• Ansible Playbooks
• k8s Operator
Pre-built packages:
• RPM
• Deb
• Tar.gz
• Docker Images
Extensive testing:
• Regressions
• Cluster performance
• Stress tests
• Broker death
• Upgrade tests
• Compatibility tests
42. Confluent Completes Apache Kafka
Rich Pre-built Ecosystem: Instantly Connect Popular Data Sources and Sinks
● Connecting existing data systems to Kafka in a repeatable way → Kafka Connect, Java client
● Minimizing the time and effort spent to connect existing data systems to Kafka → 100+ pre-built connectors, Confluent Hub, MQTT Proxy
● Minimizing risk when scaling the platform to more data sources and sinks → 100+ fully supported Confluent and Partner connectors
Enable Application Development Compatibility
● Helping developers adhere to standard schemas across Kafka applications in a simple, centralized and scalable way → Schema Registry
Deploy Confidently in Production
● Eliminating risk at scale by ensuring data quality and compatibility in a programmatic way → Schema Validation
● Simplifying management of the rich Kafka ecosystem as the platform scales to multiple clusters and teams → Control Center integration with Connect and Schema Registry
(Components span Apache Kafka, the open source community, and commercial offerings.)
43. Easily Build Event Streaming Applications
Use one, lightweight
SQL syntax to build a
complete real-time
application
Enrich Kafka data with
a robust stream
processing framework
CREATE STREAM payments (user VARCHAR, payment_amount INT)
  WITH (kafka_topic = 'all_payments',
        key = 'user',
        value_format = 'avro');
USER | Payment
Jay | $10
Sue | $15
Fred | $5
... | ...
Create aggregations of event data that can serve queries to applications
USER | Credit Score
Jay | 660
Sue | 710
Fred | 595
44. Easily Build
Event Streaming Applications
• View a summary of all clusters
• Develop and run queries
• Support multiple KSQL
clusters at a time
46. Simplify your stream processing architecture
ksqlDB provides one solution for capturing events, stream
processing, and serving both push and pull queries
[Diagram: ksqlDB combines connectors, stream processing, and state stores in a single system]
47. Industry’s Only Multicloud Solution for Kafka
Private Cloud
Deploy on premises with Confluent
Platform
Deploy on Kubernetes with Operator
Public Cloud / Multi-Cloud
Run self-managed with Confluent
Platform
Leverage a fully managed service with
Confluent Cloud
Hybrid Cloud
Deploy a consistent platform across
on-prem and cloud
Build a persistent bridge between
datacenter and cloud
48. Deploy on Any k8s Platform, On-Prem or Cloud
Enterprise distributions | Cloud services
Kubernetes Engine | Elastic Kubernetes Service | Kubernetes Service | Build-your-own Kubernetes
50. MongoDB Kafka Connector
SINK: the connector receives events from Kafka topic(s) and writes documents to a MongoDB collection.
SOURCE: the connector receives documents from a MongoDB collection (via Change Streams) and writes events to Kafka topic(s).
51. MongoDB Connector for Apache Kafka
● Enables users to easily integrate MongoDB with Kafka
● Users can configure MongoDB as a source to publish data changes from MongoDB into Kafka topics for streaming to consuming applications
● Users can configure MongoDB as a sink to easily persist events from Kafka topics directly to MongoDB collections
● Available from Confluent Hub and Confluent Verified Gold
● Certified against Apache Kafka 2.3 and Confluent Platform 5.3 (or later)
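As a sketch, source and sink configurations for the connector look roughly like the dicts below. The connection URI, database, collection, and topic names are placeholders, not from the slides; the connector class names are the ones published with the MongoDB connector. In a self-managed deployment these would be submitted as JSON to the Kafka Connect REST API.

```python
# SOURCE: publish change events from a MongoDB collection to Kafka topics.
source_config = {
    "name": "mongo-source",  # placeholder connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",  # placeholder URI
        "database": "fleet",                            # placeholder
        "collection": "telemetry",                      # placeholder
    },
}

# SINK: persist events from Kafka topics into a MongoDB collection.
sink_config = {
    "name": "mongo-sink",  # placeholder connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": "mongodb://localhost:27017",  # placeholder URI
        "database": "fleet",
        "collection": "events",
        "topics": "hazard_events",  # Kafka topic(s) to drain into MongoDB
    },
}
```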
53. Fleet Management Demo
This demo showcases how to build a simple fleet management solution using:
○ Confluent Cloud
○ Fully managed ksqlDB
○ Fully managed MongoDB connectors (Preview)
○ MongoDB Atlas
54. Confluent Cloud will be used to:
○ Acquire telemetry data from a
variety of fleets in real time
○ Process and take action on
real-time events (e.g., trigger a
hazard event if a truck driver
applies harsh braking more
than three times within a
five-minute window)
○ Correlate/join multiple events
while fleets are on the move
(e.g., determine delivery ETA by
joining the fleets’ GPS data)
Fleet Management Demo
MongoDB Atlas will be used to:
○ Store events and location data
for historical analysis
○ Manage the end-to-end
lifecycle of drivers and fleets
(driver profiles, fleet
specification, registration
details, contact details, etc.)
○ Serve user interfaces to
capture changes, build
monitoring dashboards, etc.
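In the demo, the windowed hazard rule runs in ksqlDB, but the logic itself is easy to state. A pure-Python sketch of the "more than three harsh-braking events within a five-minute window" check (timestamps are illustrative):

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 3  # more than three events inside the window triggers a hazard

def hazard_triggered(brake_timestamps):
    """True if > THRESHOLD harsh-braking events fall within any 5-minute window."""
    events = sorted(brake_timestamps)
    start = 0
    for end in range(len(events)):
        # Slide the window forward: drop events older than WINDOW
        # relative to the newest event considered so far.
        while events[end] - events[start] > WINDOW:
            start += 1
        if end - start + 1 > THRESHOLD:
            return True
    return False

t0 = datetime(2020, 1, 1, 12, 0, 0)
calm = [t0 + timedelta(minutes=10 * i) for i in range(4)]   # spread out
harsh = [t0 + timedelta(seconds=30 * i) for i in range(4)]  # 4 events in 90s

print(hazard_triggered(calm))   # False
print(hazard_triggered(harsh))  # True
```

The sliding-window scan mirrors what a ksqlDB hopping/session window aggregation does continuously over the live telemetry stream.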
55. Confluent and MongoDB Architecture
[Diagram: Web, IoT, and mobile sources feed data streams into Confluent Cloud, or Confluent Platform on premises or any cloud, acting as an event-driven data fabric; Kafka Streams and ksqlDB provide real-time stream processing and transformations; the MongoDB Connector for Kafka links the streams to MongoDB Atlas (managed global database), MongoDB Enterprise, and legacy data stores; downstream data consumers include users, mobile apps, a data warehouse for analysis, and analytics, visualizations, Charts, BI, and Spark.]