Building Event Streaming
Architectures on Scylla and
Confluent with Kafka
Presenters
Tim Berglund
Senior Director of Developer Advocacy
Alexys Jacob
CTO
Maheedhar Gunturu
Director of Technical Alliances
Othmane El Metioui
Chief Data Officer
Agenda
ᐩ Brief Intro to Scylla
ᐩ Scylla + Kafka at Numberly
ᐩ Change Data Capture in Scylla
ᐩ Streaming Data from Scylla to Kafka
About ScyllaDB
• Reimagined the NoSQL database
• Close-to-the-hardware design, written in C++
• Open source, enterprise & DBaaS
• From the creators of KVM hypervisor
Winner: InfoWorld Technology of the Year
Grows with your business & your data
– Volume –
Multi-petabyte
– Throughput –
1 billion OPS
– Horizontal Scalability –
1,000-node cluster
– Availability –
1 to 10+ replicas
within a datacenter
– Consistent Latencies –
Low single-digit millisecond p99s
– Vertical Scalability –
1 to 416 vCPUs
– Unlimited –
Cell sizes and
partition width
– Consistency Options –
Eventual consistency
to linearizability
Used across industries
AdTech/MarTech, Multimedia, Finance/FinTech, Security, Ride-hailing/Food Delivery,
Social, Retail, Travel, IoT, Logistics/Transportation
Deployment options
On-Prem: Install in Your Datacenter
➔ Scylla Open Source
➔ Scylla Enterprise
➔ AWS Outposts
Cloud Hosted: Deploy at a Cloud Provider
➔ Scylla Open Source
➔ Scylla Enterprise
Scylla Cloud: Database as a Service
➔ Fully managed Scylla clusters
➔ Bring Your Own Account (BYOA) option
Kubernetes: Run on Kubernetes
➔ Manage with Scylla Operator
Scylla + Kafka at Numberly
Architectural choices and overview
At Numberly, we run bare-metal clusters
Scylla
3 clusters, with multi-datacenter
topology
• Staging
• Production web facing
• Production OLAP+OLTP
• RF=3 per DC
DELL hardware
• RAID0 NVMe
• up to 96 AMD cores per node
• up to 512GB RAM per node
Confluent Kafka
2 clusters, with active-active multi-datacenter
topology
• Staging
• Production
DELL hardware
• 6 brokers
12 TB SSD (RAID0)
2x 24 cores
64GB RAM
• 12 other nodes
Connect cluster, Schema Registry, ZooKeeper...
Scylla Cloud &
Confluent Cloud
TL;DR: The people behind the technology know better!
Cloud hosted solutions should be considered
depending on your infrastructure maturity and hosting
constraints.
Our experience shows that cloud providers such as
AWS always lag behind on versions and provide poor
monitoring & alerting capabilities.
Scylla + Kafka at Numberly
Stack usage overview
Scylla
• Scylla Manager
• Scylla Monitoring
• Easy data expiration (TTL) on large time
windows (6+ months)
Combining Scylla and Confluent Kafka powers
Confluent Kafka
• Kafka Connect & Exporter
• Schema registry
• KSQL
• Home-made control center interface +
grafana
Started with in-house Kafka streams
and Python pipelines to propagate
data changes between Scylla & Kafka
The Confluent certified CDC
connector will simplify our pipelines!
Scylla + Kafka at Numberly
Scylla is used as a low-latency remote state store
providing easy data expiry capabilities
to Kafka streams and pipelines (in & out)
Use case #1
Data pipeline enrichment
Scylla to the rescue in overcoming a JOIN window
too large for Kafka
Use case #1: how we did it before
[Diagram labels: Numberly’s web tracking; RabbitMQ exchange; beanstalkd; Python programs (write + read); Scylla with 13+ months retention: high-throughput writes + low-latency reads, expiring data]
Use case #1: our first attempt
[Diagram labels: Numberly’s web tracking; Kafka Streams (read); compacted topic; Kafka Streams (write); Kafka Connect; KTable; Redis]
Scaling limitations of Kafka JOIN windows
• The retention of our source data enriched from Scylla is long (13+ months)
Data set size averages 150+ GB per table, totaling 1.2+ TB of source data
• Multiple successive JOINs are heavy on Kafka with large datasets
Memory issues from the large RocksDB state store caused Kubernetes pod OOM kills
Rebuilding the state store after a Kafka Streams restart (pod) took too long
Standby replicas come at a cost for a large state store
We turned to Scylla to be a remote, highly available, distributed state store!
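The remote state store idea can be sketched as follows. This is a minimal illustration and not Numberly's implementation: the table name `tracking.user_state`, its `key` and `payload` columns, and the approximation of "13+ months" as 13 × 30 days are all assumptions. Only the `USING TTL` clause itself is standard CQL.

```python
from datetime import timedelta

# Assumption: "13+ months" is approximated as 13 * 30 days for illustration.
RETENTION = timedelta(days=13 * 30)


def ttl_seconds(retention: timedelta) -> int:
    """Convert a retention period into the integer seconds that USING TTL expects."""
    return int(retention.total_seconds())


def upsert_statement(table: str, ttl: int) -> str:
    """Build a parameterized CQL upsert whose row expires after `ttl` seconds.

    The (key, payload) table layout is hypothetical.
    """
    return f"INSERT INTO {table} (key, payload) VALUES (?, ?) USING TTL {ttl}"


def lookup_statement(table: str) -> str:
    """Build the point read a stream processor could issue to enrich an event."""
    return f"SELECT payload FROM {table} WHERE key = ?"


if __name__ == "__main__":
    ttl = ttl_seconds(RETENTION)
    print(upsert_statement("tracking.user_state", ttl))
    print(lookup_statement("tracking.user_state"))
```

With this shape, the stream processor keeps no local RocksDB state for the join: it writes with a TTL and reads by key, and expiry is handled by the database.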
Use case #1: how we do it today
[Diagram labels: Numberly’s web tracking; Kafka Streams (read); Kafka Streams (write); Scylla with 13+ months retention: high-throughput writes + low-latency reads, expiring data]
Use case #1: takeaways
• Metrics
Metrics are important for successful tuning (query response times, dataset size)
Use the Prometheus client instead of implementing Kafka Streams metrics
• Tuning
Size the number of partitions according to your query metrics
Mind your time to recovery: max throughput capacity should be at least 3x the average
Add query caching that covers your average query time, and no more, to maximize consistency
Make sure you use a shard aware client for Scylla
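The "at least 3x the average" guideline can be made concrete with a small back-of-the-envelope model. The sketch below assumes that events keep arriving at a constant average rate while the pipeline catches up at its maximum rate. That is a simplification, and the function name and parameters are illustrative only.

```python
def recovery_time(downtime_s: float, avg_rate: float, max_rate: float) -> float:
    """Seconds needed to drain the backlog accumulated during `downtime_s`.

    Assumes events keep arriving at `avg_rate` while the pipeline processes
    at `max_rate`, so the backlog shrinks by (max_rate - avg_rate) per second.
    """
    if max_rate <= avg_rate:
        raise ValueError("max_rate must exceed avg_rate or the backlog never drains")
    backlog = downtime_s * avg_rate
    return backlog / (max_rate - avg_rate)


if __name__ == "__main__":
    # One hour of downtime at 10,000 events/s average.
    for factor in (2, 3):
        seconds = recovery_time(3600, 10_000, factor * 10_000)
        print(f"{factor}x capacity: caught up after {seconds:.0f} s")
```

Under these assumptions, 3x capacity recovers in half the outage duration, while 2x capacity needs as long as the outage itself.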
Use case #2
Scylla “most innovative use case” award
winning Synapse platform
Real time user segmentation
Kafka to the rescue in overcoming large
partitions
on Scylla for an OLAP statistical workload
Use case #2: Synapse platform
[Diagram labels: Numberly’s web tracking; Synapse services; business rules; partners; calculation; segmentation store; distribution; configuration]
Kafka & Scylla: a complementary match
Where we chose Scylla over native Kafka
● Large number of tables with different sizes
○ Would require 10,000+ topics if compacted topics were used instead of Scylla
● TTL management on Kafka compacted topics adds custom processing logic and complexity
○ Propagating Scylla expired-data events still adds complexity
○ We crave expiration events in CDC
(https://github.com/scylladb/scylla/issues/8380)
● Leverage Scylla low latency reads capability to consume or enrich data at scale
Where Kafka saved the day for Scylla
● Computing real-time stats on high-cardinality data generated large partitions on Scylla
○ A user (partition key) is part of multiple segments (cluster key) = counting OK
○ A segment (partition key) has a great lot of users (cluster key) = large partition =
counting KO
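The "counting KO" case is where a streaming aggregation helps: instead of counting rows in one very large segment partition, a running count per segment is maintained from membership events. The sketch below shows that idea in plain Python. The event shape `(user_id, segment_id, delta)` is hypothetical, and this is not the Synapse platform's actual code or a Kafka Streams topology.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (user_id, segment_id, delta), where delta is +1
# when a user enters a segment and -1 when the user leaves it.
Event = Tuple[str, str, int]


def segment_counts(events: Iterable[Event]) -> Counter:
    """Maintain a running user count per segment from a stream of membership events."""
    counts: Counter = Counter()
    for _user, segment, delta in events:
        counts[segment] += delta
    return counts


if __name__ == "__main__":
    stream = [
        ("u1", "s1", +1),
        ("u2", "s1", +1),
        ("u1", "s2", +1),
        ("u2", "s1", -1),
    ]
    print(dict(segment_counts(stream)))
```

The aggregated counts are small and can then be written to an external store, keyed by segment, without ever scanning a large partition.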
Use case #2: takeaways
Define your table models to suit your queries
Forecast data volume on your model before using it
• Will it fit at scale in the technology you plan to use?
Mind large partitions on Scylla, as they can damage your cluster performance
Kafka Streams is great for on-the-fly aggregations
Sink your aggregated data to an external store to support lookups over multiple time spans
• Interactive queries = hot real time
Scylla + Kafka at Numberly
They play (very) well together
Change Data Capture
(CDC) in Scylla
Maheedhar Gunturu
Change Data Capture (CDC)
Queries the history of changes made to your database.
• Asynchronously readable by downstream consumers.
• Available since Scylla Open Source 4.0 and now available in
Scylla Enterprise 2021.1.1
Use cases
• Applications propagating state across microservices, for
use cases like IoT, retail, security, fraud detection, customer
360
• ETL
• Integrations, migrations and streaming transformations
• Alerting and monitoring
CDC in Scylla: enabled per table
• Single CDC log table per enabled table
• CDC log is co-located with base table
• Partitioning matches the base table
• Mirrored columns for preimage/delta records
• Every column record contains information about modification
operation and TTL
• Rows ordered by operation timestamp and batch sequence
• CDC data is TTL:ed to 24h (configurable)
Scylla’s CDC write path
+ Coordinator creates CDC log table writes
+ CDC writes piggyback on base table writes
+ Written to the same replica nodes
+ While the data size written is larger, the
number of write requests does not
change.
INSERT INTO base_table(...)...
CQL
CDC write
CDC log rows
• Each mutation event generates one or more rows
Row keys
Changes per non-key column (delta) – optional
Pre-image (prior state) — optional
Post-image (current state of row) – optional
• CDC log write uses same consistency level as base write
Same data guarantees
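The optional pre-image and post-image rows are switched on per table through the `cdc` table property. The helper below only renders such a statement as a string. The option names (`enabled`, `preimage`, `postimage`, `ttl`) are written as I recall them from Scylla's CDC documentation and should be checked against your Scylla version; the helper functions themselves are hypothetical.

```python
def cdc_options(enabled: bool = True, preimage: bool = False,
                postimage: bool = False, ttl_seconds: int = 86400) -> str:
    """Render a CQL map literal for the table's `cdc` property.

    86400 seconds corresponds to the 24h default retention of the CDC log.
    """
    options = {
        "enabled": str(enabled).lower(),
        "preimage": str(preimage).lower(),
        "postimage": str(postimage).lower(),
        "ttl": str(ttl_seconds),
    }
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return "{" + body + "}"


def alter_table_cdc(table: str, **kwargs) -> str:
    """Build an ALTER TABLE statement that sets the CDC options on `table`."""
    return f"ALTER TABLE {table} WITH cdc = {cdc_options(**kwargs)};"


if __name__ == "__main__":
    print(alter_table_cdc("ks.t", preimage=True))
```

The resulting string can be pasted into cqlsh or passed to a driver session.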
Consume CDC streams aka read path
• CDC data is available through normal CQL
Easy to read raw streams
Already de-duplicated
All delta and pre image values are normal CQL data
Can consume without knowledge of server internals
• Layered approach
CDC core functionality is relatively simple, which allows for more
sophisticated adaptors
■ Push models etc.
Consume CDC streams aka read path
+ CDC data is grouped into streams
+ Divides the token ring space
+ Each stream represents a tokenization “slot”
in current topology
+ Stream is log partition key
+ Stream chosen for given write based on base
table PK tokenization
+ CDC is also the basis for Alternator
Streams (DynamoDB API)
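The idea that each stream covers a slot of the token ring, and that a write lands in the stream whose slot contains the token of its partition key, can be illustrated with a toy lookup. This is a simplified model and an assumption on my part: real Scylla stream selection also involves vnodes, shards, and stream generations that change with topology, none of which are modeled here.

```python
import bisect
from typing import List


def pick_stream(token: int, boundaries: List[int], stream_ids: List[str]) -> str:
    """Return the stream owning `token` in a toy token ring.

    `boundaries` is sorted ascending; stream i owns tokens up to and including
    boundaries[i]. Tokens above the last boundary wrap around to the first stream.
    """
    index = bisect.bisect_left(boundaries, token)
    return stream_ids[index % len(stream_ids)]


if __name__ == "__main__":
    boundaries = [-100, 0, 100]          # hypothetical slot upper bounds
    streams = ["s0", "s1", "s2"]         # hypothetical stream identifiers
    for token in (-150, 0, 50, 500):
        print(token, "->", pick_stream(token, boundaries, streams))
```

Because the stream ID is the partition key of the log table, a consumer can read each stream independently and in order.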
CDC in Scylla
+ Easy to integrate and consume
+ Plain CQL tables
+ Robust
+ Replicated in same way as the base data
+ Reasonable overhead
+ Coalesced writes and reads to same replica ranges
+ Overhead is comparable to adding/reading from a table
+ Does not overflow if consumer fails to act
+ Data is TTL:ed
Quick Poll
Streaming Data from
Scylla to Kafka
Tim Berglund
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Confluent
Our goal is to create an event streaming platform
and put it at the heart of every company.
We do this with a platform that builds on Apache
Kafka, available on-prem and in Confluent Cloud.
Partition 0
Partition 1
Partition 2
Writing to Kafka
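When a record has a key, the producer maps that key to one of the topic's partitions, so records with the same key stay in order within one partition. The sketch below shows the principle only: Kafka's default partitioner hashes the key with murmur2, whereas CRC32 is used here as a stand-in to keep the example dependency-free.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Stand-in hash: CRC32 replaces Kafka's murmur2, so the actual partition
    numbers will differ from what a real producer would choose.
    """
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    for key in (b"user-1", b"user-2", b"user-1"):
        print(key.decode(), "-> partition", partition_for(key, 3))
```

The property that matters is determinism: the same key always maps to the same partition as long as the partition count does not change.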
Partition 0
Partition 1
Partition 2
Partitioned Topic
Consumer A
Consumer B
Reading from Kafka
Partition 0
Partition 1
Partition 2
Partitioned Topic
Consumer A
Consumer B
Consumer A
Consumer A
Reading from Kafka
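The two reading diagrams show one instance of Consumer A reading every partition, then three instances of Consumer A sharing the partitions between them. A minimal sketch of that sharing within one consumer group follows. It uses a plain round-robin, which is an assumption for illustration: Kafka ships several assignment strategies, and the instance names are hypothetical.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Distribute partitions round-robin over the members of one consumer group."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        member = consumers[index % len(consumers)]
        assignment[member].append(partition)
    return assignment


if __name__ == "__main__":
    partitions = [0, 1, 2]
    print(assign(partitions, ["A-1"]))                 # one instance reads everything
    print(assign(partitions, ["A-1", "A-2", "A-3"]))   # scaled out: one partition each
```

Consumer B, being a separate group, would receive its own independent assignment over the same three partitions.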
Kafka Connect
Scylla Source
Connector for Kafka
Kafka, Confluent, and Scylla
Scylla Source
connector for Kafka is
built on open source
Debezium
debezium.io
Source Connector
Sink Connector
Syncing Scylla Clusters with Kafka
Use the Source and Sink connectors to exchange data
between separate Scylla clusters
How it Works
1. Set up a Scylla Table with CDC
cqlsh> CREATE KEYSPACE ks WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck)) WITH cdc = {'enabled': true};
2. Configure Kafka Scylla CDC Connector
name=ScyllaCDCConnector
connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
scylla.name=MyCluster
scylla.cluster.ip.addresses=127.0.0.2:9042
scylla.table.names=ks.t
tasks.max=10
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.drop.tombstones=false
transforms.unwrap.delete.handling.mode=none
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
heartbeat.interval.ms=1000
auto.create.topics.enable=true
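When Kafka Connect runs in distributed mode, a properties file like the one above is usually submitted as JSON to the worker's REST API (`POST /connectors`) instead. The sketch below converts properties text into that payload shape without sending anything. Only a subset of the properties is repeated here, and the helper names are hypothetical.

```python
import json
from typing import Dict

# A subset of the connector properties from the slide, for illustration.
PROPERTIES = """\
name=ScyllaCDCConnector
connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
scylla.name=MyCluster
scylla.cluster.ip.addresses=127.0.0.2:9042
scylla.table.names=ks.t
tasks.max=10
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def connector_payload(config: Dict[str, str]) -> dict:
    """Shape the config as the body expected by POST /connectors."""
    config = dict(config)            # copy so the caller's dict is untouched
    name = config.pop("name")
    return {"name": name, "config": config}


if __name__ == "__main__":
    print(json.dumps(connector_payload(parse_properties(PROPERTIES)), indent=2))
```

The printed JSON could then be posted to a Connect worker, for example with curl. The host and port depend on your deployment.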
3. Test the Connector
cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (1, 5, 10);
cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (2, 6, 12);
4. How it looks
cqlsh> SELECT * FROM ks.t_scylla_cdc_log ;
cdc$stream_id | cdc$time | cdc$batch_seq_no | cdc$deleted_v | cdc$end_of_batch | cdc$operation | cdc$ttl | ck | pk | v
------------------------------------+--------------------------------------+------------------+---------------+------------------+---------------+---------+----+----+----
0xc72400000000000045715fd9dc0004c1 | a2130246-4048-11eb-5b81-9b458669aa11 | 0 | null | True | 2 | null | 5 | 1 | 10
0xd049555555555556e69dc1b6b4000581 | a6723136-4048-11eb-a309-3e76e3b340e7 | 0 | null | True | 2 | null | 6 | 2 | 12
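The log rows above can be interpreted with a few lines of code: `cdc$operation` is a numeric code (2 for the two inserts shown), and `cdc$time` is a timeuuid that embeds the write timestamp. The sketch below decodes both from a row represented as a plain dict. The operation code table is reproduced as I recall it from Scylla's CDC documentation and is worth re-checking for your version; the `describe` helper is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# cdc$operation codes, as I recall them from Scylla's CDC documentation.
OPERATIONS = {
    0: "pre-image",
    1: "row update",
    2: "row insert",
    3: "row delete",
    4: "partition delete",
    5: "range delete start (inclusive)",
    6: "range delete start (exclusive)",
    7: "range delete end (inclusive)",
    8: "range delete end (exclusive)",
    9: "post-image",
}

# Version 1 UUID timestamps count 100 ns intervals since this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value: str) -> datetime:
    """Extract the UTC timestamp embedded in a timeuuid string."""
    ticks = uuid.UUID(value).time            # 100 ns units since 1582-10-15
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)


def describe(row: dict) -> str:
    """Summarize one CDC log row for the ks.t table from the slides."""
    operation = OPERATIONS[row["cdc$operation"]]
    return f"{operation} pk={row['pk']} ck={row['ck']} v={row['v']}"


if __name__ == "__main__":
    row = {"cdc$time": "a2130246-4048-11eb-5b81-9b458669aa11",
           "cdc$operation": 2, "pk": 1, "ck": 5, "v": 10}
    print(timeuuid_to_datetime(row["cdc$time"]).isoformat(), describe(row))
```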
5. Connector correctly replicates as JSON:
Kafka message number 1 (key):
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int32",
"optional": true,
"field": "ck"
},
{
"type": "int32",
"optional": true,
"field": "pk"
}
],
"optional": false,
"name": "ks.t.Key"
},
"payload": {
"ck": 5,
"pk": 1
}
}
Kafka message number 1 (value):
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int32",
"optional": true,
"field": "ck"
},
{
"type": "int32",
"optional": true,
"field": "pk"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
[*snip* Etc.]
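Because the converters run with `schemas.enable=true`, every message is an envelope with a `schema` part and a `payload` part. A consumer that only needs the data can unwrap it as sketched below, using the key of message number 1 from the slide. The helper names are hypothetical.

```python
import json

# The key of Kafka message number 1 from the slide, in compact form.
KEY_MESSAGE = """
{"schema": {"type": "struct",
            "fields": [{"type": "int32", "optional": true, "field": "ck"},
                       {"type": "int32", "optional": true, "field": "pk"}],
            "optional": false,
            "name": "ks.t.Key"},
 "payload": {"ck": 5, "pk": 1}}
"""


def unwrap(message: str) -> dict:
    """Return only the payload of a JsonConverter envelope."""
    return json.loads(message)["payload"]


def field_types(message: str) -> dict:
    """Map each top-level field name to its declared Connect type."""
    schema = json.loads(message)["schema"]
    return {field["field"]: field["type"] for field in schema["fields"]}


if __name__ == "__main__":
    print(unwrap(KEY_MESSAGE))
    print(field_types(KEY_MESSAGE))
```

Setting the `schemas.enable` converter options to `false` would drop the envelope, at the cost of losing the type information carried with each message.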
Deltas Only...for now
• Currently only provides delta operations
• Preimage and postimage will be added in the future
• Will match nicely with “before” & “after” fields of
Debezium
Confluent Developer
developer.confluent.io
Learn Kafka!
Q&A
United States
545 Faber Place
Palo Alto, CA 94303
Israel
11 Galgalei Haplada
Herzelia, Israel
www.scylladb.com
@scylladb
Thank you

 
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikKeeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Eliminating Volatile Latencies Inside Rakuten’s NoSQL Migration
Eliminating  Volatile Latencies Inside Rakuten’s NoSQL MigrationEliminating  Volatile Latencies Inside Rakuten’s NoSQL Migration
Eliminating Volatile Latencies Inside Rakuten’s NoSQL Migration
 
Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way
Strategies For Migrating From SQL to NoSQL — The Apache Kafka WayStrategies For Migrating From SQL to NoSQL — The Apache Kafka Way
Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way
 
APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 

Building Event Streaming Architectures on Scylla and Kafka

  • 1. Building Event Streaming Architectures on Scylla and Confluent with Kafka
  • 2. Presenters: Tim Berglund, Senior Director of Developer Advocacy; Alexys Jacob, CTO; Maheedhar Gunturu, Director of Technical Alliances; Othmane El Metioui, Chief Data Officer
  • 3. Agenda ᐩ Brief Intro to Scylla ᐩ Scylla + Kafka at Numberly ᐩ Change Data Capture in Scylla ᐩ Streaming Data from Scylla to Kafka
  • 4. About ScyllaDB • Reimagined the NoSQL database • Close-to-the-hardware design, written in C++ • Open source, enterprise & DBaaS • From the creators of the KVM hypervisor. Winner: InfoWorld Technology of the Year
  • 5. Grows with your business & your data. Volume: multi-petabyte. Throughput: 1 billion OPS. Horizontal scalability: 1,000-node cluster. Availability: 1 to 10+ replicas within a datacenter. Consistent latencies: low single-digit millisecond p99s. Vertical scalability: 1 to 416 vCPUs. Unlimited: cell sizes and partition width. Consistency options: eventual consistency to linearizability.
  • 6. Used across industries: AdTech/MarTech, Multimedia, Finance/FinTech, Security, Ride-hailing/Food Delivery, Social, Retail, Travel, IoT, Logistics/Transportation
  • 7. Deployment options. On-Prem (install in your datacenter) ➔ Scylla Open Source ➔ Scylla Enterprise ➔ AWS Outposts. Cloud Hosted (deploy at a cloud provider) ➔ Scylla Open Source ➔ Scylla Enterprise. Scylla Cloud (database as a service) ➔ Fully managed Scylla clusters ➔ Bring Your Own Account (BYOA) option. Kubernetes (run on Kubernetes) ➔ Manage with Scylla Operator
  • 8. Scylla + Kafka at Numberly: architectural choices and overview
  • 9. At Numberly, we run bare-metal clusters. Scylla: 3 clusters, with multi-datacenter topology • Staging • Production web facing • Production OLAP+OLTP • RF=3 per DC. DELL hardware • RAID0 NVMe • up to 96 AMD cores per node • up to 512GB RAM per node. Confluent Kafka: 2 clusters, with active-active multi-datacenter topology • Staging • Production. DELL hardware • 6 brokers (12 TB SSD in RAID0, 2x 24 cores, 64GB RAM) • 12 other nodes (Connect cluster, Schema Registry, Zookeepers...)
  • 10. Scylla Cloud & Confluent Cloud. TL;DR: The people behind the technology know better! Cloud hosted solutions should be considered depending on your infrastructure maturity and hosting constraints. Our experience shows that cloud providers such as AWS always lag behind on versions and provide poor monitoring & alerting capabilities.
  • 11. Scylla + Kafka at Numberly: stack usage overview
  • 12. Combining Scylla and Confluent Kafka powers. Scylla • Scylla Manager • Scylla Monitoring • Easy data expiration (TTL) on large time windows (6+ months). Confluent Kafka • Kafka Connect & Exporter • Schema Registry • KSQL • Home-made control center interface + Grafana. Started with in-house Kafka Streams and Python pipelines to propagate data changes between Scylla & Kafka
  • 13. Combining Scylla and Confluent Kafka powers. Scylla • Scylla Manager • Scylla Monitoring • Easy data expiration (TTL) on large time windows (6+ months). Confluent Kafka • Kafka Connect & Exporter • Schema Registry • KSQL • Home-made control center interface + Grafana. The Confluent certified CDC connector will simplify our pipelines!
  • 14. Scylla + Kafka at Numberly: Scylla is used as a low-latency remote state store providing easy data expiry capabilities to Kafka Streams and pipelines (in & out)
  • 15. Use case #1: Data pipeline enrichment. Scylla to the rescue in overcoming a JOIN window too large for Kafka
  • 16. Use case #1: how we did it before. Diagram labels: Numberly’s web tracking; RabbitMQ exchange; Scylla (13+ months retention; high-throughput writes + low-latency reads, expiring data); beanstalkd; Python programs; write + read
  • 17. Use case #1: our first attempt. Diagram labels: Numberly’s web tracking; Kafka Streams; compacted topic; read; Kafka Streams; write; Kafka Connect; KTable; Redis
  • 18. Scaling limitations of Kafka JOIN windows • The retention of our source data enriched from Scylla is long (13+ months): data set size averages 150+GB per table, totaling 1.2+TB of source data • Multiple successive JOINs are heavy on Kafka with large datasets: memory issues with the large RocksDB state store caused Kubernetes pod OOM kills; rebuilding the state store after a Kafka Streams restart (pod) took too long; standby replicas come at a cost for a large state store. We turned to Scylla to be a remote, highly available, distributed state store!
  • 19. Use case #1: how we do it today. Diagram labels: Numberly’s web tracking; Kafka Streams; Scylla (13+ months retention; high-throughput writes + low-latency reads, expiring data); read; Kafka Streams; write
  • 20. Use case #1: takeaways • Metrics: metrics are important to successful tuning (query response times, dataset size); use a Prometheus client instead of implementing Kafka Streams metrics • Tuning: size the number of partitions according to your query metrics; mind your time to recovery: max throughput capacity should be at least 3x the average; add query caching that covers your average query time and no more, to maximize consistency; make sure you use a shard-aware client for Scylla
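The query-caching takeaway above can be sketched in a few lines of Python. This is a minimal illustration, not Numberly's implementation: `TTLQueryCache`, `fetch`, and `ttl_seconds` are hypothetical names, and the idea is only that entries live for roughly the measured average query time and no longer, so that stale reads stay rare.

```python
import time


class TTLQueryCache:
    """Hypothetical helper: remembers lookup results for a short, fixed time."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # function that queries the remote store
        self._ttl = ttl_seconds      # keep this close to the average query time
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}           # key -> (expiry time, cached value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]            # still fresh: skip the remote query
        value = self._fetch(key)     # expired or missing: query the store
        self._entries[key] = (now + self._ttl, value)
        return value
```

In a real pipeline `fetch` would wrap a read against the state store; here it can be any callable, which keeps the sketch independent of a particular driver.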
  • 21. Use case #2: Synapse platform, winner of the Scylla “most innovative use case” award. Real-time user segmentation. Kafka to the rescue in overcoming large partitions on Scylla for an OLAP statistical workload
  • 22. Use case #2: Synapse platform. Diagram labels: Numberly’s web tracking Synapse services Business rules Partners calculation Segmentation store distribution configuration
  • 23. Kafka & Scylla: a complementary match. Where we chose Scylla over native Kafka ● Large number of tables with different sizes ○ Would create 10000+ topics if compact tables were used instead of Scylla ● TTL management on Kafka compact tables adds custom processing logic and complexity ○ Propagating Scylla expired-data events still adds complexity ○ We crave expiration events in CDC (https://github.com/scylladb/scylla/issues/8380) ● Leverage Scylla's low-latency read capability to consume or enrich data at scale. Where Kafka saved the day for Scylla ● Computing real-time stats on high-cardinality data generated large partitions on Scylla ○ A user (partition key) is part of multiple segments (clustering key) = counting OK ○ A segment (partition key) has a great many users (clustering key) = large partition = counting KO
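The partition-key trade-off on this slide can be made concrete with a small, self-contained Python sketch. The data is invented for illustration (1,000 hypothetical users and two hypothetical segments); it only counts how many rows would land in each partition under the two key choices.

```python
from collections import Counter


def rows_per_partition(memberships, partition_by):
    """Count rows per partition for (user_id, segment) membership rows."""
    index = 0 if partition_by == "user" else 1
    return Counter(row[index] for row in memberships)


# Invented data: every user is in "all_visitors", one in ten is in "newsletter".
memberships = [(user_id, "all_visitors") for user_id in range(1000)]
memberships += [(user_id, "newsletter") for user_id in range(0, 1000, 10)]

by_user = rows_per_partition(memberships, "user")
by_segment = rows_per_partition(memberships, "segment")

print(max(by_user.values()))     # 2: partitions keyed by user stay tiny
print(max(by_segment.values()))  # 1000: a popular segment becomes one large partition
```

With user as the partition key, a partition grows with the number of segments a user belongs to. With segment as the partition key, it grows with the number of users, which is the large-partition case the slide marks as KO.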
  • 24. Use case #2: takeaways. Define your table models to suit your queries. Forecast data volume on your model before using it • Will it fit at scale in the technology you plan to use? Mind large partitions on Scylla, as they can damage your cluster performance. Kafka Streams is great for on-the-fly aggregations. Sink your aggregated data to an external store to address lookups over multiple time spans • Interactive queries = hot real time
  • 25. Scylla + Kafka at Numberly: they play (very) well together
  • 26. Change Data Capture (CDC) in Scylla. Maheedhar Gunturu
  • 27. Change Data Capture (CDC). Lets you query the history of changes made to your database. • Asynchronously readable by downstream consumers. • Available since Scylla Open Source 4.0 and now available in Scylla Enterprise 2021.1.1
  • 28. Use cases • Applications propagating state using various microservices for use cases like IoT, retail, security, fraud detection, customer 360 • ETL • Integrations, migrations and streaming transformations • Alerting and monitoring
  • 29. CDC in Scylla: enabled per table • Single CDC log table per enabled table • CDC log is co-located with base table • Partitioning matches the base table • Mirrored columns for preimage/delta records • Every column record contains information about modification operation and TTL • Rows ordered by operation timestamp and batch sequence • CDC data is TTL'd to 24h (configurable)
  • 30. Scylla’s CDC write path + Coordinator creates CDC log table + Writes and piggybacks on base table + Writes to same replica nodes. + While the data size written is larger, the number of write requests does not change. Diagram labels: INSERT INTO base_table(...)...; CQL; CDC write
  • 31. CDC log rows • Each mutation event generates one or more rows: row keys; changes per non-key column (delta), optional; pre-image (prior state), optional; post-image (current state of row), optional • CDC log write uses the same consistency level as the base write: same data guarantees
  • 32. Consume CDC streams aka read path • CDC data is available through normal CQL: easy to read raw streams; already de-duplicated; all delta and pre-image values are normal CQL data; can consume without knowledge of server internals • Layered approach: CDC core functionality is relatively simple; allows for more sophisticated adaptors ■ Push models etc.
  • 33. Consume CDC streams aka read path + CDC data is grouped into streams + Divides the token ring space + Each stream represents a tokenization “slot” in current topology + Stream is log partition key + Stream chosen for given write based on base table PK tokenization + CDC is also the basis for Alternator Streams (DynamoDB API)
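The idea that a stream is chosen from the base table's partition-key token can be sketched as follows. This is a deliberately simplified model, not Scylla's actual partitioner or stream layout: the hash (MD5 truncated to 64 bits), the equal-width slots, and the names `token_for` and `stream_for` are all assumptions made for illustration.

```python
import hashlib

TOKEN_MIN = -2**63   # lowest value of a signed 64-bit token
TOKEN_SPAN = 2**64   # number of distinct signed 64-bit tokens


def token_for(partition_key: bytes) -> int:
    """Stand-in hash mapping a partition key to a signed 64-bit token."""
    digest = hashlib.md5(partition_key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def stream_for(partition_key: bytes, num_streams: int) -> int:
    """Pick the slot of the token ring that the key's token falls into."""
    offset = token_for(partition_key) - TOKEN_MIN   # 0 .. 2**64 - 1
    return offset * num_streams // TOKEN_SPAN
```

Because the slot depends only on the partition key, every change to a given base-table partition lands in the same stream, which is what lets a consumer read one stream and see that partition's changes in order.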
  • 34. CDC in Scylla + Easy to integrate and consume: plain CQL tables + Robust: replicated in same way as the base data + Reasonable overhead: coalesced writes and reads to same replica ranges; overhead is comparable to adding/reading from a table + Does not overflow if consumer fails to act: data is TTL'd
  • 36. Streaming Data from Scylla to Kafka Tim Berglund
  • 37. Streaming Data from Scylla to Kafka Tim Berglund
  • 38. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. Confluent Our goal is to create an event streaming platform and put it at the heart of every company. We do this with a platform that builds on Apache Kafka, available on-prem and in Confluent Cloud.
  • 39. Writing to Kafka [diagram: Partition 0, Partition 1, Partition 2]
  • 40. Reading from Kafka [diagram: Partitioned Topic with Partition 0, Partition 1, Partition 2; Consumer A, Consumer B]
  • 41. Reading from Kafka [diagram: Partitioned Topic with Partition 0, Partition 1, Partition 2; Consumer A (three instances), Consumer B]
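The two "Reading from Kafka" diagrams can be read as partition assignment within consumer groups: each partition goes to exactly one member of a group, while separate groups read the topic independently. A minimal sketch, assuming a simple round-robin policy and hypothetical member names (`A-1`, `B-1`); Kafka's real assignors are more involved than this.

```python
def assign_round_robin(partitions, members):
    """Spread partitions over the members of one consumer group.

    Each partition is handed to exactly one member, so adding members
    (up to the partition count) spreads the read load.
    """
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2]

# Group A scaled out to three instances: one partition each.
group_a = assign_round_robin(partitions, ["A-1", "A-2", "A-3"])

# Group B has a single instance, so it reads every partition.
group_b = assign_round_robin(partitions, ["B-1"])
```

A fourth instance in group A would receive no partitions in this sketch, which matches the usual rule of thumb that the partition count caps the useful parallelism of a group.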
  • 42. Kafka Connect
  • 44. Kafka, Confluent, and Scylla: the Scylla Source connector for Kafka is built on open source Debezium (debezium.io)
  • 45. Source Connector
  • 46. Sink Connector
  • 47. Syncing Scylla Clusters with Kafka: use the Source and Sink connectors to exchange data between separate Scylla clusters
  • 48. How it Works 1. Set up a Scylla Table with CDC
      cqlsh> CREATE KEYSPACE ks WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
      cqlsh> CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck)) WITH cdc = {'enabled': true};
  • 49. How it Works 2. Configure Kafka Scylla CDC Connector
      name=ScyllaCDCConnector
      connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
      scylla.name=MyCluster
      scylla.cluster.ip.addresses=127.0.0.2:9042
      scylla.table.names=ks.t
      tasks.max=10
      transforms=unwrap
      transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
      transforms.unwrap.drop.tombstones=false
      transforms.unwrap.delete.handling.mode=none
      key.converter=org.apache.kafka.connect.json.JsonConverter
      value.converter=org.apache.kafka.connect.json.JsonConverter
      key.converter.schemas.enable=true
      value.converter.schemas.enable=true
      heartbeat.interval.ms=1000
      auto.create.topics.enable=true
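When Kafka Connect runs in distributed mode, connectors are usually registered by POSTing JSON of the form `{"name": ..., "config": {...}}` to the Connect REST endpoint `/connectors`, rather than by loading a properties file. The sketch below converts a properties-style config into that payload. It uses only a subset of the slide's settings for brevity, and the helper names `parse_properties` and `to_connect_payload` are made up for this illustration.

```python
import json

# A subset of the connector settings from the slide, in properties form.
PROPERTIES = """\
name=ScyllaCDCConnector
connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
scylla.name=MyCluster
scylla.cluster.ip.addresses=127.0.0.2:9042
scylla.table.names=ks.t
tasks.max=10
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def to_connect_payload(config):
    """Build the JSON body expected by POST /connectors."""
    config = dict(config)       # copy so the caller's dict is untouched
    name = config.pop("name")   # the name sits at the top level of the body
    return {"name": name, "config": config}


payload = to_connect_payload(parse_properties(PROPERTIES))
body = json.dumps(payload, indent=2)
```

The resulting `body` could then be sent with any HTTP client to a Connect worker; the worker address depends on the deployment and is not shown here.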
  • 50. How it Works 3. Test the Connector
      cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (1, 5, 10);
      cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (2, 6, 12);
  • 51. How it Works 4. How it looks
      cqlsh> SELECT * FROM ks.t_scylla_cdc_log ;
       cdc$stream_id                      | cdc$time                             | cdc$batch_seq_no | cdc$deleted_v | cdc$end_of_batch | cdc$operation | cdc$ttl | ck | pk | v
      ------------------------------------+--------------------------------------+------------------+---------------+------------------+---------------+---------+----+----+----
       0xc72400000000000045715fd9dc0004c1 | a2130246-4048-11eb-5b81-9b458669aa11 |                0 |          null |             True |             2 |    null |  5 |  1 | 10
       0xd049555555555556e69dc1b6b4000581 | a6723136-4048-11eb-a309-3e76e3b340e7 |                0 |          null |             True |             2 |    null |  6 |  2 | 12
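A consumer reading the log table directly has to interpret the `cdc$` columns. The following sketch decodes the first row shown above. The operation-code mapping reflects the codes described in Scylla's CDC log documentation as best understood here (the slide itself only shows `2` for an INSERT), so treat it as an assumption and verify it against your Scylla version. `describe_row` is a hypothetical helper.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Assumed mapping of cdc$operation codes; check against your Scylla version.
CDC_OPERATIONS = {
    0: "pre-image",
    1: "update",
    2: "insert",
    3: "row delete",
    4: "partition delete",
    5: "range delete start (inclusive)",
    6: "range delete start (exclusive)",
    7: "range delete end (inclusive)",
    8: "range delete end (exclusive)",
    9: "post-image",
}

# Time-based UUIDs count 100 ns ticks since the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value):
    """Extract the embedded timestamp from a time-based UUID string."""
    ticks = uuid.UUID(value).time
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)


def describe_row(row):
    """Split a CDC log row into (operation name, write time, base columns)."""
    operation = CDC_OPERATIONS.get(row["cdc$operation"], "unknown")
    written_at = timeuuid_to_datetime(row["cdc$time"])
    data = {k: v for k, v in row.items() if not k.startswith("cdc$")}
    return operation, written_at, data


# First row of the query output above, as a plain dict.
row = {
    "cdc$time": "a2130246-4048-11eb-5b81-9b458669aa11",
    "cdc$operation": 2,
    "pk": 1,
    "ck": 5,
    "v": 10,
}
operation, written_at, data = describe_row(row)
```

Sorting rows of one stream by `cdc$time` and then `cdc$batch_seq_no` is the usual way to replay changes in order.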
  • 52. How it Works 4. Connector correctly replicates as JSON:
      Kafka message number 1 (key):
      { "schema": { "type": "struct", "fields": [ { "type": "int32", "optional": true, "field": "ck" }, { "type": "int32", "optional": true, "field": "pk" } ], "optional": false, "name": "ks.t.Key" }, "payload": { "ck": 5, "pk": 1 } }
      Kafka message number 1 (value):
      { "schema": { "type": "struct", "fields": [ { "type": "int32", "optional": true, "field": "ck" }, { "type": "int32", "optional": true, "field": "pk" }, { "type": "struct", "fields": [ { "type": "int32", [*snip* Etc.]
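With `schemas.enable=true`, the JSON converter wraps every key and value in an envelope with a `schema` part and a `payload` part, as the key message above shows. A minimal sketch of unpacking that envelope on the consumer side follows. It uses the key message exactly as shown on the slide; the `unwrap` helper is hypothetical, and the value message is left out because the slide truncates it.

```python
import json

# The key of Kafka message number 1, copied from the slide.
KEY_MESSAGE = """
{ "schema": { "type": "struct",
              "fields": [ { "type": "int32", "optional": true, "field": "ck" },
                          { "type": "int32", "optional": true, "field": "pk" } ],
              "optional": false, "name": "ks.t.Key" },
  "payload": { "ck": 5, "pk": 1 } }
"""


def unwrap(raw):
    """Split a schema-enabled JSON message into (schema name, field types, payload)."""
    envelope = json.loads(raw)
    schema = envelope["schema"]
    field_types = {f["field"]: f["type"] for f in schema["fields"]}
    return schema["name"], field_types, envelope["payload"]


name, field_types, payload = unwrap(KEY_MESSAGE)
```

Carrying the schema in every message is convenient for a demo like this one; deployments that care about message size often switch to a schema registry-based converter instead.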
  • 53. Deltas Only...for now • Currently only provides delta operations • Preimage and postimage will be added in the future • Will match nicely with “before” & “after” fields of Debezium
  • 54. Confluent Developer: developer.confluent.io. Learn Kafka!
  • 55. Q&A
  • 56. Thank you. United States: 545 Faber Place, Palo Alto, CA 94303. Israel: 11 Galgalei Haplada, Herzelia, Israel. www.scylladb.com, @scylladb