To remain competitive, organizations need to democratize access to fast analytics, not only to gain real-time insights into their business but also to power smart apps that need to react in the moment. In this session, you will learn how Kafka and SingleStore enable a modern yet simple data architecture for analyzing both fast-paced incoming data and large historical datasets. In particular, you will understand why SingleStore is well suited to processing data streams coming from Kafka.
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architecture | Rohit Reddy, SingleStore
1.
Kafka & SingleStore: Better Together to Power Modern Real-Time Data Architecture
Rohit Reddy
Principal Solutions Engineer, SingleStore
2. 2
Increasing Focus on Cloud and Real-Time Analytics
● 90%: By 2022, public cloud services will be essential for 90% of data and analytics innovation (Gartner, Top 10 Trends in Data and Analytics for 2020)
● 75%: By 2022, 75% of all workloads will move to hybrid cloud (McKinsey, Unlocking business acceleration in a hybrid cloud world, Aug 2019)
● 30%: By 2025, nearly 30% of all data generated will be real-time (IDC, Data Age 2025)
3. 3
Traditional Data Architecture
[Diagram components: System of Record, System of Engagement, ODS, Data Warehouse, Data Lake, ETL, CDC, Batch Engine, Real-Time Engine, Reporting, Visualization]
4. 4
Modern Real-Time Data Architecture
[Diagram components: System of Record, System of Engagement, Smart Apps, System of Insight, System of Intelligence, CDC]
5. 5
The Unified Database for Fast Analytics
● Data Warehouse (Analytical Workloads): Fast Queries | Large Data Size | Aggregation
● Operational Database (Transactional Workloads): Fast Lookup | High Concurrency
Simplifies the support of diverse workloads by reducing operational complexity
6. 6
SingleStore - Key Capabilities for Fast Analytics
● Ultra-Fast Ingest: parallel, high-scale streaming data ingest
● Super-Low Latency: sub-second latencies with immediate consistency
● High Concurrency: millions of real-time queries across tens of thousands of users
● Blazing Fast Queries: fast analytics on dynamic data for complex analytical queries
● Unparalleled Scalability: billions of events/sec for immediate availability
7. 7
Kafka & SingleStore Meet Demands of Operational Analytics
● Real-Time: millions of records per second
● Consistent: exactly-once semantics
● Distributed, Fault Tolerant: parallel ingest
● Developer-Friendly: Pub-Sub & SQL
8. 8
Anatomy of SingleStore Pipelines Sequence
Kafka | Pipelines | SingleStore
● SingleStore polls for changes from a data source system.
● SingleStore pulls the data into its memory space (no commit), where a transform can be applied.
● The data is committed in a transaction (and in parallel).
● Data can be directly inserted into tables or pre-processed by a stored procedure.
● Write to Kafka
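The sequence above (pull a batch, transform it in memory, then commit it in one transaction) is what makes exactly-once loading possible. The following is a minimal sketch of that idea only, not SingleStore's implementation: it uses Python's built-in sqlite3 as a stand-in database, and the names `make_db`, `load_batch`, and `pipeline_offsets` are hypothetical. The key assumption is that the rows and the next offset to read are written in the same transaction, so a replayed batch is detected and skipped.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in database (hypothetical schema)."""
    # isolation_level=None: no implicit transactions; BEGIN/COMMIT are explicit.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE trades (line TEXT)")
    db.execute(
        "CREATE TABLE pipeline_offsets ("
        "topic_partition INTEGER PRIMARY KEY, next_offset INTEGER)"
    )
    return db


def load_batch(db, partition, start_offset, messages, transform=None):
    """Load one batch; rows and the new offset commit together or not at all.

    Returns True if the batch was loaded, False if it was skipped because
    its start offset does not match the next expected offset (a replay).
    """
    row = db.execute(
        "SELECT next_offset FROM pipeline_offsets WHERE topic_partition = ?",
        (partition,),
    ).fetchone()
    expected = row[0] if row else 0
    if start_offset != expected:
        return False

    # Step 2: the batch is staged and transformed in memory, nothing committed.
    staged = [transform(m) if transform else m for m in messages]

    # Step 3: data and offset are committed in a single transaction.
    db.execute("BEGIN")
    try:
        db.executemany("INSERT INTO trades(line) VALUES (?)", [(m,) for m in staged])
        db.execute(
            "INSERT OR REPLACE INTO pipeline_offsets VALUES (?, ?)",
            (partition, start_offset + len(messages)),
        )
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return True
```

Calling `load_batch` twice with the same `start_offset` loads the rows once; the second call returns False because the stored offset has already moved past that batch.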
9. 9
SingleStore Pipelines Creation
10M upserts per second with Kafka + SingleStore
CREATE OR REPLACE PIPELINE load_trade_data
AS LOAD DATA KAFKA 'hostname:9092/trades'
WITH TRANSFORM ('score_data.py', '', '') -- optional
INTO TABLE live_predictions -- load directly into a table
-- INTO PROCEDURE trade_proc -- or, instead of INTO TABLE, load via a stored procedure
FIELDS TERMINATED BY ',';

START PIPELINE load_trade_data;
11. 11
SingleStore Transforms
● Build transforms using any language!
● A transform is an optional user-defined program that receives data from a pipeline’s extractor and outputs modified data (JSON, Avro, CSV)
○ Examples: data modification, aggregation, feature engineering, model execution, and more!
● The Linux distribution must have the required dependencies to execute the transform
● Data streamed to the transform is byte-length encoded
Stream | Transform | Load
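To make "byte-length encoded" concrete, here is a minimal sketch of a transform's read loop in Python. It assumes each record arrives prefixed with an 8-byte little-endian length; the exact framing should be confirmed against the SingleStore transform documentation. The helpers `read_records`, `transform`, and `frame` are hypothetical names, and the uppercase step merely stands in for real work such as scoring a model.

```python
import io
import struct

# Assumption: every record is preceded by an 8-byte little-endian length.
LENGTH_PREFIX = struct.Struct("<Q")


def read_records(stream):
    """Yield each length-prefixed record from a binary stream."""
    while True:
        header = stream.read(LENGTH_PREFIX.size)
        if not header:
            return  # clean end of input
        if len(header) != LENGTH_PREFIX.size:
            raise ValueError("truncated length prefix")
        (length,) = LENGTH_PREFIX.unpack(header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated record")
        yield payload


def transform(in_stream, out_stream):
    """Read framed records, modify them, and write one CSV line per record."""
    for record in read_records(in_stream):
        out_stream.write(record.strip().upper() + b"\n")


def frame(payload):
    """Test helper: wrap a payload in the assumed length-prefixed framing."""
    return LENGTH_PREFIX.pack(len(payload)) + payload
```

In an actual transform script, the entry point would call `transform(sys.stdin.buffer, sys.stdout.buffer)`, since the pipeline streams records to the program's standard input and reads the modified data from its standard output.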
12. 12
SingleStore Stored Procedures
DELIMITER //
CREATE OR REPLACE PROCEDURE tweets_proc(batch QUERY(tweet JSON))
AS
BEGIN
  -- Store each tweet once; rows whose key already exists are skipped.
  INSERT IGNORE INTO tweets(tweet_id, tweet_user_id, tweet_text)
  SELECT tweet::tweet_id, tweet::tweet_user_id, tweet::tweet_text
  FROM batch;

  -- Count retweets per user: insert 1 for a new user, otherwise increment.
  INSERT INTO retweets_counter(user_id, num_retweets)
  SELECT tweet::retweet_user_id, 1
  FROM batch
  WHERE tweet::retweet_user_id IS NOT NULL
  ON DUPLICATE KEY UPDATE num_retweets = num_retweets + 1;
END //
DELIMITER ;
● Preprocess incoming data: cleansing, aggregation, filtering…
● Dispatch to multiple tables
● Cross-reference with dimension tables
● Integrity check
● Push to Kafka
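The two statements in tweets_proc show the "dispatch to multiple tables" pattern with different duplicate handling: INSERT IGNORE keeps the existing row, while ON DUPLICATE KEY UPDATE increments a counter. The sketch below models that behavior in plain Python for one batch. It is an illustration only: dictionaries stand in for the tables, `apply_batch` is a hypothetical name, and it assumes `tweets` is keyed on `tweet_id` and `retweets_counter` on `user_id`, which the slide does not show.

```python
import json


def apply_batch(batch, tweets, retweets_counter):
    """Apply one batch of JSON tweets to two in-memory 'tables'.

    tweets: dict keyed by tweet_id (assumed unique key).
    retweets_counter: dict keyed by user_id (assumed unique key).
    """
    for raw in batch:
        tweet = json.loads(raw)

        # INSERT IGNORE: a row with an existing key is left untouched.
        tweets.setdefault(
            tweet["tweet_id"],
            (tweet["tweet_user_id"], tweet["tweet_text"]),
        )

        # INSERT ... ON DUPLICATE KEY UPDATE num_retweets = num_retweets + 1:
        # start at 1 for a new user, otherwise add 1 to the existing count.
        user_id = tweet.get("retweet_user_id")
        if user_id is not None:
            retweets_counter[user_id] = retweets_counter.get(user_id, 0) + 1
```

Replaying a tweet with the same `tweet_id` does not overwrite the stored text, but it would still increment the retweet counter, which is one reason exactly-once delivery from the pipeline matters for this kind of procedure.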
13. 13
SingleStore Push to Kafka
● Allows users to leverage SingleStore as a true Operational Data Hub with downstream decisioning
● “SELECT … INTO KAFKA …” runs a SELECT query, constructs a Kafka message for each row in the result set, and publishes the messages to a Kafka topic
● Each message includes every column value in the row, with the column values separated by a delimiter
● Security credentials can be configured within the statement
SELECT col1, col2, col3 FROM t
ORDER BY col1
INTO KAFKA 'host.example.com:9092/test-topic'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY "\t"
LINES TERMINATED BY '}' STARTING BY '{';
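To show roughly what one published message could look like with the delimiter options in the statement above, here is a small Python sketch that builds a message string from a result row. It is an approximation under stated assumptions, not SingleStore's serializer: `format_message` is a hypothetical helper, every field is enclosed, the default escape character is a backslash chosen for illustration, and NULL handling is not modeled.

```python
def format_message(
    row,
    fields_terminated_by=",",
    enclosed_by='"',
    escaped_by="\\",  # illustrative default, not taken from the slide
    lines_starting_by="{",
    lines_terminated_by="}",
):
    """Serialize one result row into a single delimited message string."""
    fields = []
    for value in row:
        text = str(value)
        # Escape the escape character first, then the enclosing character.
        text = text.replace(escaped_by, escaped_by + escaped_by)
        text = text.replace(enclosed_by, escaped_by + enclosed_by)
        fields.append(enclosed_by + text + enclosed_by)
    return lines_starting_by + fields_terminated_by.join(fields) + lines_terminated_by


def format_result_set(rows, **options):
    """Produce one message per row, mirroring one Kafka message per result row."""
    return [format_message(row, **options) for row in rows]
```

For example, `format_message((1, "x", "y"))` produces the string `{"1","x","y"}`, that is, one message whose fields are comma-separated, quoted, and wrapped in the line start and end markers.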
14. 14
SingleStore Confluent Kafka Connector
● SingleStore Kafka Connect Connector on the Confluent Hub
● Integration with Confluent Kafka Connect to stream data into SingleStore
● Management and deployment capabilities of Confluent make it easy to get started
● Cloud-first: Kafka Connector sits Kafka-side, eliminating many potential security constraints
● 5X faster than JDBC connector
15.
● Tier-1 US Bank: real-time fraud analytics for credit card swipes in less than 50ms (50ms real-time fraud detection)
● Top Energy Company: IoT analytics ingesting and analyzing data from over 1.2 million smart meters (1.2M smart meters analyzed)
● Real-time geospatial insights with massive concurrency to manage 24/7 operations
● Streaming analytics to drive proactive care and real-time recommendations
● 13x data growth moving from batch to near-real-time visibility and analytics
● Other figures shown: 300K events per second | 3500+ users | 10M upserts per second
16. SingleStore is the Unified Database
for Fast Analytics on Any Data, Anywhere
17. 17
Learn Your Way
Get Started with $500 in Free Credits Today
Go to singlestore.com/managed-service-trial
● Learn by Reading
○ docs.singlestore.com
● Learn by Engaging Peers
○ singlestore.com/forum
● Learn by Watching
○ youtube.com/singlestore
● Learn through Training
○ training.singlestore.com
18. 18
Thank You
Sales
Please fill out the form if you need to learn more. For immediate sales help, call us at 1-855-463-6775 or email us at team@singlestore.com.
Enterprise Edition Support
Are you encountering an issue and have enterprise support? Submit a support request.
U.S. Office Locations
San Francisco (HQ): 534 Fourth Street, San Francisco, CA 94107
Seattle: 96 Union Street, Seattle, WA 98101
Portland: 700 SW Fifth Ave, Portland, OR 97204