How Kafka and Modern Databases Benefit Apps and Analytics
1
Neil Dahlke, Sr. Sales Engineer, San Francisco
August 20, 2018
2
Agenda
● Intro
● Possible Solutions
● New Data Architecture
● Scalable SQL
● CREATE PIPELINE
● Demo
● Q&A
Intro
3
4
Who am I?
At MemSQL: Sr. Sales Engineer, San Francisco
Before MemSQL: worked on the Globus project at the University of Chicago
Previous talks:
● Real-Time, Geospatial, Maps
● Image Recognition on Streaming
● Real-Time with Spark & MemSQL
5
“Companies with data-driven environments have up to 50% higher market value than other businesses.”
6
Data-Driven Requirements Driving Database Modernization
Organizations want more of their data to support faster decisions and optimize customer experiences.
This puts pressure on databases to perform and scale, without sacrificing familiar tooling and skills.
7
Businesses Require Intra-Day
Slow Data Loading
● Batch processing
● Hours to load
● Sampled data views
8
Growing Data Slows Performance
Lengthy Query Execution
● Slow query responses
● Slow reports
● No real-time response
9
Data Access Requirements Surging
Limited User Access
● Single-threaded operations
● Challenges with mixed workloads
● Single-box performance
10
Multi-/Hybrid-Cloud Strategy
● Existing solutions have an unclear path to the cloud
● Data growing exponentially year over year
● Still managing on-premises data
● Requires a database that can run anywhere
Possible Solutions
11
12
Double Down on Existing Database
● More CPUs or memory
● Specialized HW racks
● Database options
Boosting hardware or adding more database options introduces cost.
13
Introduce Caching Tiers
● Limited data durability
● Weak SQL coverage
● Another layer to manage
Adding data grids, caches, and accelerators introduces complexity.
14
Try Object-Store-Based NoSQL Solutions
● Slow-performing analytics
● Developer-intensive queries
● Breaks BI tool compatibility
15
Latency Holding Back the Enterprise
Lengthy Query Execution
● Slow query responses
● Slow reports
● No real-time response
Limited User Access
● Single-threaded operations
● Challenges with mixed workloads
● Single-box performance
Slow Data Loading
● Batch processing
● Hours to load
● Sampled data views
16
The Enterprise Requires Performance
Fast Queries
● Scalable ANSI SQL
● Petabyte scale
● Live and historical insights
Scalable User Access
● Scale-out for performance
● Converged transactions and analytics
● Multi-threaded processing
Live Loading
● Stream data
● On-the-fly transformation
● Multiple sources
17
MemSQL: The No-Limits Database
For Every Workload and Infrastructure
● On-premises or any cloud
● Transactions and analytics
Familiar, Standard, Scalable SQL
● Distributed architecture
● Relational ANSI SQL
Performance for Demanding Applications
● Fast ingest
● Low-latency queries
18
Ecosystem Overview
● Data inputs: Kafka, Spark, Relational, Hadoop, Amazon S3
● Real-time data messaging and transforms
● MemSQL: high-speed ingest, memory-optimized rowstore, disk-optimized columnstore; relational, key-value, document, and geospatial data
● Outputs: real-time applications and BI dashboards (Tableau, Looker, MicroStrategy)
● Deployment: bare metal, virtual machines, containers; on-premises, multi-cloud, hybrid cloud
New Data Architecture
19
27
MemSQL: The No-Limits Database
● Massive Scale
● Query Performance
● High Concurrency
The transactional scale of NoSQL with familiar relational SQL for fast analytics
Scalable SQL
28
MemSQL is a database that runs as a Linux daemon:
./memsqld
MemSQL is a distributed system: multiple ./memsqld processes form one cluster.
Aggregators Aggregate
Leaves Hold Partitions and Process Data
[Diagram: one aggregator and several leaf nodes, each leaf holding partitions]
Aggregators interact with clients and leverage leaf nodes
aggregator-1> create database foo;
Query OK, 1 row affected (5.48 sec)
leaf-2> show databases;
+--------------------+
| Database |
+--------------------+
| cluster |
| foo |
| foo_1 |
| foo_3 |
| foo_5 |
| foo_7 |
| foo_9 |
| foo_11 |
| information_schema |
| memsql |
+--------------------+
10 rows in set (0.01 sec)
By default, each leaf stores one partition per CPU core on the machine.
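The listing above suggests how `foo` is laid out: leaf-2 holds the odd-numbered partition databases, which fits 12 partitions spread over two leaves. The Python sketch below models that layout under stated assumptions: `zlib.crc32` stands in for MemSQL's own shard-key hash, round-robin placement is assumed only because it reproduces the listing, and `partition_for` and `place_partitions` are hypothetical helpers, not MemSQL APIs.

```python
import zlib


def partition_for(shard_key: str, num_partitions: int) -> int:
    """Pick a partition for a row from its shard key (crc32 stands in for MemSQL's hash)."""
    return zlib.crc32(shard_key.encode("utf-8")) % num_partitions


def place_partitions(database: str, num_partitions: int, leaves: list[str]) -> dict[str, list[str]]:
    """Spread partition databases foo_0, foo_1, ... across leaves round-robin (assumed policy)."""
    placement: dict[str, list[str]] = {leaf: [] for leaf in leaves}
    for i in range(num_partitions):
        placement[leaves[i % len(leaves)]].append(f"{database}_{i}")
    return placement


if __name__ == "__main__":
    layout = place_partitions("foo", 12, ["leaf-1", "leaf-2"])
    print(layout["leaf-2"])  # ['foo_1', 'foo_3', 'foo_5', 'foo_7', 'foo_9', 'foo_11']
    # A row lives in exactly one partition, and therefore on exactly one leaf.
    print(partition_for("order-42", 12))
```

Because the hash is stable, every query or insert for the same shard key is routed to the same partition.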
aggregator-1> SELECT avg(price) FROM orders;
...
leaf-1> using memsql_demo_9 SELECT count(1), sum(price) FROM orders;
...
leaf-2> using memsql_demo_17 SELECT count(1), sum(price) FROM orders;
...
Massively parallel processing (MPP) across all the leaf nodes for query execution
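The leaf queries show why the aggregator does not simply ask each partition for `avg(price)`: averages from partitions of different sizes cannot be combined, but counts and sums can. A minimal Python sketch of that two-phase aggregation, with made-up prices and hypothetical helper names (`Partial`, `leaf_partial`, `merge_avg`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partial:
    """What one partition returns for `SELECT count(1), sum(price) FROM orders`."""
    count: int
    total: float


def leaf_partial(prices: list[float]) -> Partial:
    """Phase 1: each partition aggregates its own rows, in parallel with the others."""
    return Partial(count=len(prices), total=sum(prices))


def merge_avg(partials: list[Partial]) -> Optional[float]:
    """Phase 2: the aggregator combines partials into avg(price); None mirrors SQL NULL."""
    count = sum(p.count for p in partials)
    if count == 0:
        return None
    return sum(p.total for p in partials) / count


if __name__ == "__main__":
    # Made-up data: two partitions holding different numbers of orders.
    partitions = [[10.0, 20.0, 30.0], [100.0]]
    print(merge_avg([leaf_partial(rows) for rows in partitions]))  # 40.0
```

Averaging the two per-partition averages in the example (20.0 and 100.0) would give 60.0 instead of 40.0, which is why `count(1)` travels alongside `sum(price)`.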
aggregator-1> ADD LEAF leaf-3…
aggregator-1> REBALANCE PARTITIONS;
aggregator-1> ADD LEAF leaf-4…
aggregator-1> REBALANCE PARTITIONS;
Scale up and down on the fly
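`REBALANCE PARTITIONS` has to decide which partitions move once leaves are added or removed. The sketch below shows one simple policy, assumed for illustration only: move just the surplus partitions so that every leaf ends within one partition of the others. MemSQL's real algorithm also weighs things this ignores (for example, replica placement), and `rebalance` is a hypothetical helper.

```python
def rebalance(
    placement: dict[str, list[str]], leaves: list[str]
) -> tuple[dict[str, list[str]], list[tuple[str, str, str]]]:
    """Plan partition moves so every leaf in `leaves` is within one partition of the others.

    Partitions on leaves that are not in `leaves` (a leaf being removed) always move.
    Returns (new_placement, moves); each move is (partition, from_leaf, to_leaf).
    """
    new = {leaf: list(placement.get(leaf, [])) for leaf in leaves}
    # Partitions stranded on leaves that are leaving the cluster must move.
    pending = [(p, leaf) for leaf, parts in placement.items() if leaf not in new for p in parts]
    total = len(pending) + sum(len(parts) for parts in new.values())
    base, extra = divmod(total, len(leaves))
    # Give the spare slots to the currently fullest leaves so fewer partitions move.
    by_size = sorted(leaves, key=lambda leaf: len(new[leaf]), reverse=True)
    target = {leaf: base + (1 if i < extra else 0) for i, leaf in enumerate(by_size)}
    # Take surplus partitions off over-full leaves.
    for leaf in leaves:
        while len(new[leaf]) > target[leaf]:
            pending.append((new[leaf].pop(), leaf))
    # Hand them to under-full leaves, including newly added ones.
    moves: list[tuple[str, str, str]] = []
    for leaf in leaves:
        while len(new[leaf]) < target[leaf]:
            partition, source = pending.pop()
            new[leaf].append(partition)
            moves.append((partition, source, leaf))
    return new, moves


if __name__ == "__main__":
    two_leaves = {
        "leaf-1": [f"foo_{i}" for i in range(0, 12, 2)],
        "leaf-2": [f"foo_{i}" for i in range(1, 12, 2)],
    }
    # After adding leaf-3, 4 of the 12 partitions move.
    three_leaves, moves = rebalance(two_leaves, ["leaf-1", "leaf-2", "leaf-3"])
    print(len(moves))  # 4
```

Passing a shorter `leaves` list models scaling down: the removed leaf's partitions are redistributed to the remaining ones.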
[memsql.cnf]
master-agg=agg-1
Aggregators too
38
Apache Kafka
● Messaging Queue
● Distributed
● Durable
● Publish-Subscribe
● Process
● “Source of Truth”
● Open Source
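The Kafka properties above rest on one structure: a topic split into append-only partitions that consumers read by offset. Reading does not delete records, so several consumer groups can each see every message (publish-subscribe), and a new reader can replay history ("source of truth"). This toy in-memory model illustrates that; it is not the Kafka client API, it has no replication, persistence, or networking, and it hashes keys with `crc32` where Kafka's default partitioner uses murmur2. Offsets advance as soon as `poll` returns, a simplification of Kafka's separate offset commits.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy Kafka topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[bytes]] = [[] for _ in range(num_partitions)]

    def publish(self, key: bytes, value: bytes) -> tuple[int, int]:
        """Append a record and return (partition, offset).

        Equal keys hash to the same partition, so records per key stay in order.
        """
        p = zlib.crc32(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1


class ConsumerGroup:
    """Tracks its own offset per partition; reading never removes records."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offsets: dict[int, int] = defaultdict(int)

    def poll(self, max_records: int = 100) -> list[tuple[int, int, bytes]]:
        """Return up to max_records unread (partition, offset, value) tuples."""
        batch: list[tuple[int, int, bytes]] = []
        for p, log in enumerate(self.topic.partitions):
            while self.offsets[p] < len(log) and len(batch) < max_records:
                offset = self.offsets[p]
                batch.append((p, offset, log[offset]))
                self.offsets[p] = offset + 1  # simplification: commit on read
        return batch
```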
Deliver Faster Insights
● Scalable ANSI SQL
● Full ACID capabilities
● Support for JSON, Geospatial, and Full-Text Search
● Fast Query Vectorization and Compilation
● Extensibility with Stored Procedures, UDFs, UDAs
39
Fast Data Ingestion
● Stream ingestion
● Fast parallel bulk loading
● Built-in CREATE PIPELINE
● Transactional Consistency
● Exactly-Once Semantics
● Native integrations with Kafka, AWS S3, Azure Blob, HDFS
40
41
CREATE PIPELINE
● Stream ingestion
● Batch loading
● Fully parallel
● Arbitrary transforms
● Any language
● Transactional consistency
● Exactly-once semantics
42
CREATE PIPELINE twitter_pipeline AS
LOAD DATA KAFKA "public-kafka.memcompute.com:9092/tweets-json"
WITH TRANSFORM ('/path/to/executable', 'arg1', 'arg2')
INTO TABLE tweets
(id, tweet);
START PIPELINE twitter_pipeline;
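The transform named in `WITH TRANSFORM` is an executable that reads the batch on stdin and writes rows on stdout, which is why the slide says "any language". Below is a hypothetical Python transform for `twitter_pipeline`. Three details are assumptions to check against the MemSQL pipeline documentation for your version: each Kafka message is framed with an 8-byte little-endian length prefix, each tweet is JSON with `id` and `text` fields, and the output is parsed as tab-separated lines into `(id, tweet)`.

```python
#!/usr/bin/env python3
"""Hypothetical transform for twitter_pipeline: JSON tweets in, (id, tweet) rows out.

Assumptions, not confirmed by the slides:
  * each Kafka message arrives on stdin prefixed by its length as an
    8-byte little-endian unsigned integer;
  * each message is a JSON object with "id" and "text" fields;
  * the pipeline parses stdout as tab-separated lines (LOAD DATA defaults).
"""
import io
import json
import struct
import sys
from typing import BinaryIO, Iterator


def read_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed message until end of input."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of input
        if len(header) != 8:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("<Q", header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated message body")
        yield payload


def to_row(message: bytes) -> str:
    """Convert one JSON tweet into a tab-separated `id<TAB>tweet` line."""
    tweet = json.loads(message)
    # Escape the backslash first, then the default field and line terminators.
    text = tweet["text"].replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
    return f"{tweet['id']}\t{text}\n"


def run(stdin: BinaryIO, stdout: BinaryIO) -> int:
    """Transform every message on stdin; return how many rows were written."""
    count = 0
    for message in read_messages(stdin):
        stdout.write(to_row(message).encode("utf-8"))
        count += 1
    return count


def frame(message: bytes) -> bytes:
    """Add the assumed length prefix (for local testing only)."""
    return struct.pack("<Q", len(message)) + message


def transform_bytes(data: bytes) -> bytes:
    """Run the transform on an in-memory buffer (for local testing only)."""
    out = io.BytesIO()
    run(io.BytesIO(data), out)
    return out.getvalue()


if __name__ == "__main__" and not sys.stdin.isatty():
    # Do nothing when started interactively; the pipeline feeds stdin through a pipe.
    run(sys.stdin.buffer, sys.stdout.buffer)
```

`frame` and `transform_bytes` exist only so the logic can be exercised without a running pipeline.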
43-45
Data Source (e.g., NFS, S3, HDFS, Kafka) → MemSQL PIPELINE
1. MemSQL polls the data source system for changes.
2. MemSQL pulls the data into its own memory space (without committing it), where a transform can be applied.
3. The data is committed in a transaction, in parallel.
46
[Diagram: Kafka Brokers 1-4; a PIPELINE on each of four leaves, with data reshuffle between leaves; an Aggregator PIPELINE and a metadata query]
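The diagram splits the work between an aggregator pipeline, which handles the metadata query, and a pipeline on every leaf, which pulls from the Kafka brokers in parallel. One way to picture the aggregator's part, as an assumption rather than MemSQL's actual scheduler: compare the latest offset of each Kafka partition with the last committed offset and hand the pending ranges to leaves. `plan_batch`, `Task`, and the `max_per_task` cap are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One unit of work: a leaf pulls offsets [start, end) from one Kafka partition."""
    leaf: str
    kafka_partition: int
    start: int
    end: int


def plan_batch(
    committed: dict[int, int], latest: dict[int, int], leaves: list[str], max_per_task: int
) -> list[Task]:
    """Hypothetical aggregator step: turn a metadata-query result into per-leaf tasks."""
    pending = [p for p in sorted(latest) if latest[p] > committed.get(p, 0)]
    tasks = []
    for i, p in enumerate(pending):
        start = committed.get(p, 0)
        end = min(latest[p], start + max_per_task)  # cap how much one batch reads
        tasks.append(Task(leaves[i % len(leaves)], p, start, end))
    return tasks
```

Each leaf would then pull its range, transform it, and reshuffle rows to whichever partitions own their shard keys; a task's `end` is the offset a successful batch would commit with its rows.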
Demo
47
Q&A
48
Thank You

Contenu connexe

Tendances

Bringing olap fully online analyze changing datasets in mem sql and spark wi...
Bringing olap fully online  analyze changing datasets in mem sql and spark wi...Bringing olap fully online  analyze changing datasets in mem sql and spark wi...
Bringing olap fully online analyze changing datasets in mem sql and spark wi...
SingleStore
 

Tendances (20)

Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQL
 
Real-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLReal-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQL
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
 
See who is using MemSQL
See who is using MemSQLSee who is using MemSQL
See who is using MemSQL
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
 
Bringing olap fully online analyze changing datasets in mem sql and spark wi...
Bringing olap fully online  analyze changing datasets in mem sql and spark wi...Bringing olap fully online  analyze changing datasets in mem sql and spark wi...
Bringing olap fully online analyze changing datasets in mem sql and spark wi...
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
In-Memory Database Performance on AWS M4 Instances
In-Memory Database Performance on AWS M4 InstancesIn-Memory Database Performance on AWS M4 Instances
In-Memory Database Performance on AWS M4 Instances
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
 
Personalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud StreamingPersonalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud Streaming
 
Introducing MemSQL 4
Introducing MemSQL 4Introducing MemSQL 4
Introducing MemSQL 4
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast Data
 
Internet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data InfrastructureInternet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data Infrastructure
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
 
Journey to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme GrowthJourney to the Real-Time Analytics in Extreme Growth
Journey to the Real-Time Analytics in Extreme Growth
 

Similaire à How Kafka and Modern Databases Benefit Apps and Analytics

Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Gavin M
Gavin MGavin M
Gavin M
Ontico
 
Stateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk ChoStateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Redis Labs
 

Similaire à How Kafka and Modern Databases Benefit Apps and Analytics (20)

Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
 
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)
 
EDB Postgres in DBaaS & Container Platforms
EDB Postgres in DBaaS & Container PlatformsEDB Postgres in DBaaS & Container Platforms
EDB Postgres in DBaaS & Container Platforms
 
An AMIS overview of database 12c
An AMIS overview of database 12cAn AMIS overview of database 12c
An AMIS overview of database 12c
 
Distributed Data Quality - Technical Solutions for Organizational Scaling
Distributed Data Quality - Technical Solutions for Organizational ScalingDistributed Data Quality - Technical Solutions for Organizational Scaling
Distributed Data Quality - Technical Solutions for Organizational Scaling
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
 
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jWebinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
 
Gavin M
Gavin MGavin M
Gavin M
 
Stateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk ChoStateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
Stateful Interaction In Serverless Architecture With Redis: Pyounguk Cho
 

Plus de SingleStore

Plus de SingleStore (19)

MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed Architecture
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored Procedures
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
 
Real-Time Analytics at Uber Scale
Real-Time Analytics at Uber ScaleReal-Time Analytics at Uber Scale
Real-Time Analytics at Uber Scale
 
Machines and the Magic of Fast Learning
Machines and the Magic of Fast LearningMachines and the Magic of Fast Learning
Machines and the Magic of Fast Learning
 
Machines and the Magic of Fast Learning - Strata Keynote
Machines and the Magic of Fast Learning - Strata KeynoteMachines and the Magic of Fast Learning - Strata Keynote
Machines and the Magic of Fast Learning - Strata Keynote
 
Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoT
 
Driving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive AnalyticsDriving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive Analytics
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
 
The Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
The Real-Time CDO and the Cloud-Forward Path to Predictive AnalyticsThe Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
The Real-Time CDO and the Cloud-Forward Path to Predictive Analytics
 
Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoT
 
Driving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive AnalyticsDriving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive Analytics
 
Building an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 MinutesBuilding an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 Minutes
 

Dernier

Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 

Dernier (20)

High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 

How Kafka and Modern Databases Benefit Apps and Analytics

  • 1. How Kafka and Modern Databases Benefit Apps and Analytics 1 Neil Dahlke, Sr. Sales Engineer, San Francisco August 20 2018
  • 2. 2 ● Intro ● Possible Solutions ● New Data Architecture ● Scalable SQL ● CREATE PIPELINE ● Demo ● Q&A Agenda
  • 4. AT MEMSQL Sr. Sales Engineer, San Francisco BEFORE MEMSQL Worked on Globus project out @ University of Chicago PREVIOUS TALKS Real Time, Geospatial, Maps Image Recognition on Streaming Real Time w/ Spark & MemSQL 4 Who am I?
  • 5. 5 “Companies with data-driven environments have up to 50% higher market value than other businesses.”
  • 6. 6 Organizations want more of their data to support faster decisions and optimize customer experiences This is putting pressure on database performance and scalability but without sacrificing familiar tooling and skills Data Driven Requirements Driving Database Modernization
  • 7. 7 Businesses Require Intra-Day Slow Data Loading Batch processing Hours to load Sampled data views
  • 8. 8 Growing Data Slows Performance Lengthy Query Execution Slow query responses Slow reports No real-time response
  • 9. 9 Data Access Requirements Surging Limited User Access Single threaded operations Challenge with mixed workloads Single box performance
  • 10. 10 Multi / Hybrid Cloud Strategy ● Existing solutions have unclear path to cloud ● Data growing exponentially year over year ● Still managing on-premises data ● Requires database to run anywhere
  • 12. More CPUs or memory Specialized HW racks Database Options Boost hardware or add more DB options introduces cost 12 Double Down on Existing Database
  • 13. Adding data grids, caches, and accelerators introduces complexity 13 Introduce Caching Tiers Limited data durability Weak SQL coverage Another layer To manage
  • 14. 14 Try Object Store based NoSQL Solutions Slow performing analytics Developer intensive queries Breaks BI tool compatibility
  • 15. 15 Latency Holding Back the Enterprise Lengthy Query Execution Slow query responses Slow reports No real-time response Limited User Access Single threaded operations Challenge with mixed workloads Single box performance Slow Data Loading Batch processing Hours to load Sampled data views
  • 16. 16 The Enterprise Requires Performance Fast Queries Scalable ANSI SQL Petabyte scale Live and historical insights Scalable User Access Scale-out for performance Converged transactions and analytics Multi-threaded processing Live Loading Stream data On-the-fly transformation Multiple sources
  • 17. MemSQL: The No Limits Database17 For Every Workload and Infrastructure On-premises or any cloud Transactions and analytics Familiar, standard scalable SQL Distributed architecture Relational ANSI SQL Performance for Demanding Applications Fast ingest Low latent queries
  • 18. Ecosystem Overview. Data inputs: Kafka, Spark, relational, Hadoop, Amazon S3, with real-time data messaging and transforms. MemSQL: high-speed ingest, memory-optimized rowstore, disk-optimized columnstore; relational, key-value, document, and geospatial data. Consumers: real-time applications and BI dashboards (Tableau, Looker, MicroStrategy). Deployment: bare metal, virtual machines, containers; on-premises, multi-cloud, hybrid cloud.
  • 27. MemSQL, The No-Limits Database: massive scale, query performance, and high concurrency; the transactional scale of NoSQL with familiar relational SQL for fast analytics.
  • 29. MemSQL is a database: a Linux daemon, ./memsqld
  • 30. MemSQL is a distributed system: several ./memsqld processes working together.
  • 32. Leaves hold partitions and process data: among the ./memsqld nodes, an aggregator sits in front of the leaves, and each leaf stores partitions.
  • 33. Aggregators interact with clients and leverage leaf nodes. A database client connects to the aggregator:
      aggregator-1> create database foo;
      Query OK, 1 row affected (5.48 sec)
  • 34. Leaves store one partition per core on the machine (by default). Listing the databases on a leaf shows the partitions of foo that it holds:
      leaf-2> show databases;
      +--------------------+
      | Database           |
      +--------------------+
      | cluster            |
      | foo                |
      | foo_1              |
      | foo_3              |
      | foo_5              |
      | foo_7              |
      | foo_9              |
      | foo_11             |
      | information_schema |
      | memsql             |
      +--------------------+
      10 rows in set (0.01 sec)
  • 35. Massively parallel processing (MPP) across all the leaf nodes for query execution. The client sends a query to the aggregator:
      aggregator-1> SELECT avg(price) FROM orders;
    and the aggregator fans out per-partition queries to the leaves, for example:
      leaf-1> using memsql_demo_9 SELECT count(1), sum(price) FROM orders;
      leaf-2> using memsql_demo_17 SELECT count(1), sum(price) FROM orders;
    The aggregator then combines the partial counts and sums into the final average.
  • 36. Scale up and down on the fly: add leaves, then rebalance partitions across them.
      aggregator-1> ADD LEAF leaf-3…
      aggregator-1> REBALANCE PARTITIONS;
      aggregator-1> ADD LEAF leaf-4…
      aggregator-1> REBALANCE PARTITIONS;
  • 38. Apache Kafka ● Messaging queue ● Distributed ● Durable ● Publish-subscribe ● Process ● "Source of truth" ● Open source
  • 39. Deliver Faster Insights ● Scalable ANSI SQL ● Full ACID capabilities ● Support for JSON, geospatial, and full-text search ● Fast query vectorization and compilation ● Extensibility with stored procedures, UDFs, and UDAs
  • 40. Fast Data Ingestion ● Stream ingestion ● Fast parallel bulk loading ● Built-in CREATE PIPELINE ● Transactional consistency ● Exactly-once semantics ● Native integrations with Kafka, AWS S3, Azure Blob, and HDFS
  • 41. CREATE PIPELINE: stream ingestion, batch loading, fully parallel, arbitrary transforms in any language, transactional consistency, exactly-once semantics.
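  • 42. Example: a pipeline that loads tweets from a Kafka topic into the tweets table, then starts it.
      -- Load JSON tweets from a Kafka topic into the tweets table
      CREATE PIPELINE twitter_pipeline AS
        LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/tweets-json'
        -- Run each batch through an external transform program before loading
        WITH TRANSFORM ('/path/to/executable', 'arg1', 'arg2')
        INTO TABLE tweets
        (id, tweet);
      -- Begin ingesting from the topic
      START PIPELINE twitter_pipeline;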
  • 43-45. How a pipeline moves data from a source system (e.g., NFS, S3, HDFS, Kafka) into MemSQL:
      1. MemSQL polls the data source for changes.
      2. MemSQL pulls the data into its memory space (no commit yet), where a transform can be applied.
      3. The data is committed in a transaction (and in parallel).
  • 46. Parallel loading from Kafka: the aggregator's pipeline sends a metadata query to the Kafka cluster (brokers 1 to 4), a pipeline on each leaf reads from the brokers in parallel, and data is reshuffled between leaves as needed.
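Slide 34 notes that, by default, each leaf stores one partition per core, and leaf-2 lists the odd-numbered partitions foo_1 through foo_11. The sketch below assumes (hypothetically) two leaves with six cores each and round-robin placement, which reproduces that listing. It only illustrates the idea: place_partitions is not a MemSQL API, and MemSQL's actual placement logic may differ.

```python
from __future__ import annotations


def place_partitions(db: str, leaves: list[str], cores_per_leaf: int) -> dict[str, list[str]]:
    """Hypothetical layout: one partition per core, dealt out to leaves round-robin."""
    num_partitions = len(leaves) * cores_per_leaf
    placement: dict[str, list[str]] = {leaf: [] for leaf in leaves}
    for p in range(num_partitions):
        # Partition p goes to leaf (p mod number of leaves): 0, 1, 0, 1, ...
        placement[leaves[p % len(leaves)]].append(f"{db}_{p}")
    return placement


if __name__ == "__main__":
    layout = place_partitions("foo", ["leaf-1", "leaf-2"], cores_per_leaf=6)
    print(layout["leaf-2"])  # ['foo_1', 'foo_3', 'foo_5', 'foo_7', 'foo_9', 'foo_11']
```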
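In slide 35 the aggregator does not ask leaves for avg(price). It asks each partition for count(1) and sum(price), because partitions hold different numbers of rows and an average of averages would be skewed. Below is a minimal Python sketch of that combine step. The function names and prices are made up for illustration.

```python
from __future__ import annotations


def leaf_partial(prices: list[float]) -> tuple[int, float]:
    """Per-partition work, like: SELECT count(1), sum(price) FROM orders."""
    return len(prices), sum(prices)


def aggregator_avg(partials: list[tuple[int, float]]) -> float | None:
    """Combine per-partition (count, sum) pairs into the global avg(price)."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(total for _, total in partials)
    # SQL's AVG over zero rows is NULL, modeled here as None.
    return total_sum / total_count if total_count else None


if __name__ == "__main__":
    partitions = [[10.0, 20.0, 30.0], [100.0]]  # made-up, deliberately uneven
    partials = [leaf_partial(p) for p in partitions]
    print(partials)                  # [(3, 60.0), (1, 100.0)]
    print(aggregator_avg(partials))  # 40.0, i.e. 160 / 4
    naive = sum(s / c for c, s in partials) / len(partials)
    print(naive)                     # 60.0: averaging the averages is wrong
```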
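Slide 36 adds leaves and then runs REBALANCE PARTITIONS. As an illustration only, here is one simple rebalancing strategy. It keeps moving a partition from the most loaded leaf to the least loaded one until partition counts differ by at most one. The rebalance function is hypothetical and is not MemSQL's algorithm.

```python
from __future__ import annotations


def rebalance(placement: dict[str, list[str]], new_leaves: list[str]) -> dict[str, list[str]]:
    """Hypothetical strategy: register the new (empty) leaves, then repeatedly move
    one partition from the most loaded leaf to the least loaded leaf until
    partition counts differ by at most one. The input mapping is not modified."""
    result = {leaf: list(parts) for leaf, parts in placement.items()}
    for leaf in new_leaves:
        result.setdefault(leaf, [])
    while True:
        most = max(result, key=lambda leaf: len(result[leaf]))
        least = min(result, key=lambda leaf: len(result[leaf]))
        if len(result[most]) - len(result[least]) <= 1:
            return result
        result[least].append(result[most].pop())


if __name__ == "__main__":
    before = {
        "leaf-1": [f"foo_{p}" for p in range(0, 12, 2)],
        "leaf-2": [f"foo_{p}" for p in range(1, 12, 2)],
    }
    after_3 = rebalance(before, ["leaf-3"])
    print({leaf: len(parts) for leaf, parts in after_3.items()})
    # {'leaf-1': 4, 'leaf-2': 4, 'leaf-3': 4}
    after_4 = rebalance(after_3, ["leaf-4"])
    print({leaf: len(parts) for leaf, parts in after_4.items()})
    # {'leaf-1': 3, 'leaf-2': 3, 'leaf-3': 3, 'leaf-4': 3}
```

Existing partitions stay where they are unless a move is needed, which is what lets a cluster grow without reloading its data.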
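Slide 38 lists Kafka's core properties. The toy model below shows what "publish-subscribe" and "source of truth" mean mechanically. It is a hypothetical in-memory Topic class, not a Kafka client. Messages are appended to partitions, and each one gets an offset. Consumers read from offsets they track themselves, so reading never removes data and any consumer can replay from the start. crc32 stands in for Kafka's real partitioner.

```python
from __future__ import annotations

import zlib


class Topic:
    """Toy stand-in for a Kafka topic: a fixed number of append-only partitions.
    (Real Kafka replicates partitions across brokers and persists them to disk.)"""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[bytes]] = [[] for _ in range(num_partitions)]

    def publish(self, key: bytes, value: bytes) -> tuple[int, int]:
        """Append a message and return (partition, offset). Messages with the
        same key land in the same partition, so they stay in order."""
        partition = zlib.crc32(key) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int, max_messages: int = 100) -> list[bytes]:
        """Return messages starting at `offset`. Reading removes nothing: each
        subscriber tracks its own offset and can replay from 0 at any time."""
        return self.partitions[partition][offset:offset + max_messages]


if __name__ == "__main__":
    topic = Topic(num_partitions=3)
    for i in range(3):
        partition, offset = topic.publish(b"user-42", f"event-{i}".encode())
    dashboard_offset = 0  # one subscriber reads everything
    print(topic.read(partition, dashboard_offset))  # [b'event-0', b'event-1', b'event-2']
    loader_offset = 2     # another subscriber has already processed two messages
    print(topic.read(partition, loader_offset))     # [b'event-2']
```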
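The WITH TRANSFORM clause in slide 42 names an external program that MemSQL runs on each batch. The program reads records on standard input and writes rows for the target columns (id, tweet) to standard output. The Python sketch below shows one way such a transform could look. Check every assumption against the MemSQL documentation for your version:
- Each Kafka message arrives prefixed with its length, encoded as an 8-byte little-endian unsigned integer.
- Tweets are JSON objects with hypothetical id and text keys.
- The pipeline uses LOAD DATA's default tab-separated format with backslash escaping.

```python
from __future__ import annotations

import io
import json
import struct
import sys
from typing import BinaryIO, Iterable, Iterator


def read_framed(stream: BinaryIO) -> Iterator[bytes]:
    """ASSUMPTION: each Kafka message arrives prefixed by its length,
    encoded as an 8-byte little-endian unsigned integer."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of the batch
        if len(header) != 8:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack("<Q", header)
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("truncated message body")
        yield body


def escape_field(value: str) -> str:
    """Escape characters that LOAD DATA's default tab-separated format treats specially."""
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def to_rows(messages: Iterable[bytes]) -> Iterator[str]:
    """Turn each tweet (hypothetical JSON keys 'id' and 'text') into one
    tab-separated row matching the pipeline's (id, tweet) column list."""
    for raw in messages:
        tweet = json.loads(raw)
        yield f"{tweet['id']}\t{escape_field(tweet['text'])}\n"


def transform_bytes(data: bytes) -> str:
    """Run the transform on an in-memory batch (handy for local testing)."""
    return "".join(to_rows(read_framed(io.BytesIO(data))))


def main() -> None:
    for row in to_rows(read_framed(sys.stdin.buffer)):
        sys.stdout.write(row)


if __name__ == "__main__":
    main()
```

The transform_bytes helper makes the parsing and escaping testable without a running pipeline. How the script is packaged and referenced from WITH TRANSFORM depends on the MemSQL version.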
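Slides 43-45 describe a three-step cycle: poll, then pull and transform without committing, then commit in a transaction. This cycle is what makes exactly-once semantics possible. The toy model below assumes the committed source offset is stored together with the loaded rows. A crash before commit therefore leaves nothing behind, and the same batch is read again. ToyPipeline is hypothetical, and its single-threaded "commit" stands in for a real database transaction.

```python
from __future__ import annotations

from typing import Callable


class ToyPipeline:
    """Hypothetical model of exactly-once loading: the rows and the source offset
    are committed together, so a batch that fails before commit leaves no trace
    and is read again from the same offset on the next run."""

    def __init__(self, source: list[str]) -> None:
        self.source = source         # stands in for one Kafka partition
        self.table: list[str] = []   # committed rows
        self.offset = 0              # committed position in the source

    def run_batch(self, size: int, transform: Callable[[str], str] = str.upper,
                  fail_before_commit: bool = False) -> int:
        # 1. Poll: look for records past the committed offset.
        batch = self.source[self.offset:self.offset + size]
        # 2. Pull and transform in memory; nothing is visible yet.
        staged_rows = [transform(record) for record in batch]
        staged_offset = self.offset + len(batch)
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")
        # 3. Commit rows and offset together (one transaction in a real database).
        self.table.extend(staged_rows)
        self.offset = staged_offset
        return len(staged_rows)


if __name__ == "__main__":
    pipeline = ToyPipeline(["a", "b", "c", "d"])
    pipeline.run_batch(2)
    try:
        pipeline.run_batch(2, fail_before_commit=True)
    except RuntimeError:
        pass  # nothing from the failed batch was committed
    pipeline.run_batch(2)  # retries "c" and "d" from the committed offset
    print(pipeline.table, pipeline.offset)  # ['A', 'B', 'C', 'D'] 4
```

If the offset were committed separately from the rows, a crash between the two writes would either lose the batch or load it twice.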
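Slide 46 shows the aggregator's pipeline issuing a metadata query while each leaf's pipeline reads from the Kafka brokers directly, followed by a data reshuffle. Below is a hypothetical sketch of those two steps. First, spread the topic's partitions (as reported by the metadata query) across leaf pipelines round-robin. Then route each row to the database partition that owns its shard key. The round-robin assignment and the crc32 hash are illustrative assumptions, not MemSQL internals.

```python
from __future__ import annotations

import zlib


def assign_kafka_partitions(kafka_partitions: list[int],
                            leaf_pipelines: list[str]) -> dict[str, list[int]]:
    """Hypothetical: spread the topic's partitions (as reported by the
    aggregator's metadata query) across the leaf pipelines round-robin."""
    assignment: dict[str, list[int]] = {leaf: [] for leaf in leaf_pipelines}
    for i, kafka_partition in enumerate(kafka_partitions):
        assignment[leaf_pipelines[i % len(leaf_pipelines)]].append(kafka_partition)
    return assignment


def reshuffle(rows: list[tuple[str, str]],
              num_db_partitions: int) -> dict[int, list[tuple[str, str]]]:
    """Route each (shard_key, payload) row to the database partition that owns
    its key, regardless of which leaf read it; crc32 stands in for the real hash."""
    routed: dict[int, list[tuple[str, str]]] = {}
    for key, payload in rows:
        owner = zlib.crc32(key.encode()) % num_db_partitions
        routed.setdefault(owner, []).append((key, payload))
    return routed


if __name__ == "__main__":
    leaves = ["leaf-1", "leaf-2", "leaf-3", "leaf-4"]
    print(assign_kafka_partitions(list(range(8)), leaves))
    # {'leaf-1': [0, 4], 'leaf-2': [1, 5], 'leaf-3': [2, 6], 'leaf-4': [3, 7]}
    rows = [("user-1", "a"), ("user-2", "b"), ("user-1", "c")]
    routed = reshuffle(rows, num_db_partitions=16)
    # Both "user-1" rows end up in the same database partition.
```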