Neil Dahlke, Engineer
2017 March 17
Streaming and MemSQL
About Me: Neil Dahlke
• Engineer
• Formerly Globus
  - high-performance data transfer for research scientists
• Past talks
  - Real-time, Geospatial, Maps
• Slides: http://www.slideshare.net/MemSQL/realtime-geospatial-maps-by-neil-dahlke
Topology
Architecture: Aggregators and Leaves
[Diagram: two aggregators (Agg 1, Agg 2) above four leaf nodes (Leaf 1, Leaf 2, Leaf 3, Leaf 4)]
Architecture: Aggregators Aggregate
Architecture: Leaves Hold Partitions
Architecture: It’s SQL All The Way Down
Query sent to an aggregator (Agg 1 or Agg 2):
  select avg(price) from orders;
Queries the aggregator sends to the leaf partitions:
  leaf1> using memsql_demo_0 select count(1), sum(price) from orders;
  leaf2> using memsql_demo_12 select count(1), sum(price) from orders;
  ...
[Diagram: Leaf 1, Leaf 2, Leaf 3, Leaf 4]
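The slide's point is that the aggregator rewrites `avg(price)` into per-partition `count(1)` and `sum(price)` queries and combines the partial results itself. The sketch below mimics that idea with Python's built-in `sqlite3`, using separate in-memory databases as stand-ins for leaf partitions. It illustrates the query-rewrite concept only; it is not MemSQL's aggregator code, and `make_partition` and `distributed_avg` are hypothetical names.

```python
import sqlite3


def make_partition(prices):
    """Create an in-memory SQLite database standing in for one leaf partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (price REAL)")
    conn.executemany("INSERT INTO orders (price) VALUES (?)", [(p,) for p in prices])
    return conn


def distributed_avg(partitions):
    """Aggregator side: push count/sum down to every partition, then combine."""
    total_count = 0
    total_sum = 0.0
    for conn in partitions:
        count, s = conn.execute("SELECT count(1), sum(price) FROM orders").fetchone()
        total_count += count
        total_sum += s or 0.0  # sum() returns NULL (None) on an empty partition
    return total_sum / total_count if total_count else None


partitions = [make_partition([10.0, 20.0]), make_partition([30.0]), make_partition([])]
print(distributed_avg(partitions))  # 20.0
```

Averaging the per-partition averages instead would give (15 + 30) / 2 = 22.5 for this data, which is why the pushed-down queries return counts and sums rather than averages.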
Latency in the Enterprise
SLOW DATA LOADING
• Batched loading
• Hours to load
• Sampled data views
• No real-time ingestion
LENGTHY QUERY EXECUTION
• Slow query responses
• Slow reports
• Slow applications
• No real-time response
LOW CONCURRENCY
• Single-threaded operations
• Challenges with mixed workloads
• Overall poor performance
REIMAGINE AN EXISTING BUSINESS PROCESS.
What if you had intraday information to inform your decision-making,
instead of daily or even weekly information?
Why MemSQL?
• FAST DATA INGEST: the volume of data that can be ingested into the database
• LOW LATENCY QUERIES: the time it takes to execute queries and receive results
• HIGH CONCURRENCY: the ability to scale simultaneous operations
[Graphic: REAL-WORLD EVENT, REAL-TIME RESPONSE, REDUCED LATENCY]
WHAT WE ARE SEEING: A WORLD OF CONNECTED MACHINES AND PEOPLE
Use Cases
• Executive dashboards
• Time series analytics
• Sales analytics
• Real-time data visualization
• Live business dashboards
• Website analytics
Real-time data with massive concurrency – millions of cars, drivers, and
riders accessing the database to optimize supply, demand, and pricing.
TECHNICAL BENEFITS
• Analyze millions of rows per second
• Analyze historical data against live data
• Massive concurrency
THE UBER REAL-TIME ARCHITECTURE
[Diagram labels: REAL-TIME INPUTS, REAL-TIME ANALYTICS]
A massively scalable database and ingest solution allowed for massive growth,
real-time analytic applications, and faster, targeted ...
Before
• Kafka
• S3
  - Persisted all logs to cold storage for eventual analysis
• Hadoop
  - Nightly MapReduce jobs
• Redshift
  - Took a full day to load the previous day's data
  - Load windows began to overlap, causing a data crisis
  - Pre-aggregated
  - Limited concurrency
Why was this bad for their business?
• Late data
• Limited access to the data once it’s in
• Long waits for insight
• Expensive
Why was this bad for their data operations?
• Not scalable
• No deduplication
  - i.e., not exactly-once
• Unfiltered and incomplete data (silos)
• Pre-aggregated data
[Graphic labels: FAST DATA INGEST, LOW LATENCY QUERIES, HIGH CONCURRENCY]
After
INSTANT ACCURACY TO THE LATEST PIN
[Diagram label: REAL-TIME ANALYTICS]
RESULTS
• Accelerated ingest from 24 hours to 5 seconds
• 1 GB/sec, totaling 72 TB/day
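The slide's two throughput figures can be cross-checked with simple unit arithmetic. Assuming decimal units (1 TB = 1,000 GB), a sustained 1 GB/sec over the 86,400 seconds in a day would be 86.4 TB, so 72 TB/day corresponds to an average of roughly 0.83 GB/sec. One possible reading is that 1 GB/sec is a peak rather than a 24-hour average, but the slide does not say; the helper names below are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day(gb_per_sec: float) -> float:
    """Daily volume in TB at a constant rate, using decimal units (1 TB = 1,000 GB)."""
    return gb_per_sec * SECONDS_PER_DAY / 1000


def avg_gb_per_sec(tb_per_day_total: float) -> float:
    """Average rate in GB/sec implied by a daily total in TB."""
    return tb_per_day_total * 1000 / SECONDS_PER_DAY


print(tb_per_day(1.0))               # 86.4
print(round(avg_gb_per_sec(72), 3))  # 0.833
```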
Visualizing the Data
• Demo built using
  - Mapbox
  - WebSockets
  - Tornado web server
• When an image is pinned, the circles on the globe expand, showing higher-volume areas
• Reads data from MemSQL directly
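The slides do not say how the circle sizes were computed. One plausible approach, sketched here purely as an assumption, is to bucket pin coordinates into coarse latitude/longitude cells and scale each circle's radius with the square root of its pin count, so that circle area (rather than radius) grows in proportion to volume. `bucket`, `circle_radii`, the 1-degree cell size, and the base radius are all hypothetical.

```python
import math
from collections import Counter


def bucket(lat: float, lon: float, cell_deg: float = 1.0):
    """Snap a coordinate to the south-west corner of its cell_deg x cell_deg grid cell."""
    return (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)


def circle_radii(pins, base_radius: float = 2.0, cell_deg: float = 1.0):
    """Return {cell: radius}; radius grows with sqrt(count) so circle area tracks pin volume."""
    counts = Counter(bucket(lat, lon, cell_deg) for lat, lon in pins)
    return {cell: base_radius * math.sqrt(n) for cell, n in counts.items()}


pins = [(40.71, -74.50), (40.73, -74.40), (40.75, -74.30), (40.69, -74.20), (51.50, -0.12)]
print(circle_radii(pins))  # {(40.0, -75.0): 4.0, (51.0, -1.0): 2.0}
```

Scaling by the square root avoids visually overstating busy regions: a cell with four times the pins gets a circle with four times the area, not sixteen.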
CREATE PIPELINE
Introducing MemSQL Pipelines
• CREATE PIPELINE is a database construct that enables data ingestion with exactly-once semantics
  - MemSQL stores the Kafka offset in a table
  - Exactly-once delivery is facilitated by co-locating data and offsets
• Extract, transform, and load external data natively
• Fully distributed workloads
• User-defined transformations
• Scalable, highly performant, online ALTER TABLE and ALTER PIPELINE
MemSQL Pipelines Sequence
1. Extract from data sources
2. Transform extracted data
3. Load transformed data into database tables in parallel
[Diagram: Data Sources → MemSQL Pipelines (1. Extract, 2. Transform extracted data, 3. Load into database tables)]
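The three-step sequence above can be sketched as a chain of Python generators: extract raw records, pass each one through a user-defined transform that may rewrite or drop it, then load the survivors into a destination. This is a single-process illustration under stated assumptions: CSV input, a list standing in for a table, and a hypothetical `clean_row` transform. It does not model MemSQL's parallel, per-node loading or its actual transform interface, which these slides do not detail.

```python
import csv
from typing import Callable, Iterable, Iterator, List, Optional

Row = List[str]


def extract(lines: Iterable[str]) -> Iterator[Row]:
    """1. Extract: parse raw CSV lines from a source into rows, skipping blank lines."""
    return (row for row in csv.reader(lines) if row)


def transform(rows: Iterable[Row], fn: Callable[[Row], Optional[Row]]) -> Iterator[Row]:
    """2. Transform: apply a user-defined function; returning None drops the row."""
    for row in rows:
        out = fn(row)
        if out is not None:
            yield out


def load(rows: Iterable[Row], table: List[Row]) -> int:
    """3. Load: append rows to the destination and return how many were loaded."""
    n = 0
    for row in rows:
        table.append(row)
        n += 1
    return n


def clean_row(row: Row) -> Optional[Row]:
    """Hypothetical transform: drop rows with no price and upper-case the country code."""
    user, price, country = row
    if not price:
        return None
    return [user, price, country.upper()]


table: List[Row] = []
raw = ["alice,9.99,us", "bob,,gb", "", "carol,4.50,fr"]
print(load(transform(extract(raw), clean_row), table))  # 2
print(table)  # [['alice', '9.99', 'US'], ['carol', '4.50', 'FR']]
```

Because each stage is a generator, rows stream through one at a time rather than being materialized between steps.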
MemSQL Pipelines Architecture: Kafka Example
[Diagram: three Kafka brokers, each paired with a MemSQL node running Pipelines (1. Extract, 2. Transform, 3. Load), plus a MemSQL master running Pipelines; labels: "Metadata query", "Data reshuffle"]
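The diagram shows each MemSQL node pulling from Kafka itself, with a master involved in a metadata query and a data reshuffle between nodes. A rough sketch of that shape, under assumptions not stated on the slide: the master spreads Kafka partitions across nodes round-robin, and each extracted row is then routed to the node that owns its shard key via a stable CRC32 hash. Node names, both placement policies, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (shard_key, value)


def assign_kafka_partitions(num_partitions: int, nodes: List[str]) -> Dict[str, List[int]]:
    """Master-side step: spread Kafka partition ids across nodes round-robin."""
    plan: Dict[str, List[int]] = {n: [] for n in nodes}
    for p in range(num_partitions):
        plan[nodes[p % len(nodes)]].append(p)
    return plan


def owner(shard_key: str, nodes: List[str]) -> str:
    """Node that stores rows with this shard key, using a stable CRC32 hash."""
    return nodes[zlib.crc32(shard_key.encode("utf-8")) % len(nodes)]


def reshuffle(extracted: Dict[str, List[Record]], nodes: List[str]) -> Dict[str, List[Record]]:
    """Route each row from the node that read it to the node that owns its shard key."""
    out: Dict[str, List[Record]] = defaultdict(list)
    for rows in extracted.values():
        for key, value in rows:
            out[owner(key, nodes)].append((key, value))
    return dict(out)


nodes = ["node-a", "node-b", "node-c"]
print(assign_kafka_partitions(7, nodes))
# {'node-a': [0, 3, 6], 'node-b': [1, 4], 'node-c': [2, 5]}
```

Reading from Kafka in parallel on every node keeps extraction off a single bottleneck, while the reshuffle step restores the invariant that each row lives on the node that owns its key.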
Understanding CREATE PIPELINE and Streamliner
Getting Data to MemSQL
CREATE PIPELINE
• Parallel loading from multiple sources
• Direct to leaf nodes
• Native feature
• Exactly-once semantics
Streamliner
• Parallel loading from multiple sources
• Data to multiple aggregators, then leaf nodes
• Built with Apache Spark
Demo
Q&A
Thank You
Learn More
• [ODBMS Watch] Powering Big Data at Pinterest. Interview with Krishna Gade
• [GigaOm] Pinterest is experimenting with MemSQL for real-time data analytics
• [InfoQ] Real-time Data Analytics at Pinterest using MemSQL and Spark Streaming
• [MemSQL Blog] How Pinterest Measures Real-Time User Engagement with Spark
• [Pinterest Engineering Blog] Real-time analytics at Pinterest
Resources
• https://github.com/memsql/memsql-spark-connector
• http://docs.memsql.com/docs/streamliner-administration
• http://docs.memsql.com/docs/pipelines-overview
• https://github.com/memsql/memsql-docker-quickstart


Real-Time Analytics with Spark and MemSQL

  • 1. Neil Dahlke, Engineer 2017 March 17 Streaming and MemSQL
  • 2. About Me: Neil Dahlke  Engineer  Formerly Globus • high performance data transfer for research scientists  Past talks • Real-time, Geospatial, Maps  Slides: http://www.slideshare.net/MemSQL/realtime-geospatial- maps-by-neil-dahlke
  • 4. Architecture: Aggregators and Leaves Agg 1 Agg 2 Leaf 1 Leaf 2 Leaf 3 Leaf 4
  • 5. Architecture: Aggregators Aggregate Agg 1 Agg 2 Leaf 1 Leaf 2 Leaf 3 Leaf 4
  • 6. Architecture: Leaves Hold Partitions Agg 1 Agg 2 Leaf 1 Leaf 2 Leaf 3 Leaf 4
  • 7. Architecture: It’s SQL All The Way Down Agg 1 Agg 2 select avg(price) from orders; leaf1> using memsql_demo_0 select count(1), sum(price) from orders; leaf2> using memsql_demo_12 select count(1), sum(price) from orders; ... Leaf 1 Leaf 2 Leaf 3 Leaf 4
  • 8. Latency in the Enterprise SELECT* FROM WHERE SLOW DATA LOADING Batched Loading Hours to load Sampled Data Views No real-time ingestion LENGTHY QUERY EXECUTION Slow query responses Slow reports Slow applications No real-time response LOW CONCURRENCY Single threaded operations Challenge with mixed workloads Overall poor performance
  • 9. REIMAGINE AN EXISTING BUSINESS PROCESS. What if you had intra-day information to inform your decision making, instead of daily or even weekly?
  • 10. Why MemSQL? FAST DATA INGEST The volume of data that can be ingested into the database
  • 11. Why MemSQL? LOW LATENCY QUERIES The time it takes to execute queries and receive results
  • 12. Why MemSQL? HIGH CONCURRENCY The ability to scale simultaneous operations
  • 13. Why MemSQL? FAST DATA INGEST The volume of data that can be ingested into the database LOW LATENCY QUERIES The time it takes to execute queries and receive results HIGH CONCURRENCY The ability to scale simultaneous operations
  • 15. WHAT WE ARE SEEING A WORLD OF CONNECTED MACHINES AND PEOPLE
  • 16. Use Cases Executive dashboards Time series analytics Sales analytics Real-time data visualization Live business dashboards Website analytics
  • 17. Real-time data with massive concurrency: millions of cars, drivers and riders accessing the database to optimize supply, demand and pricing.
  • 18. TECHNICAL BENEFITS: Analyze millions of rows per second; Analyze historical against live data; Massive concurrency
  • 19. THE UBER REAL-TIME ARCHITECTURE (diagram labels: REAL-TIME ANALYTICS, REAL-TIME INPUTS)
  • 20. A massively scalable database and ingest solution allowed for massive growth, real-time analytic applications and faster, targeted.
  • 21.  Kafka  S3 • Persisted all logs to cold storage for eventual analysis  Hadoop • Nightly map-reduce jobs  Redshift • Took a full day to load data from previous day • Reaching overlap of times caused data crisis • Pre-aggregated • Limited concurrency Before
  • 22. Why was this bad for their business? Late data; Limited access to the data once it’s in; Long waits for insight; Expensive
  • 23. Why was this bad for their data operations? Not scalable; No deduplication (i.e., not exactly-once); Unfiltered and incomplete data (silos); Pre-aggregated data. Slide labels: FAST DATA INGEST, LOW LATENCY QUERIES, HIGH CONCURRENCY
  • 24. After
  • 25. INSTANT ACCURACY TO THE LATEST PIN REAL-TIME ANALYTICS
  • 27. RESULTS: Accelerated ingest from 24 hours to 5 seconds; 1 GB/sec, totaling 72 TB/day
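As a rough consistency check on the throughput figures, assuming decimal units (1 TB = 1,000 GB) and a uniform rate over the 86,400 seconds in a day:

$$
\frac{72\ \text{TB/day}}{86\,400\ \text{s/day}} = \frac{72\,000\ \text{GB}}{86\,400\ \text{s}} \approx 0.83\ \text{GB/s},
\qquad
1\ \text{GB/s} \times 86\,400\ \text{s/day} = 86.4\ \text{TB/day}
$$

So 72 TB/day corresponds to an average of about 0.83 GB/sec, which suggests the 1 GB/sec figure is a rounded or peak rate rather than a 24-hour average.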
  • 30. Visualizing the Data: demo built using Mapbox, WebSockets and a Tornado web server. When an image is pinned, the circles on the globe expand, showing higher-volume areas. The demo reads data directly from MemSQL.
  • 32. Introducing MemSQL Pipelines. CREATE PIPELINE is a database construct that enables data ingestion with exactly-once semantics: MemSQL stores the Kafka offsets in a table, and exactly-once delivery is facilitated by co-locating data and offsets. Features: extract, transform, and load external data natively; fully distributed workloads; user-defined transformations; scalable, high-performance, online ALTER TABLE and ALTER PIPELINE.
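A minimal Python sketch of why keeping offsets next to the data gives exactly-once loading. Everything here is an assumption for illustration: sqlite3 stands in for MemSQL, a Python list stands in for one Kafka partition, and the `pins` and `pipeline_offsets` tables and the `run_batch` helper are hypothetical names, not MemSQL's internal offset storage.

```python
import sqlite3


def open_db():
    """In-memory stand-in for a database holding both the data and the offsets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pins (id INTEGER PRIMARY KEY, payload TEXT)")
    # Hypothetical offsets table: next offset to read per Kafka topic partition.
    conn.execute(
        "CREATE TABLE pipeline_offsets ("
        "topic TEXT, kafka_partition INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, kafka_partition))"
    )
    return conn


def stored_offset(conn, topic, partition):
    row = conn.execute(
        "SELECT next_offset FROM pipeline_offsets WHERE topic = ? AND kafka_partition = ?",
        (topic, partition),
    ).fetchone()
    return row[0] if row else 0


def run_batch(conn, topic, partition, log, batch_size, crash_before_commit=False):
    """Load the next batch from `log` (a list standing in for one Kafka partition).

    Rows and the new offset are written in one transaction, so a crash keeps
    both or neither; a retry resumes from the committed offset, so no message
    is loaded twice or skipped.
    """
    start = stored_offset(conn, topic, partition)
    batch = log[start:start + batch_size]
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.executemany("INSERT INTO pins (payload) VALUES (?)", [(m,) for m in batch])
            conn.execute(
                "INSERT OR REPLACE INTO pipeline_offsets VALUES (?, ?, ?)",
                (topic, partition, start + len(batch)),
            )
            if crash_before_commit:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        return 0  # nothing was committed; the same batch is retried next time
    return len(batch)


conn = open_db()
log = ["pin-a", "pin-b", "pin-c"]
run_batch(conn, "pins", 0, log, batch_size=2, crash_before_commit=True)  # rolled back
run_batch(conn, "pins", 0, log, batch_size=2)  # loads pin-a, pin-b
run_batch(conn, "pins", 0, log, batch_size=2)  # loads pin-c
run_batch(conn, "pins", 0, log, batch_size=2)  # nothing new to load
print(conn.execute("SELECT payload FROM pins ORDER BY id").fetchall())
```

If the offset lived in a separate system (for example, committed back to Kafka after the database write), a crash between the two writes could either replay or drop a batch; putting both in one transaction removes that window.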
  • 33. MemSQL Pipelines Sequence (diagram: Data Sources → MemSQL Pipelines)
    1. Extract from data sources
    2. Transform extracted data
    3. Load transformed data into database tables in parallel
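A minimal Python sketch of what the Transform step could look like. As we understand MemSQL's user-defined transforms, they run as separate executables that read extracted records on stdin and write transformed records on stdout; treat that as an assumption and check the Pipelines documentation. The sketch further assumes newline-delimited JSON input and tab-separated output; the field names and the `transform_line`/`run_transform` helpers are hypothetical, and the exact framing a real pipeline hands a transform may differ.

```python
import io
import json


def transform_line(line):
    """Turn one JSON event into a tab-separated row, or return None to drop it.

    Hypothetical input shape: {"pin_id": ..., "country": ..., "ts": ...}
    """
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip malformed records rather than failing the whole batch
    if not isinstance(event, dict) or "pin_id" not in event:
        return None
    return "\t".join(str(event.get(k, "")) for k in ("pin_id", "country", "ts"))


def run_transform(infile, outfile):
    """Stream records from infile to outfile; a real transform would pass sys.stdin/sys.stdout."""
    for line in infile:
        row = transform_line(line)
        if row is not None:
            outfile.write(row + "\n")


source = io.StringIO('{"pin_id": 7, "country": "US", "ts": 1489708800}\nnot json\n')
sink = io.StringIO()
run_transform(source, sink)
print(sink.getvalue(), end="")  # one tab-separated row: 7, US, 1489708800
```

Keeping the per-record logic in a pure function makes it easy to test without a running pipeline; only the thin streaming wrapper touches the process's input and output.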
  • 34. MemSQL Pipelines Architecture: Kafka Example (diagram: three Kafka brokers, each paired with a MemSQL node running Pipelines that performs 1. Extract, 2. Transform, 3. Load; a MemSQL master running Pipelines; labels "Data reshuffle" and "Metadata query")
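The diagram's point is that each MemSQL node pulls from Kafka and runs its own extract, transform and load steps in parallel, rather than funneling all data through one aggregator. A minimal Python sketch under stated assumptions: threads stand in for nodes, a dict of lists stands in for Kafka partitions, the round-robin assignment of Kafka partitions to nodes is a hypothetical scheme, and the diagram's "Data reshuffle" step (moving rows to the partition that owns them) is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_partitions(kafka_partitions, nodes):
    """Spread Kafka partitions over pipeline nodes round-robin (hypothetical scheme)."""
    assignment = {node: [] for node in nodes}
    for i, partition in enumerate(kafka_partitions):
        assignment[nodes[i % len(nodes)]].append(partition)
    return assignment


def run_node(node, partitions, logs, transform):
    """One node's work: extract its own partitions, transform, and load locally."""
    loaded = []  # a list stands in for the node's table partitions
    for partition in partitions:
        for message in logs[partition]:   # 1. Extract
            row = transform(message)      # 2. Transform
            if row is not None:
                loaded.append(row)        # 3. Load
    return node, loaded


def run_pipeline(logs, nodes, transform):
    """Run every node's extract-transform-load concurrently, with no central funnel."""
    assignment = assign_partitions(sorted(logs), nodes)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [
            pool.submit(run_node, node, parts, logs, transform)
            for node, parts in assignment.items()
        ]
        return dict(f.result() for f in futures)


logs = {0: ["a", "b"], 1: ["c"], 2: ["d"], 3: ["e"]}
print(run_pipeline(logs, ["node-1", "node-2", "node-3"], str.upper))
# {'node-1': ['A', 'B', 'E'], 'node-2': ['C'], 'node-3': ['D']}
```

Adding nodes in this model adds extraction capacity directly, since no single process sits between the brokers and the tables.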
  • 36. Getting Data to MemSQL
    CREATE PIPELINE: parallel loading from multiple sources; direct to leaf nodes; native feature; exactly-once semantics
    Streamliner: parallel loading from multiple sources; data to multiple aggregators, then leaf nodes; built with Apache Spark
  • 37. Demo
  • 38. Q&A
  • 40. Learn More
    [ODBMS Watch] Powering Big Data at Pinterest. Interview with Krishna Gade
    [GigaOm] Pinterest is experimenting with MemSQL for real-time data analytics
    [InfoQ] Real-time Data Analytics at Pinterest using MemSQL and Spark Streaming
    [MemSQL Blog] How Pinterest Measures Real-Time User Engagement with Spark
    [Pinterest Engineering Blog] Real-time analytics at Pinterest
  • 41. Resources
    https://github.com/memsql/memsql-spark-connector
    http://docs.memsql.com/docs/streamliner-administration
    http://docs.memsql.com/docs/pipelines-overview
    https://github.com/memsql/memsql-docker-quickstart