SlideShare a Scribd company logo
1 of 21
Download to read offline
A complete data platform for
real-time analytics
Apr 2021
Druid + Kafka
Transform your data-in-motion to analytics-in-motion
July 2021
© 2021 Imply Data, Inc.
© 2021 Imply Data, Inc.
Who am I?
Gian Merlino
Committer & PMC chair at
Cofounder at (we’re hiring!)
© 2021 Imply Data, Inc.
Agenda
What does it mean to set analytics in motion?
Introduction to Apache Druid
Learn how Druid + Kafka work together
Try this at home
© 2021 Imply Data, Inc.
What does it mean to set analytics in motion?
© 2021 Imply Data, Inc.
Analytics-in-motion requirements
Interactive queries
Sub-second OLAP queries
Highly scalable
Massive data volumes, query
concurrency, ingestion throughput
Real-time
Ingest in true real-time from Kafka
Cost-effective
Fastest queries at lowest cost
© 2021 Imply Data, Inc.
Analytics-in-motion applications
© 2021 Imply Data, Inc.
Intro to Apache Druid
© 2021 Imply Data, Inc.
Druid is a powerful real-time database
=
© 2021 Imply Data, Inc.
Druid is a fancy calculator
=
© 2021 Imply Data, Inc.
Druid in the wild
© 2021 Imply Data, Inc.
Some numbers from Imply customers
Petabytes of data under management
1000s of daily active users
Millions of rows/second ingested
90%ile query time < 1s
99%ile query time < 2s
© 2021 Imply Data, Inc.
Druid is designed for real-time performance
Data sourced from: Correia, José & Costa, Carlos & Santos, Maribel. (2019). Challenging SQL-on-Hadoop Performance with Apache Druid.
© 2021 Imply Data, Inc.
Druid is designed for real-time performance
Data layer
Deep storage
Query layer
Streaming data is
immediately queryable
Hot, already-ingested
data cached locally
Filtered and aggregated data
transferred to query layer
Ingested data
stored in indexed
columnar format
© 2021 Imply Data, Inc.
How Druid + Kafka work together
© 2021 Imply Data, Inc.
Stream-only pattern
Kafka
Data
sources
Real-time
processing
Real-time
delivery
Kafka Streams
ksqlDB
Real-time
database
Presentation
© 2021 Imply Data, Inc.
Kafka + data lake pattern
Data lake
Kafka
Data
sources
File-based
storage
Data lake
Data lake
Query
engine
Query
engine
Query
engine
🔥 Analytics in motion
Federated query engine
(Presto, Hive, Spark, …)
Real-time
delivery
Direct load from Kafka topic
Backfill if
needed
© 2021 Imply Data, Inc.
Demo!
© 2021 Imply Data, Inc.
Try this at home
© 2021 Imply Data, Inc.
Stay in touch
19
@druidio
Join the community!
https://druid.apache.org/community/
Free online training by Imply!
https://imply.io/druid-university
Follow the Druid project on Twitter!
© 2021 Imply Data, Inc.
Run with Imply
20
Available at https://imply.io/
Point and click deployment
AWS, GCP, Kubernetes
Bring-your-own VM / bare metal
Batteries included: visualizations,
dashboarding, alerts
A complete data platform for
real-time analytics
Apr 2021
Thank you for attending
July 2021
© 2021 Imply Data, Inc.

More Related Content

What's hot

Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human Time
DataWorks Summit
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
Databricks
 

What's hot (20)

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Spark tuning
Spark tuningSpark tuning
Spark tuning
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 
OSS NA 2019 - Demo Booth deck overview of Egeria
OSS NA 2019 - Demo Booth deck overview of EgeriaOSS NA 2019 - Demo Booth deck overview of Egeria
OSS NA 2019 - Demo Booth deck overview of Egeria
 
HDInsight for Architects
HDInsight for ArchitectsHDInsight for Architects
HDInsight for Architects
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
 
Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human Time
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 
Real-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidReal-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and Druid
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 

Similar to Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Merlino, Imply

Similar to Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Merlino, Imply (20)

Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
 
Operationalizing Pluralsight's Data with Materialize
Operationalizing Pluralsight's Data with MaterializeOperationalizing Pluralsight's Data with Materialize
Operationalizing Pluralsight's Data with Materialize
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Streaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use CasesStreaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use Cases
 
From raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakeFrom raw data to business insights. A modern data lake
From raw data to business insights. A modern data lake
 
963
963963
963
 
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWS
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
963
963963
963
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Feb 2024 Apache Hudi Community Sync with Daniel Ford
Feb 2024 Apache Hudi Community Sync with Daniel FordFeb 2024 Apache Hudi Community Sync with Daniel Ford
Feb 2024 Apache Hudi Community Sync with Daniel Ford
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
 
Big Data Building Blocks with AWS Cloud
Big Data Building Blocks with AWS CloudBig Data Building Blocks with AWS Cloud
Big Data Building Blocks with AWS Cloud
 
Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017
 
Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Merlino, Imply

  • 1. A complete data platform for real-time analytics Apr 2021 Druid + Kafka Transform your data-in-motion to analytics-in-motion July 2021 © 2021 Imply Data, Inc.
  • 2. © 2021 Imply Data, Inc. Who am I? Gian Merlino Committer & PMC chair at Cofounder at (we’re hiring!)
  • 3. © 2021 Imply Data, Inc. Agenda What does it mean to set analytics in motion? Introduction to Apache Druid Learn how Druid + Kafka work together Try this at home
  • 4. © 2021 Imply Data, Inc. What does it mean to set analytics in motion?
  • 5. © 2021 Imply Data, Inc. Analytics-in-motion requirements Interactive queries Sub-second OLAP queries Highly scalable Massive data volumes, query concurrency, ingestion throughput Real-time Ingest in true real-time from Kafka Cost-effective Fastest queries at lowest cost
  • 6. © 2021 Imply Data, Inc. Analytics-in-motion applications
  • 7. © 2021 Imply Data, Inc. Intro to Apache Druid
  • 8. © 2021 Imply Data, Inc. Druid is a powerful real-time database =
  • 9. © 2021 Imply Data, Inc. Druid is a fancy calculator =
  • 10. © 2021 Imply Data, Inc. Druid in the wild
  • 11. © 2021 Imply Data, Inc. Some numbers from Imply customers Petabytes of data under management 1000s of daily active users Millions of rows/second ingested 90%ile query time < 1s 99%ile query time < 2s
  • 12. © 2021 Imply Data, Inc. Druid is designed for real-time performance Data sourced from: Correia, José & Costa, Carlos & Santos, Maribel. (2019). Challenging SQL-on-Hadoop Performance with Apache Druid.
  • 13. © 2021 Imply Data, Inc. Druid is designed for real-time performance Data layer Deep storage Query layer Streaming data is immediately queryable Hot, already-ingested data cached locally Filtered and aggregated data transferred to query layer Ingested data stored in indexed columnar format
  • 14. © 2021 Imply Data, Inc. How Druid + Kafka work together
  • 15. © 2021 Imply Data, Inc. Stream-only pattern Kafka Data sources Real-time processing Real-time delivery Kafka Streams ksqlDB Real-time database Presentation
  • 16. © 2021 Imply Data, Inc. Kafka + data lake pattern Data lake Kafka Data sources File-based storage Data lake Data lake Query engine Query engine Query engine 🔥 Analytics in motion Federated query engine (Presto, Hive, Spark, …) Real-time delivery Direct load from Kafka topic Backfill if needed
  • 17. © 2021 Imply Data, Inc. Demo!
  • 18. © 2021 Imply Data, Inc. Try this at home
  • 19. © 2021 Imply Data, Inc. Stay in touch 19 @druidio Join the community! https://druid.apache.org/community/ Free online training by Imply! https://imply.io/druid-university Follow the Druid project on Twitter!
  • 20. © 2021 Imply Data, Inc. Run with Imply 20 Available at https://imply.io/ Point and click deployment AWS, GCP, Kubernetes Bring-your-own VM / bare metal Batteries included: visualizations, dashboarding, alerts
  • 21. A complete data platform for real-time analytics Apr 2021 Thank you for attending July 2021 © 2021 Imply Data, Inc.