SlideShare une entreprise Scribd logo
1  sur  27
Télécharger pour lire hors ligne
Sub-Second SQL Search, Aggregations
and Joins with Kafka and Rockset
Dhruba Borthakur / CTO, Rockset
Presenter
Dhruba Borthakur
Co-Founder & CTO
2
Unlocking Value from Event Streams
Event Streams
● Online advertising
● Web clicks
● Online gaming interactions
● Online purchases and bookings
● Financial transactions
● IoT - sensor data
3
Applications
● Real-time customer 360
● Real-time personalization
● Logistics tracking
● Security analytics
● Operational analytics
ETL
The Need for Real-Time Analytics
4
Past
Present
Event streams Data lake Data warehouse Offline reporting
Event streams Data lake Data warehouse
ETL
Offline reporting
Real-time
database
Real-time data
applications
Apache Kafka and Real-Time Analytics
5
● Apache Kafka is a foundational platform for
real-time analytics
○ Central location for collecting event
data and making it available in real time
○ Low latency and high write throughput
○ Queue: First-in, first-out
Source: https://kafka.apache.org/powered-by
Rockset and Real-Time Analytics
6
Real-time indexing database
for modern data applications
at massive scale
without operational overhead
How Kafka and Rockset Work Together
7
Events from apps,
devices, sensors
KSQL
Enrichment
Real-time analytics
applications
OLTP database or data
lake
SQL, REST
An indexing database to serve
queries from Kafka data
Query Latency
● Ad-hoc queries and drilldowns in real-time
● Millisecond-latency queries to support live dashboards and data APIs
● How to get achieve low-latency queries?
9
Optimize Query Latency by Indexing
Traditional approach:
Parallelize and scan
10
event data MapReduce reports event data Converged
Indexing
ad hoc
analytics
Real-time analytics:
Parallelize and index
Column store Column, Inverted and Row store
● All fields are indexed in inverted, columnar and row indexes
● Accelerates search, aggregation and join queries
● No index definition required
Converged Index
<doc 0>
{
“name”: “Igor”
}
<doc 1>
{
“name”: “Dhruba”
}
Key Value
R.0.name Igor Row Store
R.1.name Dhruba
C.name.0 Igor Column Store
C.name.1 Dhruba
S.name.Dhruba.1 Search index
S.name.Igor.0
11
12
Query Optimizer
● Low latency for both highly selective queries and large scans
● Optimizer picks between
○ inverted index (Index Filter operator)
○ columnar format (Column Scan operator)
○ inverted index (Index Scan operator)
Advantages of indexing Kafka data
Complex Queries
● Support for expressive query
language
● Ability to perform joins,
aggregations, sorting, filtering, etc.
14
Read-Time JOINs
● Streams are most useful when joined with other data
15
Streaming
event data
Query
Analytics backend
Other Data Sources
(e.g. Amazon S3)
Flexibility with Data and Schema
● Allow values of different types in the same column
● Ability to ingest new data without needing data cleaning at write time
○ Avoid flattening or denormalization for performance reasons
● Type binding not done at write time (but done later at query time)
16
Making indexing scale
Disaggregated, Cloud-Native Architecture
18
Aggregator Leaf Tailer Architecture
19
Scaling Query Compute
Aggregator Leaf Tailer Architecture
Continuous Ingestion and Indexing from Kafka
● Fast ingestion
○ New data is visible in query results in seconds
○ Complex ETL processes can add minutes to hours before the data is
available to query
● Live sync
○ Continuous sync of new data from Kafka
20
live
sync
within seconds
Ingest Rollups
● SQL rollups and transformations
○ Pre-aggregate data at ingest time to increase performance and reduce size
○ Familiar SQL syntax
21
Demo
Learn More
23
Request demo or get started for free at rockset.com
Reach out at dhruba@rockset,com
Thank you
Dhruba Borthakur / dhruba@rockset.com
25
● Fields are dynamically typed
Strong Dynamic Typing
26
● Fields are dynamically typed
● Queries are strongly typed
Strong Dynamic Typing
27
● Fields are dynamically typed
● Queries are strongly typed
● Smart schemas
Strong Dynamic Typing

Contenu connexe

Tendances

Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
confluent
 
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
confluent
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
confluent
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
confluent
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
HostedbyConfluent
 

Tendances (20)

You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
 
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
 
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
 
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
 
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
 
PCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System TuningPCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System Tuning
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Testing Event Driven Architectures: How to Broker the Complexity | Frank Kilc...
Testing Event Driven Architectures: How to Broker the Complexity | Frank Kilc...Testing Event Driven Architectures: How to Broker the Complexity | Frank Kilc...
Testing Event Driven Architectures: How to Broker the Complexity | Frank Kilc...
 
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
 
Enterprise Metadata Integration
Enterprise Metadata IntegrationEnterprise Metadata Integration
Enterprise Metadata Integration
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
 
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projects
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
 
Kafka Summit NYC 2017 - Venice: A Distributed Database on top of Kafka
Kafka Summit NYC 2017 - Venice: A Distributed Database on top of KafkaKafka Summit NYC 2017 - Venice: A Distributed Database on top of Kafka
Kafka Summit NYC 2017 - Venice: A Distributed Database on top of Kafka
 

Similaire à Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba Borthakur, Rockset

From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache Apex
DataWorks Summit
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
Thomas Weise
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward
 

Similaire à Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba Borthakur, Rockset (20)

From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for Stream
 
Real Time Insights for Advertising Tech
Real Time Insights for Advertising TechReal Time Insights for Advertising Tech
Real Time Insights for Advertising Tech
 
Building end to end streaming application on Spark
Building end to end streaming application on SparkBuilding end to end streaming application on Spark
Building end to end streaming application on Spark
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
AmazonRedshift
AmazonRedshiftAmazonRedshift
AmazonRedshift
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Building a reliable and cost effect logging system at Box
Building a reliable and cost effect logging system at Box Building a reliable and cost effect logging system at Box
Building a reliable and cost effect logging system at Box
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 

Plus de HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

Plus de HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba Borthakur, Rockset

  • 1. Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset Dhruba Borthakur / CTO, Rockset
  • 3. Unlocking Value from Event Streams Event Streams ● Online advertising ● Web clicks ● Online gaming interactions ● Online purchases and bookings ● Financial transactions ● IoT - sensor data 3 Applications ● Real-time customer 360 ● Real-time personalization ● Logistics tracking ● Security analytics ● Operational analytics
  • 4. ETL The Need for Real-Time Analytics 4 Past Present Event streams Data lake Data warehouse Offline reporting Event streams Data lake Data warehouse ETL Offline reporting Real-time database Real-time data applications
  • 5. Apache Kafka and Real-Time Analytics 5 ● Apache Kafka is a foundational platform for real-time analytics ○ Central location for collecting event data and making it available in real time ○ Low latency and high write throughput ○ Queue: First-in, first-out Source: https://kafka.apache.org/powered-by
  • 6. Rockset and Real-Time Analytics 6 Real-time indexing database for modern data applications at massive scale without operational overhead
  • 7. How Kafka and Rockset Work Together 7 Events from apps, devices, sensors KSQL Enrichment Real-time analytics applications OLTP database or data lake SQL, REST
  • 8. An indexing database to serve queries from Kafka data
  • 9. Query Latency ● Ad-hoc queries and drilldowns in real-time ● Millisecond-latency queries to support live dashboards and data APIs ● How to get achieve low-latency queries? 9
  • 10. Optimize Query Latency by Indexing Traditional approach: Parallelize and scan 10 event data MapReduce reports event data Converged Indexing ad hoc analytics Real-time analytics: Parallelize and index Column store Column, Inverted and Row store
  • 11. ● All fields are indexed in inverted, columnar and row indexes ● Accelerates search, aggregation and join queries ● No index definition required Converged Index <doc 0> { “name”: “Igor” } <doc 1> { “name”: “Dhruba” } Key Value R.0.name Igor Row Store R.1.name Dhruba C.name.0 Igor Column Store C.name.1 Dhruba S.name.Dhruba.1 Search index S.name.Igor.0 11
  • 12. 12 Query Optimizer ● Low latency for both highly selective queries and large scans ● Optimizer picks between ○ inverted index (Index Filter operator) ○ columnar format (Column Scan operator) ○ inverted index (Index Scan operator)
  • 14. Complex Queries ● Support for expressive query language ● Ability to perform joins, aggregations, sorting, filtering, etc. 14
  • 15. Read-Time JOINs ● Streams are most useful when joined with other data 15 Streaming event data Query Analytics backend Other Data Sources (e.g. Amazon S3)
  • 16. Flexibility with Data and Schema ● Allow values of different types in the same column ● Ability to ingest new data without needing data cleaning at write time ○ Avoid flattening or denormalization for performance reasons ● Type binding not done at write time (but done later at query time) 16
  • 19. 19 Scaling Query Compute Aggregator Leaf Tailer Architecture
  • 20. Continuous Ingestion and Indexing from Kafka ● Fast ingestion ○ New data is visible in query results in seconds ○ Complex ETL processes can add minutes to hours before the data is available to query ● Live sync ○ Continuous sync of new data from Kafka 20 live sync within seconds
  • 21. Ingest Rollups ● SQL rollups and transformations ○ Pre-aggregate data at ingest time to increase performance and reduce size ○ Familiar SQL syntax 21
  • 22. Demo
  • 23. Learn More 23 Request demo or get started for free at rockset.com Reach out at dhruba@rockset,com
  • 24. Thank you Dhruba Borthakur / dhruba@rockset.com
  • 25. 25 ● Fields are dynamically typed Strong Dynamic Typing
  • 26. 26 ● Fields are dynamically typed ● Queries are strongly typed Strong Dynamic Typing
  • 27. 27 ● Fields are dynamically typed ● Queries are strongly typed ● Smart schemas Strong Dynamic Typing