Apache Kafka is the de facto standard for real-time event streaming, but what do you do if you want to perform user-facing, ad-hoc, real-time analytics too? That's a hard problem. Apache Pinot solves it, and the two together are like chocolate and peanut butter, peaches and cream, and Steve Rogers and Peggy Carter.
Come to this talk for an introduction to Pinot and an overview of how the Pinot Kafka Connector works. Hear about the challenges unique to a user-facing real-time analytics system, and how Pinot and Kafka work together to solve them. Witness an action-packed demo showing just how easy it is to go from events to blazing-fast analytics, and how to use powerful features of both systems to do this at scale.
6. @apachepinot | @KishoreBytes
Seunghyun Lee, Senior Software Engineer, LinkedIn
Chinmay Soman, Founding Engineer
Who Viewed My Profile
Total users: 700 million+
QPS: 1000s
Latency SLA: < 100 ms (p99)
Freshness: seconds
UberEats Restaurant Manager
● Identify surges in real time
● Detect missed or inaccurate orders in real time
Total users: 500,000+
QPS: 100s
Latency SLA: < 100 ms (p99)
Freshness: seconds to minutes
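Both use cases state their latency SLA at the 99th percentile (p99): 99% of queries must complete in under 100 ms. As a minimal sketch, assuming the nearest-rank percentile definition (monitoring tools may interpolate instead), a batch of measured query latencies can be checked against such an SLA as follows; the function names are hypothetical.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile of samples_ms using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_sla(samples_ms, sla_ms=100.0, pct=99):
    """True if the pct-th percentile latency is strictly below sla_ms."""
    return percentile_nearest_rank(samples_ms, pct) < sla_ms


# 99 fast queries and one slow outlier: p99 is still 20 ms, so the SLA holds.
latencies = [20.0] * 99 + [250.0]
print(meets_latency_sla(latencies))  # True
```

One slow query in a hundred is tolerated; two in a hundred push p99 to 250 ms and break the SLA, which is exactly what a p99 target expresses.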
Challenges for the underlying system
A user-facing real-time analytics system must handle:
● Large volume and velocity of data
● Real-time ingestion
● 1000s of QPS
● Millisecond latency
● Freshness within seconds
● High dimensionality
● Scalability
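Freshness here means how far behind real time the queryable data is. One simple way to measure it, assuming you know the creation timestamps (epoch seconds) of the events that queries can already see, is the gap between now and the newest of those events. A minimal sketch with hypothetical function names:

```python
def freshness_lag_s(now_s, visible_event_times_s):
    """Seconds between now and the newest event that queries can already see."""
    if not visible_event_times_s:
        return float("inf")  # nothing ingested yet: the data is arbitrarily stale
    return now_s - max(visible_event_times_s)


def is_fresh(now_s, visible_event_times_s, budget_s):
    """True if the freshness lag is within budget_s seconds."""
    return freshness_lag_s(now_s, visible_event_times_s) <= budget_s


# Newest visible event is from t=98.5 s; at t=100 s the data is 1.5 s behind.
print(freshness_lag_s(100.0, [95.0, 98.5]))  # 1.5
```

This measure is only meaningful while events keep arriving: during a quiet period the lag grows even though nothing is missing.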
Kafka: the perfect solution for capturing events
● Velocity of ingestion
● Real-time ingestion
● Freshness within seconds
● High dimensionality
● Scalability
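Kafka captures events at high velocity by splitting a topic into partitions that can be written and read in parallel; records with the same key go to the same partition, so their relative order is preserved. A minimal sketch of key-based routing, with one loud assumption: it uses CRC32 as a stand-in hash, whereas Kafka's default partitioner hashes keys with murmur2, so the partition numbers here will not match a real cluster. The keys are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32). Kafka's default partitioner uses murmur2 instead.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Group (key, value) events by partition; arrival order is kept within each partition."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


events = [("member-1", "view"), ("member-2", "view"), ("member-1", "click")]
for partition, records in sorted(route(events, 4).items()):
    print(partition, records)
```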
How do we solve the low-latency, high-throughput analytics part?
Need a specialized analytics database that can:
1. Ingest from Kafka and serve real-time data
2. Handle a high event rate from Kafka and scale with Kafka
3. Provide ultra-low latency at high queries per second
4. Handle dynamic query patterns on highly dimensional data without exploding storage
Apache Pinot Community
Slack users: 1,100+
Companies: 50+
Events/sec: 1M+
Peak QPS: 170K+
Query latency: milliseconds
Join our growing community: https://communityinviter.com/apps/apache-pinot/apache-pinot
Apache Pinot Architecture
● Queries -> Pinot Brokers, which scatter-gather across the Pinot Servers
● Pinot Servers (Server 1, Server 2, Server 3): consuming, indexing, serving
● Pinot Controller and ZooKeeper: cluster coordination
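The broker's scatter-gather step can be sketched in a few lines: send the query to every server that holds relevant data, let each compute a partial result over its local data, then merge. This is a simplified illustration assuming a count(*) query and in-memory lists of rows standing in for each server's segments (the data and function names are hypothetical); a real Pinot broker routes only to the servers hosting the table's segments and merges many more result types.

```python
from concurrent.futures import ThreadPoolExecutor


def server_count(rows, predicate):
    """Server-side work: count matching rows in this server's local data."""
    return sum(1 for row in rows if predicate(row))


def broker_count(servers, predicate):
    """Broker-side work: scatter to all servers in parallel, gather and sum the partial counts."""
    if not servers:
        return 0
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = pool.map(lambda rows: server_count(rows, predicate), servers)
        return sum(partials)


# Hypothetical data spread over three servers.
servers = [
    [{"country": "us"}, {"country": "ca"}],
    [{"country": "us"}],
    [{"country": "jp"}],
]
print(broker_count(servers, lambda row: row["country"] == "us"))  # 2
```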
Pinot Realtime Ingestion
Partition -> Pinot Server:
p0 -> Server 1
p1 -> Server 2
p2 -> Server 3
p3 -> Server 1
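The mapping on this slide is consistent with a simple round-robin assignment of Kafka partitions to servers. As an assumption for illustration only (Pinot's actual assignment is configurable and not necessarily round-robin), it can be reproduced with a hypothetical helper:

```python
def assign_partitions(num_partitions, servers):
    """Round-robin: partition p{i} is consumed by servers[i % len(servers)]."""
    if not servers:
        raise ValueError("need at least one server")
    return {f"p{i}": servers[i % len(servers)] for i in range(num_partitions)}


for partition, server in assign_partitions(4, ["Server 1", "Server 2", "Server 3"]).items():
    print(partition, "->", server)
# p0 -> Server 1
# p1 -> Server 2
# p2 -> Server 3
# p3 -> Server 1
```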
Pinot Realtime Ingestion
Partition -> Pinot Server, State:
p0 -> Server 1, CONSUMING
p1 -> Server 2, CONSUMING
p2 -> Server 3, CONSUMING
p3 -> Server 1, CONSUMING
Pinot Realtime Ingestion
Partition -> Pinot Server, State, Start Offset (the offset at which the server's Kafka consumer begins reading the partition):
p0 -> Server 1, CONSUMING, 102
p1 -> Server 2, CONSUMING, 120
p2 -> Server 3, CONSUMING, 105
p3 -> Server 1, CONSUMING, 100
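Each consuming segment remembers the offset it started from and advances as records are read. A toy model, assuming an in-memory list as the partition log (index = offset; real Kafka logs need not start at offset 0 once old data is deleted) and a hypothetical `ConsumingSegment` class; a real server would fetch batches through a Kafka consumer instead:

```python
class ConsumingSegment:
    """Toy model of a segment consuming one partition from a start offset."""

    def __init__(self, partition, start_offset):
        self.partition = partition
        self.start_offset = start_offset
        self.next_offset = start_offset  # next offset to read from the log
        self.rows = []

    def consume(self, log, max_records):
        """Append up to max_records from log, starting at next_offset; return how many were read."""
        batch = log[self.next_offset:self.next_offset + max_records]
        self.rows.extend(batch)
        self.next_offset += len(batch)
        return len(batch)


# Hypothetical partition log for p0 with 400 records: record i sits at offset i.
log_p0 = [f"event-{i}" for i in range(400)]
segment = ConsumingSegment("p0", start_offset=102)
segment.consume(log_p0, max_records=50)
print(segment.rows[0], segment.next_offset)  # event-102 152
```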
Pinot Realtime Ingestion
Partition -> Pinot Server, State, Start Offset, End Offset:
p0 -> Server 1, DONE, 102, 300
p1 -> Server 2, CONSUMING, 120
p2 -> Server 3, CONSUMING, 105
p3 -> Server 1, CONSUMING, 100
The segment for p0 is complete: it is marked DONE with end offset 300.
Pinot Realtime Ingestion
p0 -> Server 1, DONE, 102, 300
p1 -> Server 2, CONSUMING, 120
p2 -> Server 3, CONSUMING, 105
p3 -> Server 1, CONSUMING, 100
p0 -> Server 1, CONSUMING, 300
Server 1 then starts a new CONSUMING segment for p0 at offset 300, where the completed segment ended.
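These last steps describe a segment rollover: the consuming segment for p0 is committed as DONE with an end offset, and a new CONSUMING segment starts at that same offset, so no record is skipped or read twice. A sketch of that bookkeeping, assuming the end offset is exclusive (the first offset not in the segment) and using a hypothetical `PartitionSegments` class; Pinot's actual segment completion protocol, coordinated with the controller, involves more steps than shown here.

```python
class PartitionSegments:
    """Hypothetical per-partition segment metadata, mirroring the rows on the slides."""

    def __init__(self, partition, server, start_offset):
        self.partition = partition
        self.server = server
        self.segments = [self._new_segment(start_offset)]

    def _new_segment(self, start_offset):
        return {"state": "CONSUMING", "start": start_offset, "end": None}

    def commit(self, end_offset):
        """Mark the consuming segment DONE and roll over to a new one at end_offset."""
        current = self.segments[-1]
        if current["state"] != "CONSUMING":
            raise RuntimeError("no consuming segment to commit")
        if end_offset < current["start"]:
            raise ValueError("end offset precedes start offset")
        current["state"] = "DONE"
        current["end"] = end_offset  # exclusive: first offset NOT in this segment
        # The next segment resumes exactly where the committed one stopped.
        self.segments.append(self._new_segment(end_offset))

    def rows(self):
        """Render segments like the slide rows: partition -> server, state, offsets."""
        out = []
        for seg in self.segments:
            offsets = [seg["start"]] if seg["end"] is None else [seg["start"], seg["end"]]
            out.append(f"{self.partition} -> {self.server}, {seg['state']}, "
                       + ", ".join(str(o) for o in offsets))
        return out


p0 = PartitionSegments("p0", "Server 1", start_offset=102)
p0.commit(end_offset=300)
print("\n".join(p0.rows()))
# p0 -> Server 1, DONE, 102, 300
# p0 -> Server 1, CONSUMING, 300
```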
Star-tree Indexing
Example table X:
country   browser   device   os    clicks
us        chrome    ...      ...   ...
ca        firefox   ...      ...   ...
jp        ie        ...      ...   ...
us        firefox   ...      ...   ...
ca        ie        ...      ...   ...
…         …         ...      ...   ...
Query:
select count(*) from X
where country = 'us' and browser = 'chrome'
Star-Tree Index (split order: Country, then Browser):
Country: US, CA
Browser under each country: IE, Chrome
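A heavily simplified sketch of the star-tree idea, assuming count(*) is the only aggregate: records are split on dimensions in a fixed order (country, then browser, as on the slide), every split also gets a "star" child in which that dimension is aggregated away, and splitting stops once a node holds at most `max_leaf` pre-aggregated records (Pinot exposes a similar threshold, `maxLeafRecords`). Class and function names are hypothetical, and this is not Pinot's storage layout; Pinot's real index is configured per table and supports more aggregation functions.

```python
from collections import Counter

STAR = "*"  # stands for "aggregated over every value of this dimension"


class StarTreeNode:
    def __init__(self, records, depth, num_dims, max_leaf):
        # records: Counter mapping a tuple of dimension values -> pre-aggregated count(*)
        self.records = None
        self.split_dim = None
        self.children = {}
        self.star = None
        if depth == num_dims or len(records) <= max_leaf:
            self.records = records  # leaf: few enough records to scan directly
            return
        self.split_dim = depth
        groups = {}
        for key, count in records.items():
            groups.setdefault(key[depth], Counter())[key] += count
        for value, group in groups.items():
            self.children[value] = StarTreeNode(group, depth + 1, num_dims, max_leaf)
        # Star child: replace this dimension with STAR and merge, pre-aggregating across it.
        starred = Counter()
        for key, count in records.items():
            starred[key[:depth] + (STAR,) + key[depth + 1:]] += count
        self.star = StarTreeNode(starred, depth + 1, num_dims, max_leaf)


def build_star_tree(rows, dims, max_leaf=1):
    records = Counter(tuple(row[d] for d in dims) for row in rows)
    return StarTreeNode(records, 0, len(dims), max_leaf)


def count_where(root, dims, **filters):
    """count(*) with equality filters: follow filtered dimensions, take the star branch otherwise."""
    node = root
    while node.records is None:
        dim = dims[node.split_dim]
        if dim in filters:
            node = node.children.get(filters[dim])
            if node is None:
                return 0
        else:
            node = node.star
    index = {d: i for i, d in enumerate(dims)}
    return sum(count for key, count in node.records.items()
               if all(key[index[d]] == v for d, v in filters.items()))


rows = [
    {"country": "us", "browser": "chrome"},
    {"country": "ca", "browser": "firefox"},
    {"country": "jp", "browser": "ie"},
    {"country": "us", "browser": "firefox"},
    {"country": "ca", "browser": "ie"},
]
dims = ["country", "browser"]
tree = build_star_tree(rows, dims)
print(count_where(tree, dims, country="us", browser="chrome"))  # 1
print(count_where(tree, dims, browser="ie"))  # 2
```

A smaller `max_leaf` pre-aggregates more (fewer records scanned per query) at the cost of more nodes; a larger one bounds storage, which is how a star-tree avoids the blow-up of materializing every combination.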
Horizontal scaling by adding new Kafka brokers + Pinot servers
Apps (producers) -> events -> Kafka cluster (brokers) -> Kafka consumers -> Pinot cluster (servers)
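Because each Kafka partition is consumed by one server (per replica), adding Pinot servers only spreads ingestion load if there are enough partitions to go around, which is why Kafka and Pinot are scaled together. A back-of-the-envelope sketch, assuming events are spread evenly over partitions and partitions are assigned round-robin (both simplifications); the event rate is hypothetical.

```python
from collections import Counter


def partitions_per_server(num_partitions, num_servers):
    """How many partitions each server consumes under round-robin assignment."""
    load = Counter(i % num_servers for i in range(num_partitions))
    return [load[s] for s in range(num_servers)]


def busiest_server_rate(events_per_sec, num_partitions, num_servers):
    """Events/sec hitting the busiest server, assuming an even spread over partitions."""
    per_partition = events_per_sec / num_partitions
    return max(partitions_per_server(num_partitions, num_servers)) * per_partition


# Hypothetical 120,000 events/sec.
print(busiest_server_rate(120_000, 4, 4))  # 30000.0
print(busiest_server_rate(120_000, 4, 8))  # 30000.0 (four servers sit idle)
print(busiest_server_rate(120_000, 8, 8))  # 15000.0 (more partitions and servers)
```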
User-facing real-time analytics system
● Large volume and velocity of data
● Real-time ingestion
● 1000s of QPS
● Millisecond latency
● Freshness within seconds
● High dimensionality
● Scalability
Takeaway
● Kafka is the key data infrastructure for event-streamed systems
● Kafka has good analytics solutions for operationalized streaming queries
● Pinot is purpose-built for ultra-low-latency analytics at high throughput
● Pinot is a great solution for user-facing real-time analytics
● It is very easy to go from events in Kafka to analytics in Pinot
● Kafka + Pinot is the perfect combination for user-facing real-time analytics
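Once a table is ingesting from Kafka, applications query it over HTTP through a Pinot broker. A minimal sketch using only the Python standard library, assuming a broker reachable at `http://localhost:8099` (a common default in local setups; adjust for your deployment), the broker's `/query/sql` endpoint, and a response that carries rows under `resultTable.rows`; check these against the Pinot version you run. The table name X is taken from the star-tree example.

```python
import json
import urllib.request


def build_pinot_query(sql, broker_url="http://localhost:8099"):
    """Build (but do not send) a POST request for the broker's SQL endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{broker_url}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_pinot_query(sql, broker_url="http://localhost:8099", timeout_s=10):
    """Send the query and return result rows (assumes rows live under resultTable.rows)."""
    request = build_pinot_query(sql, broker_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        result = json.load(response)
    return result.get("resultTable", {}).get("rows", [])


request = build_pinot_query("select count(*) from X where country = 'us'")
print(request.get_method(), request.full_url)  # POST http://localhost:8099/query/sql
# With a broker running: rows = run_pinot_query("select count(*) from X")
```

Separating request construction from sending keeps the query-building logic testable without a live cluster.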