In this meetup, we introduce the concepts of real-time analytics: why it is important, how analytics has evolved, and how companies such as LinkedIn, Stripe, and Uber use Apache Pinot for real-time analytics to grow their audience and improve usability. We also cover what Apache Pinot is, followed by a demo and Q&A.
2. Agenda
Who is Barkha? (why would you want to listen to me?)
The evolution of Analytics
How LinkedIn Solved their Problem
Try some Pinot with me
3. Overheard @ Big Data Fest 2023
Thiago de Faria, on 5-year trends in big data:
• Streaming APIs
• Will the data warehouse survive?
• Integration with LLM/AI/ML
Joe Reis, on 5-year trends in big data:
• Democratization of data warehousing
• Commoditized data warehousing
• Most companies are barely doing BI, let alone AI.
4. About Barkha
• Founder, South Florida Women in Technology
• Developer Advocate @StarTree
• Linkedin.com/in/BarkhaHerman
• Twitter @BarkhaH
7. Modern Analytics
• Data freshness: daily reports vs. "How late is my food delivery?"
• Query performance: reports in < 2 minutes vs. dashboards that load in < 10 milliseconds
• Scale: all division managers worldwide access a report (> 1000) vs. millions of users access a dashboard
9. LinkedIn: Who Viewed your Profile?
• Capture profile view information and deduplicate it
• Compute view sources (e.g., search, profile page, etc.)
• View relevance (e.g., a senior leader viewed your profile)
• View obfuscation based on the viewing member’s privacy settings
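The steps above can be sketched in a few lines of Python. This is only an illustration, not LinkedIn's pipeline: the `ProfileView` record, the deduplication key (viewer, viewee, day), and the "Anonymous member" label are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileView:
    """Hypothetical profile-view event; field names are assumptions."""
    viewer_id: str
    viewee_id: str
    source: str                      # e.g. "search" or "profile_page"
    day: str                         # e.g. "2023-05-01"
    viewer_is_private: bool = False  # viewer's privacy setting


def dedupe(views):
    """Keep the first view per (viewer, viewee, day). The key is an assumption."""
    seen = set()
    unique = []
    for view in views:
        key = (view.viewer_id, view.viewee_id, view.day)
        if key not in seen:
            seen.add(key)
            unique.append(view)
    return unique


def views_by_source(views):
    """Count views per source (search, profile page, ...)."""
    return Counter(view.source for view in views)


def display_name(view):
    """Obfuscate the viewer when their privacy setting asks for it."""
    return "Anonymous member" if view.viewer_is_private else view.viewer_id


events = [
    ProfileView("member_a", "member_z", "search", "2023-05-01"),
    ProfileView("member_a", "member_z", "profile_page", "2023-05-01"),  # repeat view, same day
    ProfileView("member_b", "member_z", "profile_page", "2023-05-01", viewer_is_private=True),
    ProfileView("member_a", "member_z", "search", "2023-05-02"),
]

unique_views = dedupe(events)
source_counts = views_by_source(unique_views)
names = [display_name(view) for view in unique_views]
```

In a real deployment this work happens continuously over a stream of events and is queried per member, which is what makes it a real-time analytics problem rather than a batch report.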
10. Before Pinot
• Elasticsearch-based solution
• 1000 Nodes
• 1500 queries / sec
• 20+ million users
12. Pinot Building Blocks
• A segment is the physical store.
• Tables are conceptual and accept both real-time and batch data.
• Tenants provide functional segregation.
• Clusters allow for scaling based on use.
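To make the building blocks concrete, here is a minimal sketch of how one logical table maps to Pinot table configs. The table name `profileViews`, the time column `viewedAt`, and both helper functions are hypothetical. The config is a fragment: a real table also needs a schema, and a REALTIME table needs stream ingestion settings, which are omitted here.

```python
import json


def physical_table_names(logical_name):
    """A logical table can be backed by an offline (batch) and a real-time table."""
    return [f"{logical_name}_OFFLINE", f"{logical_name}_REALTIME"]


def table_config(logical_name, table_type, tenant="DefaultTenant"):
    """Build a partial table config; values are illustrative assumptions."""
    return {
        "tableName": logical_name,
        "tableType": table_type,  # "OFFLINE" or "REALTIME"
        # Tenants decide which brokers and servers handle this table.
        "tenants": {"broker": tenant, "server": tenant},
        # Data is stored as segments; this section controls how they are kept.
        "segmentsConfig": {"timeColumnName": "viewedAt", "replication": "1"},
    }


configs = [table_config("profileViews", t) for t in ("OFFLINE", "REALTIME")]
print(json.dumps(configs[1], indent=2))
```

Queries address the logical name (`profileViews` in this sketch), and the broker combines results from the batch and real-time sides.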
14. Indexes
Pinot supports the following indexing techniques:
• Inverted index - Used for exact lookups.
• Range index - Used for range queries.
• Text index - Used for phrase, term, Boolean, prefix, or regex queries.
• Geospatial index - Based on H3, a hexagon-based hierarchical gridding. Used for finding points that exist within a certain distance from another point.
• JSON index - Used for querying columns in JSON documents.
• Star-Tree index - Pre-aggregates results across multiple columns.
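Indexes are declared per column in the table config. The sketch below shows roughly what that can look like, written as a Python dict that serializes to JSON. All column names are hypothetical, and the exact keys vary between Pinot versions (newer releases also accept a per-field `indexes` object), so treat this as an illustration and check the documentation for the version in use.

```python
import json

# Partial table config; every column name here is an assumption.
index_config = {
    "tableIndexConfig": {
        "invertedIndexColumns": ["viewSource"],   # exact lookups
        "rangeIndexColumns": ["viewedAt"],        # range queries
        "jsonIndexColumns": ["viewerDetails"],    # querying inside JSON
        "noDictionaryColumns": ["headline", "viewerLocation"],
        "starTreeIndexConfigs": [{                # pre-aggregation
            "dimensionsSplitOrder": ["viewSource", "viewerIndustry"],
            "skipStarNodeCreationForDimensions": [],
            "functionColumnPairs": ["COUNT__*"],
            "maxLeafRecords": 10000,
        }],
    },
    "fieldConfigList": [
        {"name": "headline", "encodingType": "RAW", "indexTypes": ["TEXT"]},
        {"name": "viewerLocation", "encodingType": "RAW", "indexTypes": ["H3"],
         "properties": {"resolutions": "5"}},
    ],
}


def index_types_by_column(config):
    """Summarize which index types are declared for each column."""
    table_index = config["tableIndexConfig"]
    result = {}
    for key, label in (("invertedIndexColumns", "inverted"),
                       ("rangeIndexColumns", "range"),
                       ("jsonIndexColumns", "json")):
        for column in table_index.get(key, []):
            result.setdefault(column, []).append(label)
    for field in config.get("fieldConfigList", []):
        for index_type in field.get("indexTypes", []):
            result.setdefault(field["name"], []).append(index_type.lower())
    return result


summary = index_types_by_column(index_config)
print(json.dumps(summary, indent=2))
```

The choice of index follows the query pattern: filters on a single value favor an inverted index, time windows favor a range index, and repeated aggregations over the same dimensions favor a Star-Tree index.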
18. Overheard @ Big Data Fest 2023
Thiago de Faria, on 5-year trends in big data:
• Streaming APIs: Apache Pinot is built to solve streaming-first problems.
• Will the data warehouse survive? Apache Pinot is used to build customer-facing analytics, which is on the rise.
• Integration with LLM/AI/ML: Apps built on top of Pinot, such as ThirdEye, use statistics and allow for AI/ML add-ons.
Joe Reis, on 5-year trends in big data:
• Democratization of data warehousing: Apache Pinot is used to build customer-facing analytics, which is on the rise.
• Commoditized data warehousing: Apache Pinot is used to build customer-facing analytics, which is on the rise.
• Most companies are barely doing BI, let alone AI: easy analytics, plus apps built on top of Pinot such as ThirdEye.