SlideShare une entreprise Scribd logo
1  sur  21
Realtime Classroom Analytics
Powered By Apache Druid
Karthik Deivasigamani, Chief Architect, Noon - The Social Learning Platform
Agenda
● Who We Are
● Live Online Classroom
● Quality Of Experience
● Why Apache Druid
● Realtime Classroom Monitoring
● Key Lessons
● Q & A
Who Are We?
Noon has evolved into a ‘Social Learning’
platform three years ago to craft the most
engaging learning experience.
● Our mission is to radically change the
way people learn.
● Make learning more social and fun.
● 10M+ users from over 5 countries
● 1M+ MAU with 50+ mins per active day
per student
Live Online Classroom
Students spend a significant amount of their
time on Noon learning from their teacher
within the online classrooms.
Classroom Features
● Video, Audio, Chat and Whiteboard
● Breakouts, Raise Hand
● Peak 10K students / session
Live Classroom - Challenges
Audio
Voice is broken
● Teacher’s uplink quality
● Issues with microphone
● Student’s downlink
quality
● ISP policies
Whiteboard
Lag in whiteboard
● Loss of drawing events
due to unstable network
● Heavy CPU usage on the
mobile device
● Software Bug
Quality Of Experience
“Quality of experience is a measure
of the delight or annoyance of a
customer's experiences with a
service.” - Wikipedia
Monitoring The Classroom
Metrics
● Uplink/Downlink Network Quality
● Packet Loss
● Remote/Local Audio Quality
● Mic Status
● Jitter Buffer Delay
● frameFrozenRate
● Uplink/Downlink BitRate
Dimensions
● Country
● Region
● City
● Session
● User
● ISP
● Network Type
Aggregations
● Percentile
● Count
● Average
● Distinct Count
● Standard Deviation
System Characteristics
● Real Time Ingestion
● Scale Horizontally
● High Cardinality Data
● Subsecond Query Latency
● Fast Aggregation
● Zoom In & Zoom Out
● Highly Available
Why Apache Druid
● Real Time Ingestion From Kafka Through Spec Files
● Data & Query Nodes Allows For Horizontal Scaling
● Sketches For High Cardinality Columns
● Low-Latency Querying
● Rich Built In Capabilities For Exact & Approx Aggregation
● Data Rollups
● Fault Tolerance At Multiple Levels
Data Collection - Network & Audio
WebRTC Stats
Sent BitRate
Received BitRate
Audio Packet Loss
Audio Level
Bytes Sent/Received
Audio Frame Freeze Rate
Network Quality
Audio Quality
Data Collection - Whiteboard
Whiteboard Stats
Stroke Difference
Drift Percentage
Ingestion
● All ingestions happen via Kafka in real
time
● Flink Topology
● Split & Format to conform with
ingestion spec
● Rollup Enabled At Ingestion Time
● Conditional transformation
● Looking forward to using Lag Based
AutoScaler.
Making Ingestion Easy
● Well defined event (ProtoBuf) schema
serialized as JSON.
● Jsonpath based DSL defining
transformers & ingestion spec.
● Parsing & Transformation based on
the configuration file in a flink
topology.
● Ingestion Spec Auto Generated from
JSON configuration file.
● Automated Deployments Via Jenkins
Schema Design
● Always start from your use-cases.
● Identify Dimensions & Metrics
● Aggregations & Approximation (hyperloglog,
quantiles sketches)
● Query Granularity
● Partitions
● Deep Storage
● Data Retention
Self Serve Dashboard - Zoom Out & Zoom In
Country Level View
Sessions Inside A
Country
Session Level View
Students Inside A
Session View
Student Session
Level View
Our Druid Cluster
Topology
● Master (m5.2xl)
● Data Node (i3.2xl)
○ Tiered
○ 24 slots
● Query Node (m5.2xl)
● External ZK, MySQL, S3
Deep Storage
Monitoring Numbers
● Datadog-Druid
● System Resources
● Ingestion Lag
● Number of Segments
● Query Time
● JVM Memory Usage
● 15+ dims, 50+ metrics
● 105 M events per day
● 2B rows @ Avg Row Size
1K
● 4k-5k Segment
● p90 latency ~ 850 ms
Putting Together
Business Impact
● Quickly Identify Problems
● Validation of fixes put in to improve quality
● Self Serve Tool, reducing burden on
developers
● Improved transparency & trust between
OPS and developers
● Student NPS score improved
Challenges & Key Lessons
● Rollups are your best friend
● Ingestion Time Transformation > Query Time
Transformation
● Approximation - Hyperloglog, Data Sketches
● Late Arrival Of Messages & Compaction
● Query Performance depends on your data model
● Setup takes time to stabilize.
● druid-user group is super helpful!
Questions?
Thank you
Contact: karthik@noonacademy.com

Contenu connexe

Tendances

The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Databricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registryconfluent
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaDatabricks
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer GuideDeon Huang
 

Tendances (20)

The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Nifi
NifiNifi
Nifi
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
kafka
kafkakafka
kafka
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
 

Similaire à Realtime classroom analytics powered by apache druid

(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The CloudAmazon Web Services
 
Engineering Netflix Global Operations in the Cloud
Engineering Netflix Global Operations in the CloudEngineering Netflix Global Operations in the Cloud
Engineering Netflix Global Operations in the CloudJosh Evans
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, CriteoParis Open Source Summit
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithNETWAYS
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedis Labs
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microserviceChandresh Pancholi
 
Design patterns for scaling web applications
Design patterns for scaling web applicationsDesign patterns for scaling web applications
Design patterns for scaling web applicationsIvan Dimitrov
 
Java Based RFID Attendance Management System Graduation Project Presentation
Java Based RFID Attendance Management System Graduation Project PresentationJava Based RFID Attendance Management System Graduation Project Presentation
Java Based RFID Attendance Management System Graduation Project PresentationIbrahim Abdel Fattah Mohamed
 
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...Precisely
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@GrabShubham Tagra
 
Real-time applications with sockets and websockets. Introduction to Smartfoxs...
Real-time applications with sockets and websockets. Introduction to Smartfoxs...Real-time applications with sockets and websockets. Introduction to Smartfoxs...
Real-time applications with sockets and websockets. Introduction to Smartfoxs...Pablo Monterde Perez
 
Improving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time SparkImproving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time Sparkdatamantra
 

Similaire à Realtime classroom analytics powered by apache druid (20)

(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud
 
Engineering Netflix Global Operations in the Cloud
Engineering Netflix Global Operations in the CloudEngineering Netflix Global Operations in the Cloud
Engineering Netflix Global Operations in the Cloud
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles Judith
 
demo
demo demo
demo
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microservice
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
 
Design patterns for scaling web applications
Design patterns for scaling web applicationsDesign patterns for scaling web applications
Design patterns for scaling web applications
 
Java Based RFID Attendance Management System Graduation Project Presentation
Java Based RFID Attendance Management System Graduation Project PresentationJava Based RFID Attendance Management System Graduation Project Presentation
Java Based RFID Attendance Management System Graduation Project Presentation
 
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@Grab
 
Vedantu @ Kranky Geek
Vedantu @ Kranky GeekVedantu @ Kranky Geek
Vedantu @ Kranky Geek
 
Real-time applications with sockets and websockets. Introduction to Smartfoxs...
Real-time applications with sockets and websockets. Introduction to Smartfoxs...Real-time applications with sockets and websockets. Introduction to Smartfoxs...
Real-time applications with sockets and websockets. Introduction to Smartfoxs...
 
Improving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time SparkImproving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time Spark
 

Dernier

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Dernier (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Realtime classroom analytics powered by apache druid

  • 1. Realtime Classroom Analytics Powered By Apache Druid Karthik Deivasigamani, Chief Architect, Noon - The Social Learning Platform
  • 2. Agenda ● Who We Are ● Live Online Classroom ● Quality Of Experience ● Why Apache Druid ● Realtime Classroom Monitoring ● Key Lessons ● Q & A
  • 3. Who Are We? Noon has evolved into a ‘Social Learning’ platform three years ago to craft the most engaging learning experience. ● Our mission is to radically change the way people learn. ● Make learning more social and fun. ● 10M+ users from over 5 countries ● 1M+ MAU with 50+ mins per active day per student
  • 4. Live Online Classroom Students spend a significant amount of their time on Noon learning from their teacher within the online classrooms. Classroom Features ● Video, Audio, Chat and Whiteboard ● Breakouts, Raise Hand ● Peak 10K students / session
  • 5. Live Classroom - Challenges Audio Voice is broken ● Teacher’s uplink quality ● Issues with microphone ● Student’s downlink quality ● ISP policies Whiteboard Lag in whiteboard ● Loss of drawing events due to unstable network ● Heavy CPU usage on the mobile device ● Software Bug
  • 6. Quality Of Experience “Quality of experience is a measure of the delight or annoyance of a customer's experiences with a service.” - Wikipedia
  • 7. Monitoring The Classroom Metrics ● Uplink/Downlink Network Quality ● Packet Loss ● Remote/Local Audio Quality ● Mic Status ● Jitter Buffer Delay ● frameFrozenRate ● Uplink/Downlink BitRate Dimensions ● Country ● Region ● City ● Session ● User ● ISP ● Network Type Aggregations ● Percentile ● Count ● Average ● Distinct Count ● Standard Deviation
  • 8. System Characteristics ● Real Time Ingestion ● Scale Horizontally ● High Cardinality Data ● Subsecond Query Latency ● Fast Aggregation ● Zoom In & Zoom Out ● Highly Available
  • 9. Why Apache Druid ● Real Time Ingestion From Kafka Through Spec Files ● Data & Query Nodes Allows For Horizontal Scaling ● Sketches For High Cardinality Columns ● Low-Latency Querying ● Rich Built In Capabilities For Exact & Approx Aggregation ● Data Rollups ● Fault Tolerance At Multiple Levels
  • 10. Data Collection - Network & Audio WebRTC Stats Sent BitRate Received BitRate Audio Packet Loss Audio Level Bytes Sent/Received Audio Frame Freeze Rate Network Quality Audio Quality
  • 11. Data Collection - Whiteboard Whiteboard Stats Stroke Difference Drift Percentage
  • 12. Ingestion ● All ingestions happen via Kafka in real time ● Flink Topology ● Split & Format to conform with ingestion spec ● Rollup Enabled At Ingestion Time ● Conditional transformation ● Looking forward to using Lag Based AutoScaler.
  • 13. Making Ingestion Easy ● Well defined event (ProtoBuf) schema serialized as JSON. ● Jsonpath based DSL defining transformers & ingestion spec. ● Parsing & Transformation based on the configuration file in a flink topology. ● Ingestion Spec Auto Generated from JSON configuration file. ● Automated Deployments Via Jenkins
  • 14. Schema Design ● Always start from your use-cases. ● Identify Dimensions & Metrics ● Aggregations & Approximation (hyperloglog, quantiles sketches) ● Query Granularity ● Partitions ● Deep Storage ● Data Retention
  • 15. Self Serve Dashboard - Zoom Out & Zoom In Country Level View Sessions Inside A Country Session Level View Students Inside A Session View Student Session Level View
  • 16. Our Druid Cluster Topology ● Master (m5.2xl) ● Data Node (i3.2xl) ○ Tiered ○ 24 slots ● Query Node (m5.2xl) ● External ZK, MySQL, S3 Deep Storage Monitoring Numbers ● Datadog-Druid ● System Resources ● Ingestion Lag ● Number of Segments ● Query Time ● JVM Memory Usage ● 15+ dims, 50+ metrics ● 105 M events per day ● 2B rows @ Avg Row Size 1K ● 4k-5k Segment ● p90 latency ~ 850 ms
  • 18. Business Impact ● Quickly Identify Problems ● Validation of fixes put in to improve quality ● Self Serve Tool, reducing burden on developers ● Improved transparency & trust between OPS and developers ● Student NPS score improved
  • 19. Challenges & Key Lessons ● Rollups are your best friend ● Ingestion Time Transformation > Query Time Transformation ● Approximation - Hyperloglog, Data Sketches ● Late Arrival Of Messages & Compaction ● Query Performance depends on your data model ● Setup takes time to stabilize. ● druid-user group is super helpful!