Druid Ingestion: From 3 hr to 5 min
Shivji Kumar Jha, Staff Engineer, Nutanix
Sachidananda Maharana, MTS 4, Nutanix
Challenges, Mitigations & Learnings
About Us
Shivji Kumar Jha
Staff Engineer,
CPaaS Data Platform,
Nutanix
Sachidananda Maharana
Sr Engineer / OLAP Ninja
CPaaS Team, Nutanix
● Software Engineer & Regular Speaker at Meetups
● Excited about:
○ Distributed Databases & Streaming
○ Open-Source Software & Communities
○ MySQL, Postgres, Pulsar/NATS, Druid/ClickHouse
● Regular Platform Engineer
● Open-Source Enthusiast
● Excited about:
○ Distributed OLAP Databases
Contents
Druid 101
How we use Druid
Re-architecture: What & Why
Impact on Druid components
How we fixed the issues
State of Bugs we filed / fixed
Druid 101
• Open source, Apache 2.0 licensed, under the Apache Software Foundation
• Columnar data store designed for high performance
• Supports real-time and batch ingestion
• Segment-oriented storage
• Distributed and modular architecture, horizontally scalable for most parts
• Supports data tiering – keep cold data on cheaper storage!
What we love about Druid!
Modularity – Separation of concerns
Modularity – Simplicity*: easy to deploy, upgrade, migrate, manage
Modularity – Flexibility: scale only what you need; retain data based on retention rules per tier
Modularity – Built for the cloud
Durability – Object store (S3 or Nutanix Objects, for instance) for deep storage
Durability – SQL database for metadata
Admin dashboard – easier debugging and monitoring
Druid 101
[Architecture diagram: write and read paths across Druid's processes]
Ingestion & Query Patterns
● IPFIX log files are collected from clouds.
○ IPFIX: IP Flow Information Export
○ Summarizes network data packets to track IP actions
● We enrich the data and store it in an S3 bucket.
● The S3 data is ingested into Druid.
● Serves analytics dashboards in a slice-and-dice manner.
● Also used by an ML engine.
Druid numbers: 3+ years in prod
[Stats on slide: last-24-hr ingestion volumes and cluster size]
Data Model for our Apps
● Analytics apps as part of the Nutanix dashboard
● Customers can slice and dice data given some filters
● Multi-tenant use case
● Druid datasource per customer per use case
● Enable features for some datasources:
○ Phased rollout of new Druid features
○ Druid version upgrades
○ App redesigns requiring a change in Druid ingestion or queries
● Workflow engine (Temporal) for the pipeline; a sketch follows this list.
● Java-based workers backed by Postgres storage for state.
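As an illustration, a minimal sketch of one pipeline cycle using Temporal's Java SDK — the interface, activity names, and timeout are hypothetical stand-ins, not our actual worker code:

import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;
import java.util.List;

// Hypothetical activities: list new S3 files, submit a Druid task, poll it.
@ActivityInterface
interface IngestActivities {
  List<String> listNewFiles(String datasource);
  String submitDruidTask(String datasource, List<String> files);
  void awaitTaskSuccess(String taskId);
}

@WorkflowInterface
interface IngestWorkflow {
  @WorkflowMethod
  void ingestOnce(String datasource);
}

class IngestWorkflowImpl implements IngestWorkflow {
  private final IngestActivities activities = Workflow.newActivityStub(
      IngestActivities.class,
      ActivityOptions.newBuilder()
          .setStartToCloseTimeout(Duration.ofMinutes(30)) // generous timeout; retries absorb backpressure
          .build());

  @Override
  public void ingestOnce(String datasource) {
    List<String> files = activities.listNewFiles(datasource);
    if (files.isEmpty()) {
      return; // nothing new this cycle
    }
    String taskId = activities.submitDruidTask(datasource, files);
    activities.awaitTaskSuccess(taskId); // workflow state survives worker restarts
  }
}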
Change in Requirements
● Change in requirement: batch (3 hours) to 5 minutes
● Earlier:
○ Agent collects data, dumps to S3.
○ Cron runs every 3 hours, ingests from S3 to Druid.
○ SLA: 3 hours
● New design:
○ SLA: 15 minutes
○ Agent collects data, dumps to S3 every 5 minutes.
○ Ingestion pipeline ingests to Druid at whatever pace Druid is comfortable with (a sample batch spec is sketched after this list).
○ Ingestion pipeline absorbs backpressure.
● Release plan:
○ Datasources onboarded to the cluster in a phased manner
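For reference, a minimal sketch of the kind of native batch spec each cron cycle could submit to POST /druid/indexer/v1/task — the datasource name, bucket path, and columns are illustrative, not our actual schema:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "ds1",
      "timestampSpec": { "column": "flowStartTime", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["srcIp", "dstIp", "protocol"] },
      "granularitySpec": { "segmentGranularity": "HOUR", "queryGranularity": "MINUTE" }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "s3", "prefixes": ["s3://enriched-ipfix/ds1/2023-08-29T10-05/"] },
      "inputFormat": { "type": "json" },
      "appendToExisting": true
    },
    "tuningConfig": { "type": "index_parallel", "maxNumConcurrentSubTasks": 2 }
  }
}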
Before: old batch system
[Diagram: agent dumps to S3; cron every 3 hrs ingests into Druid]
Change: batch to near-real-time system
[Diagram: cron every 5 mins; nudge the state machine, absorb backpressure]
Batch to near-real-time system
[Diagram: cron every 5 mins feeding Druid ingestion tasks]
[Diagram sequence: the Druid database as Datasource 1 through Datasource N are onboarded one phase at a time]
Proof of the pudding! [metric graphs across three slides]
Summary: When Druid was struggling (Overlord on fire)
● Ingested smaller but many more tasks.
● Onboarded a few large datasources; fine for a day.
● More confidence.
● Onboarded all datasources at once:
○ Task queue kept growing (up to 25K); the Overlord was overwhelmed beyond 5K.
○ Soon, the Overlord machine's CPU usage hit 100%.
● All the tasks were stuck in pending state.
● Task count was 12x higher than before, though each task was smaller.
● Middle Managers were sitting idle, with no incoming tasks.
● Task states were not updating properly as the Overlord was overwhelmed.
Getting the Overlord Alive
[Diagram sequence: Overlord process with the Druid metadata database — first a bigger VM for the Overlord, then a bigger DB instance]
Handling the Overlord…
● Vertically scaled the Overlord. Didn't help! No support for horizontal scaling.
● Changed configs:
○ druid.indexer.runner.type: httpRemote — no ZooKeeper for task assignment
○ druid.indexer.queue.maxSize: 5000 — throttle, don't give up
● Set the max pending tasks per datasource for an interval to 1, checked via:
GET /druid/indexer/v1/pendingTasks?datasource=ds1
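A minimal sketch of that client-side throttle, assuming a Java worker like ours (the class name and URLs are illustrative): poll the Overlord's pendingTasks endpoint and hold new submissions while anything is still pending.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class IngestionThrottle {
  private static final HttpClient CLIENT = HttpClient.newHttpClient();

  // True when the Overlord reports no pending tasks for this datasource;
  // the endpoint returns a JSON array, so "[]" means nothing is pending.
  static boolean canSubmit(String overlordUrl, String datasource) throws Exception {
    HttpRequest req = HttpRequest.newBuilder()
        .uri(URI.create(overlordUrl + "/druid/indexer/v1/pendingTasks?datasource=" + datasource))
        .GET()
        .build();
    HttpResponse<String> resp = CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
    return resp.statusCode() == 200 && resp.body().trim().equals("[]");
  }

  public static void main(String[] args) throws Exception {
    if (canSubmit("http://overlord:8090", "ds1")) {
      System.out.println("No pending tasks; submit the next index task for ds1.");
    } else {
      System.out.println("Still pending; hold this cycle's task and retry later.");
    }
  }
}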
Filed GitHub issues so you don't hit these… [issue links on slide]
Fixed issues (PRs) [links on slide]
Making the DB functional…
● Queries from the Overlord to Postgres for task metadata were taking a long time.
● Added more CPU to the DB server.
● Improvements:
○ Overlord CPU utilization is lower
○ Fewer pending tasks
○ Task-slot utilization graph looks stable
Scaling Middle Managers
[Diagram sequence: Middle Managers running Peon processes against the Druid database — first more VMs, then more slots per Middle Manager, then a separate tier for the bigger compaction tasks, and finally right-sized down to fewer VMs]
Summary: Scaling Middle Managers
● Increased the number of Middle Managers so that more task slots were available for the Overlord to assign tasks: 12 MMs * 5 slots => 24 MMs * 5 slots
● Then increased the number of slots per Middle Manager, as the new tasks were small, i.e. had fewer files to ingest: 24 MMs * 5 slots => 12 MMs * 10 slots
● Created a separate tier for compaction, as those tasks took more resources than the regular index tasks (tiering; a config sketch follows this list).
● Then right-sized the Middle Manager count in each tier by reducing it: 12 MMs * 10 slots => 10 MMs * 10 slots + 2 MMs * 5 slots
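A sketch of how such a Middle Manager tier can be wired up — the tier names and capacity are illustrative: each Middle Manager declares a worker category in its runtime.properties, and an Overlord dynamic config routes task types to categories.

# Middle Manager runtime.properties (compaction tier)
druid.worker.category=compaction_tier
druid.worker.capacity=5

# Overlord dynamic config, POST /druid/indexer/v1/worker
{
  "type": "default",
  "selectStrategy": {
    "type": "equalDistributionWithCategorySpec",
    "workerCategorySpec": {
      "strong": true,
      "categoryMap": {
        "compact": { "defaultCategory": "compaction_tier" },
        "index_parallel": { "defaultCategory": "ingest_tier" }
      }
    }
  }
}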
Coordinator Issues [metric graphs across four slides]
Summary of the Coordinator crisis…
● Happy Overlord.
● But issues in the Coordinator now:
○ Huge number of small segments
○ Unavailable-segments count increasing
○ Coordinator CPU usage increasing
○ Coordinator cycle taking too long to complete
Fixing the Coordinator
[Diagram sequence: Coordinator process with the Druid metadata database — a bigger VM for the Coordinator, same big DB]
Handling the Coordinator…
● Scaled up the Coordinator instance type, as it is not horizontally scalable.
● Tried a few coordinator dynamic configs (applied as sketched after this list):
○ maxSegmentsToMove: 1000
○ percentOfSegmentsToConsiderPerMove: 25 — reduces the number of segments considered per coordinator cycle
○ useRoundRobinSegmentAssignment: true — assign segments in round-robin fashion first; lazily reassign with the chosen balancer strategy later
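These are coordinator dynamic configs, so they can be changed at runtime; a sketch of applying them (in practice, fetch the current config with GET /druid/coordinator/v1/config first and change only these fields):

POST /druid/coordinator/v1/config
{
  "maxSegmentsToMove": 1000,
  "percentOfSegmentsToConsiderPerMove": 25,
  "useRoundRobinSegmentAssignment": true
}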
Handling the Coordinator…
● We saw this error in the Coordinator logs during auto compaction for many datasources, with inputSegmentSizeBytes set to 100TB:
“is larger than inputSegmentSize[2147483648]”
● Removing this setting from the auto compaction config resolved the issue.
● This is no longer an issue from Druid 25 onwards.
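For illustration, a sketch of a per-datasource auto compaction config after the fix — the values are illustrative, and inputSegmentSizeBytes is simply omitted so the default applies:

POST /druid/coordinator/v1/config/compaction
{
  "dataSource": "ds1",
  "skipOffsetFromLatest": "PT1H",
  "granularitySpec": { "segmentGranularity": "DAY" }
}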
Handling the Historicals
● Until auto compaction is done:
○ More segments to scan per query
○ More processing power needed on the Historicals
● Cold data has HIGHER segment granularity
○ Compaction done!
● Hot data has LOWER segment granularity
○ Compaction NOT done yet!
[Diagram: queries for recent data hit the current Historicals serving smaller segments; older Historicals serve the larger, compacted segments of each datasource, all backed by the Druid database]
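Query-side tiering works similarly (a sketch; the tier name and periods are illustrative): each Historical declares its tier in runtime.properties, and per-datasource load rules decide which tier holds which data.

# Historical runtime.properties (hot tier)
druid.server.tier=hot

# Load rules, POST /druid/coordinator/v1/rules/ds1
[
  { "type": "loadByPeriod", "period": "P7D", "tieredReplicants": { "hot": 2 } },
  { "type": "loadForever", "tieredReplicants": { "_default_tier": 1 } }
]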
Happy State!!!
Summary
● Once we stabilized both the Druid ingestion and query pipelines, we onboarded all customers in a phased manner.
● Set the optimal queue size.
● Increased the MM count to absorb the initial burst of tasks.
● Right-sized the Overlord and Coordinator once onboarding was complete.
● Know your Overlord and Coordinator settings well.
Thank You
Questions?
Shivji Kumar Jha
linkedin.com/in/shivjijha/
slideshare.net/shiv4289/presentations/
youtube.com/@shivjikumarjha
Sachidananda Maharana
https://www.linkedin.com/in/sachidanandamaharana/