SlideShare une entreprise Scribd logo
1  sur  41
Télécharger pour lire hors ligne
Apache®, Apache Druid®, Druid®, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
peter.marshall@imply.io
20 years in Enterprise Architecture
CRM, EDRM, ERP, EIP, Digital Services,
Security, BI, RI, and MDM
BA Theology (!) and Computer Studies
TOGAF certified
Book collector & A/V buyer
Prime Timeline = proper timeline
#werk
Peter Marshall
Technology Evangelist
petermarshall.io
Community Issues
Tips & Tricks
Call to Action
Questions & Answers
Peter Marshall
Technology Evangelist
@petermarshallio
peter.marshall@imply.io
Jelena Zanko
Senior Community Manager
@JelenaZanko
jelena.zanko@imply.io
Matt Sarrel
Developer Evangelist
@msarrel
matt.sarrel@imply.io
Rachel Pedreschi
Vice President of Community
@rachelpedreschi
rachel@imply.io
ASSIST WRITE EXCHANGE
ASSIST
The day-to-day problems people have
The most common issues
Potential content
Potential docs updates
Potential code changes
Archdruids!
DESIGN
Defining the data pipeline,
noting all the building
blocks required, noting how
only how they will be
realised but how and what
data objects will flow
through that pipeline and
with what size, shape, and
regularity.
DEPLOY
Manually or with
automation, assigning
Apache Druid components
and configurations to the
infrastructure - including
network services, routing
and firewall configuration,
encryption certificates -
along with the three Druid
dependencies: deep
storage, Zookeeper, and a
metadata database.
CREATE
Using the features of
Apache Druid within the
pipeline to achieve the
desired design.
STABILISE
Hardening and all those
tasks you would associate
with good service
transition, from defining
OLAs / SLAs to training and
educating your target
audience.
OPERATE
Monitoring, support and
maintenance of the
transitioned system to
meet SLAs.
Ingest
DBM
Query
Ingest
DBM
Query
Defining ingestion tasks that will bring
statistics-ready data into Druid from
storage and delivery services,
(including schema, parsing rules,
transformation, filtering, connectivity,
and tuning options) and ensuring
their distributed execution, led by the
overlord, is performant and
complete.
Led by the coordinator, replication
and distribution of the ingested data
according to rules, while allowing for
defragmenting (“compact”), reshaping,
heating / cooling, and deleting that
data.
Programming SQL / Druid Native code
executed by the distributed processes
that are led by the broker service
(possibly via the router process) with
security applied.
DESIGN DEPLOY CREATE OPERATE
Ingest
DBM
Query
Defining ingestion tasks that will bring
statistics-ready data into Druid from
storage and delivery services,
(including schema, parsing rules,
transformation, filtering, connectivity,
and tuning options) and ensuring
their distributed execution, led by the
overlord, is performant and complete.
Led by the coordinator, replication
and distribution of the ingested data
according to rules, while allowing for
defragmenting (“compact”), reshaping,
heating / cooling, and deleting that
data.
Programming SQL / Druid Native code
executed by the distributed processes
that are led by the broker service
(possibly via the router process) with
security applied.
General Questions
Specifications (ingestion and
compaction), and how they are
written or generated
Execution of the ingestion
Inbound Integration to things like
Hadoop and Kafka
General Questions
Deletion (kill tasks) and
distribution of ingested data,
whether that’s immediately
afterwards or afterwards
Any metadata questions, ie sys.*
Auto Compaction configuration
(not the job itself - that’s a spec…)
General Questions
Authorisation and
Authentication via the broker
Designing fast, effective queries,
whether that’s SQL or Native.
Execution of queries
Outbound Integration of Druid
with tools like Superset
“ingestion is not happening to druid even if the data is present in the topic.”
“compact task seems to be blocked by the index task”
“failing task with "runnerStatusCode":"WAITING"”
“Ingestion task fails with RunTime Exception during BUILD_SEGMENTS phase”
“the task is still running until the time limit specified and then is marked as FAILED”
“it seems that the throughput does not cross 1M average”
“its taking more than hour to ingest. When we triggered kill task, its taking forever”
“tips or tricks on improving ingestion performance?”
“Ingestion was throttled for [35,610] millis because persists were pending”
Examples of Ingestion Execution problems
Examples of Ingestion Specification problems
“How to resolve for NULL values when they are coming from source table?”
“Previous sequenceNumber [289] is no longer available for partition [0].”
“Error on batch ingesting from different druid datasource”
“how to do some data formatting while handling schema changes”
“I am not seeing Druid doing any rollups”
“regexp_extract function is causing nullpointerexceptions”
“Anyone tried to hardcode the timeStamp?”
Don’t Walk Alone
Work with your end users to sketch
out what your Druid-powered user
interface is going to look like.
DESIGN DEPLOY CREATE STABILISE OPERATE
Tips for Druid Design
Real-time data analysis starts with time as a key dimension.
Comparisons make people think differently.
Filters make one visual cover multiple contexts.
Measures make one visual cover multiple indicators.
Create data sources focused on speed.
Create magic!
DESIGN DEPLOY CREATE STABILISE OPERATE
DESIGN DEPLOY CREATE STABILISE OPERATE
Druid != Island
Think about how and what you will
deploy onto your infrastructure,
especially Druid’s dependencies
Production
Specialised collectors
Applications & APIs
Machine & Human Data
Environmental Sensors
Stream & Bulk Repositories
Delivery
Real-time Analytics
BI Reporting & Dashboards
Buses, Queues & Databases
Search & Filtering UIs
Applications & APIs
Consolidation
Enrichment
Transformation & Stripping
Verification & Validation
Filtering & Sorting
Processing
ENRICHMENT AND CLEANSING
REDUCING IN THE SIZE OF THE INPUT
Storage
RETENTION FOR LATER
FOR OPERATIONS & FOR RESEARCH
Query
Feature & Structure Discovery
Segmentation & Classification
Recommendation
Prediction & Anomaly Detection
Statistical Calculations
EASY AND SPEEDY ANALYSIS
BY STATS, LEARNING, SEARCH...
Delivery
GUARANTEED DELIVERY
POSSIBLY STORING FOR A LIMITED PERIOD
DESIGN DEPLOY CREATE STABILISE OPERATE
Jump Right Er… no.
Configure JRE properties well -
especially heaps - and choose the right
hardware
Take time to read the tuning guide
druid.apache.org/docs/latest/operations/basic-cluster-tuning.html
druid.apache.org/docs/latest/configuration/index.html#jvm-configuration-best-practices
Historical Runtime Properties
druid.server.http.numThreads
druid.processing.buffer.sizeBytes
druid.processing.numMergeBuffers
druid.processing.numThreads
druid.server.maxSize
druid.historical.cache.useCache
druid.historical.cache.populateCache
druid.cache.sizeInBytes
Historical Java Properties
-Xms
-Xmx
-XX:MaxDirectMemorySize
Middle Manager Runtime Properties
druid.worker.capacity
druid.server.http.numThreads
druid.indexer.fork.property.druid.processing.numMergeBuffers
druid.indexer.fork.property.druid.processing.buffer.sizeBytes
druid.indexer.fork.property.druid.processing.numThreads
druid.processing.buffer.sizeBytes
druid.processing.numMergeBuffers
druid.processing.numThreads
MiddleManager Java Properties
-Xms
-Xmx
MiddleManager Peon Java Properties
-Xms
-Xmx
-XX:MaxDirectMemorySize
Broker Runtime Properties
druid.server.http.numThreads
druid.broker.http.numConnections
druid.broker.cache.useCache
druid.broker.cache.populateCache
druid.processing.buffer.sizeBytes
druid.processing.numThreads
druid.processing.numMergeBuffers
druid.broker.http.maxQueuedBytes
Broker / Coordinator / Overlord Java Properties
-Xms
-Xmx
-XX:MaxDirectMemorySize
DESIGN DEPLOY CREATE STABILISE OPERATE
Stay Connected
Druid is a highly distributed, loosely
coupled system on purpose.
Care for your interprocess
communication systems and paths:
especially Zookeeper and Http
druid.apache.org/docs/latest/dependencies/zookeeper.html
druid.apache.org/docs/latest/configuration/index.html#zookeeper
Know Your Team
Get to know the core distributed
collaborations of Apache Druid
DESIGN DEPLOY CREATE STABILISE OPERATE
Zookeeper
Historical
Historical
OverlordOverlordCoordinator
Metadata
Store
Historical
BrokerOverlord
Middle Manager
(Indexer)
Deep Store
DESIGN DEPLOY CREATE STABILISE OPERATE
Love Your Log
Get to know the logs.
For ingestion, particularly the overlord,
middle manager and its tasks.
For what happens next, particularly
the coordinator and historicals.
DESIGN DEPLOY CREATE STABILISE OPERATE
K.I.S.S.
Be agile: set up a lab, start simple and
start small, working up to perfection
Create a target query list
Understand which source data columns you will need
at ingestion time (filtering, transformation, lookup) and
which are used at query time (filtering, sorting,
calculations, grouping, bucketing, aggregations)
Set up your dimension spec and execute queries,
recording query performance
Explore what other queries (Time Series, Group By,
Top N) you could do with the data
Add more subtasks and monitor the payloads
Add more data and check the lag
Some Ingestion Spec Tips
Use ingestion-time filter to eke out performance and
storage efficiencies
Use transforms to replace or create data closer to the
queries that people will execute
Use time granularity and roll-up to generate metrics
and datasketches (set, quantile, and cardinality)
DESIGN DEPLOY CREATE STABILISE OPERATE
Digest the Specifics
Learn ingestion specifications in detail
through your own exploration and
experimentation, from the docs, and
from the community
DESIGN DEPLOY CREATE STABILISE OPERATE
Understand Segments
Historicals serve all used segments,
and deep storage stores them
Query time relates directly to segment
size: lots of small segments means lots
of small query tasks
Segments are tracked in master nodes
and registered in the metadata DB
Segment Tips & Tricks
Filter rows and think carefully about what dimensions you need
Use different segment granularities and row maximums to control the number of
segments generated
Apply time bucketting with query granularity and roll-up
Think about tiering your historicals using drop and load rules
Consider not just initial ingestion but on-going re-indexing
Never forget compaction!
Check local (maxBytesInMemory, maxRowsInMemory,
intermediatePersistPeriod) and deep storage (maxRowsPerSegment,
maxTotalRows, intermediateHandoffPeriod, taskDuration) persists
DESIGN DEPLOY CREATE STABILISE OPERATE
Ask Us Anything!
Find other people in the community
that have had the same issue as you
DESIGN DEPLOY CREATE STABILISE OPERATE
Learn from the best!
Find other people in the community
who have walked your walk!
COMMUNITY@IMPLY.IO
druid.apache.org/community
DESIGN DEPLOY CREATE STABILISE OPERATE
Get Meta
Collect metrics and understand how
your work affects them
Infrastructure
Host
Druid Service
Instance Type
Imply Version
Druid Version
Query Data
Num Metrics
Num Complex Metrics
Interval
Duration
Filters?
Remote Address
Memory Used
Maximum Memory Used
Garbage Collections
Total / Average GC Time
Total CPU Time
User CPU Time
CPU Wait Time
CPU System Use
Avg Jetty Connections
Min / Max Jetty Conns
Infrastructure Query Cache
Hit Rate
Hits
Misses
Timeouts
Errors
Size
Number of Entries
Average Entry Size
Evictions
Query Patterns
Query Count
Average Query Time
98%ile Query Time
Max Query Time
Tasks
Task Id
Task Status
Ingestion
Events Processed
Unparseable Events
Thrown Away Events
Output Rows
Persists
Total Back Pressure
Message Gap
Kafka Lag
Avg Return Size
Total CPU Time
Subquery Count
Avg Subquery Time
Data Source
Query Type
Native / Query ID
Successful?
Identity
Context
https://druid.apache.org/docs/latest/operations/metrics.html
Metrics & Measures
Inform capacity planning
Isolate potential execution bottlenecks
Check and investigate cluster performance
Flatten the learning curve for running Druid at scale
Infrastructure Use & Experience Ingestion
How can we all help each other?
1 Come with us, join us...
Join ASF Slack and the Google User Group, say hi, give people (socially distanced) hugs -
and link back to the docs
2 Tippy-Tappy-Typey-Type-Type
Blog helpful tips and tricks and walkthroughs of your own ingestion integrations,
specifications, and execution configurations. Contribute code and doc updates :-D
3 Make Pretty Slides
Take part in Ask Me Anything, Town Hall, and Druid meetups about ingestion.
Walkthroughs Tips & Tricks DDCSO Content
Community Issues
Tips & Tricks
Call to Action
Questions & Answers

Contenu connexe

Tendances

Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Edureka!
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaTimothy Spann
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium confluent
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
iceberg introduction.pptx
iceberg introduction.pptxiceberg introduction.pptx
iceberg introduction.pptxDori Waldman
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changesconfluent
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaEdureka!
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 

Tendances (20)

Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafka
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
Apache spark
Apache sparkApache spark
Apache spark
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
iceberg introduction.pptx
iceberg introduction.pptxiceberg introduction.pptx
iceberg introduction.pptx
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changes
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 

Similaire à Druid Adoption Tips and Tricks

Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Imply
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistTony Rogerson
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBDenny Lee
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsKognitio
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?samthemonad
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Seattle Scalability - Sept Meetup
Seattle Scalability - Sept MeetupSeattle Scalability - Sept Meetup
Seattle Scalability - Sept Meetupclive boulton
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshopFang Mac
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksAnyscale
 
The Last Frontier- Virtualization, Hybrid Management and the Cloud
The Last Frontier-  Virtualization, Hybrid Management and the CloudThe Last Frontier-  Virtualization, Hybrid Management and the Cloud
The Last Frontier- Virtualization, Hybrid Management and the CloudKellyn Pot'Vin-Gorman
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
AZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxAZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxSonuShaw16
 
Mvp4 croatia - Being a dba in a devops world
Mvp4 croatia - Being a dba in a devops worldMvp4 croatia - Being a dba in a devops world
Mvp4 croatia - Being a dba in a devops worldAlessandro Alpi
 
All Grown Up: Maturation of Analytics in the Cloud
All Grown Up: Maturation of Analytics in the CloudAll Grown Up: Maturation of Analytics in the Cloud
All Grown Up: Maturation of Analytics in the CloudInside Analysis
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyRohit Kulkarni
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionSplunk
 

Similaire à Druid Adoption Tips and Tricks (20)

Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/Specialist
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
 
RavenDB overview
RavenDB overviewRavenDB overview
RavenDB overview
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Seattle Scalability - Sept Meetup
Seattle Scalability - Sept MeetupSeattle Scalability - Sept Meetup
Seattle Scalability - Sept Meetup
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
12363 database certification
12363 database certification12363 database certification
12363 database certification
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
 
The Last Frontier- Virtualization, Hybrid Management and the Cloud
The Last Frontier-  Virtualization, Hybrid Management and the CloudThe Last Frontier-  Virtualization, Hybrid Management and the Cloud
The Last Frontier- Virtualization, Hybrid Management and the Cloud
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
AZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxAZMS PRESENTATION.pptx
AZMS PRESENTATION.pptx
 
Mvp4 croatia - Being a dba in a devops world
Mvp4 croatia - Being a dba in a devops worldMvp4 croatia - Being a dba in a devops world
Mvp4 croatia - Being a dba in a devops world
 
All Grown Up: Maturation of Analytics in the Cloud
All Grown Up: Maturation of Analytics in the CloudAll Grown Up: Maturation of Analytics in the Cloud
All Grown Up: Maturation of Analytics in the Cloud
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 

Plus de Imply

Pivot 2.0 - The next generation visualization tool for your streaming data
Pivot 2.0 - The next generation visualization tool for your streaming dataPivot 2.0 - The next generation visualization tool for your streaming data
Pivot 2.0 - The next generation visualization tool for your streaming dataImply
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot InstancesImply
 
Apache Druid®: A Dance of Distributed Processes
 Apache Druid®: A Dance of Distributed Processes Apache Druid®: A Dance of Distributed Processes
Apache Druid®: A Dance of Distributed ProcessesImply
 
Splunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorSplunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorImply
 
Zeotap: Data Modeling in Druid for Non temporal and Nested Data
Zeotap: Data Modeling in Druid for Non temporal and Nested DataZeotap: Data Modeling in Druid for Non temporal and Nested Data
Zeotap: Data Modeling in Druid for Non temporal and Nested DataImply
 
Nielsen: Casting the Spell - Druid in Practice
Nielsen: Casting the Spell - Druid in PracticeNielsen: Casting the Spell - Druid in Practice
Nielsen: Casting the Spell - Druid in PracticeImply
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidImply
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidImply
 
Building Data Applications with Apache Druid
Building Data Applications with Apache DruidBuilding Data Applications with Apache Druid
Building Data Applications with Apache DruidImply
 
Maximizing Apache Druid performance: Beyond the basics
Maximizing Apache Druid performance: Beyond the basicsMaximizing Apache Druid performance: Beyond the basics
Maximizing Apache Druid performance: Beyond the basicsImply
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...Imply
 
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Imply
 
How TrafficGuard uses Druid to Fight Ad Fraud and Bots
How TrafficGuard uses Druid to Fight Ad Fraud and BotsHow TrafficGuard uses Druid to Fight Ad Fraud and Bots
How TrafficGuard uses Druid to Fight Ad Fraud and BotsImply
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at TwitchImply
 
Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...
Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...
Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...Imply
 
August meetup - All about Apache Druid
August meetup - All about Apache Druid August meetup - All about Apache Druid
August meetup - All about Apache Druid Imply
 
Benchmarking Apache Druid
Benchmarking Apache DruidBenchmarking Apache Druid
Benchmarking Apache DruidImply
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsImply
 
What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18Imply
 
Apache Druid Vision and Roadmap
Apache Druid Vision and RoadmapApache Druid Vision and Roadmap
Apache Druid Vision and RoadmapImply
 

Plus de Imply (20)

Pivot 2.0 - The next generation visualization tool for your streaming data
Pivot 2.0 - The next generation visualization tool for your streaming dataPivot 2.0 - The next generation visualization tool for your streaming data
Pivot 2.0 - The next generation visualization tool for your streaming data
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot Instances
 
Apache Druid®: A Dance of Distributed Processes
 Apache Druid®: A Dance of Distributed Processes Apache Druid®: A Dance of Distributed Processes
Apache Druid®: A Dance of Distributed Processes
 
Splunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorSplunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operator
 
Zeotap: Data Modeling in Druid for Non temporal and Nested Data
Zeotap: Data Modeling in Druid for Non temporal and Nested DataZeotap: Data Modeling in Druid for Non temporal and Nested Data
Zeotap: Data Modeling in Druid for Non temporal and Nested Data
 
Nielsen: Casting the Spell - Druid in Practice
Nielsen: Casting the Spell - Druid in PracticeNielsen: Casting the Spell - Druid in Practice
Nielsen: Casting the Spell - Druid in Practice
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache Druid
 
Building Data Applications with Apache Druid
Building Data Applications with Apache DruidBuilding Data Applications with Apache Druid
Building Data Applications with Apache Druid
 
Maximizing Apache Druid performance: Beyond the basics
Maximizing Apache Druid performance: Beyond the basicsMaximizing Apache Druid performance: Beyond the basics
Maximizing Apache Druid performance: Beyond the basics
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
 
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
 
How TrafficGuard uses Druid to Fight Ad Fraud and Bots
How TrafficGuard uses Druid to Fight Ad Fraud and BotsHow TrafficGuard uses Druid to Fight Ad Fraud and Bots
How TrafficGuard uses Druid to Fight Ad Fraud and Bots
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at Twitch
 
Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...
Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...
Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...
 
August meetup - All about Apache Druid
August meetup - All about Apache Druid August meetup - All about Apache Druid
August meetup - All about Apache Druid
 
Benchmarking Apache Druid
Benchmarking Apache DruidBenchmarking Apache Druid
Benchmarking Apache Druid
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18
 
Apache Druid Vision and Roadmap
Apache Druid Vision and RoadmapApache Druid Vision and Roadmap
Apache Druid Vision and Roadmap
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Dernier (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Druid Adoption Tips and Tricks

  • 1. Apache®, Apache Druid®, Druid®, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
  • 2. peter.marshall@imply.io 20 years in Enterprise Architecture CRM, EDRM, ERP, EIP, Digital Services, Security, BI, RI, and MDM BA Theology (!) and Computer Studies TOGAF certified Book collector & A/V buyer Prime Timeline = proper timeline #werk Peter Marshall Technology Evangelist petermarshall.io
  • 3. Community Issues Tips & Tricks Call to Action Questions & Answers
  • 4.
  • 5. Peter Marshall Technology Evangelist @petermarshallio peter.marshall@imply.io Jelena Zanko Senior Community Manager @JelenaZanko jelena.zanko@imply.io Matt Sarrel Developer Evangelist @msarrel matt.sarrel@imply.io Rachel Pedreschi Vice President of Community @rachelpedreschi rachel@imply.io ASSIST WRITE EXCHANGE
  • 6. ASSIST The day-to-day problems people have The most common issues Potential content Potential docs updates Potential code changes Archdruids!
  • 7. DESIGN Defining the data pipeline, noting all the building blocks required, noting how only how they will be realised but how and what data objects will flow through that pipeline and with what size, shape, and regularity. DEPLOY Manually or with automation, assigning Apache Druid components and configurations to the infrastructure - including network services, routing and firewall configuration, encryption certificates - along with the three Druid dependencies: deep storage, Zookeeper, and a metadata database. CREATE Using the features of Apache Druid within the pipeline to achieve the desired design. STABILISE Hardening and all those tasks you would associate with good service transition, from defining OLAs / SLAs to training and educating your target audience. OPERATE Monitoring, support and maintenance of the transitioned system to meet SLAs. Ingest DBM Query
  • 8. Ingest DBM Query Defining ingestion tasks that will bring statistics-ready data into Druid from storage and delivery services, (including schema, parsing rules, transformation, filtering, connectivity, and tuning options) and ensuring their distributed execution, led by the overlord, is performant and complete. Led by the coordinator, replication and distribution of the ingested data according to rules, while allowing for defragmenting (“compact”), reshaping, heating / cooling, and deleting that data. Programming SQL / Druid Native code executed by the distributed processes that are led by the broker service (possibly via the router process) with security applied.
  • 10. Ingest DBM Query Defining ingestion tasks that will bring statistics-ready data into Druid from storage and delivery services, (including schema, parsing rules, transformation, filtering, connectivity, and tuning options) and ensuring their distributed execution, led by the overlord, is performant and complete. Led by the coordinator, replication and distribution of the ingested data according to rules, while allowing for defragmenting (“compact”), reshaping, heating / cooling, and deleting that data. Programming SQL / Druid Native code executed by the distributed processes that are led by the broker service (possibly via the router process) with security applied. General Questions Specifications (ingestion and compaction), and how they are written or generated Execution of the ingestion Inbound Integration to things like Hadoop and Kafka General Questions Deletion (kill tasks) and distribution of ingested data, whether that’s immediately afterwards or afterwards Any metadata questions, ie sys.* Auto Compaction configuration (not the job itself - that’s a spec…) General Questions Authorisation and Authentication via the broker Designing fast, effective queries, whether that’s SQL or Native. Execution of queries Outbound Integration of Druid with tools like Superset
  • 11.
  • 12.
  • 13. “ingestion is not happening to druid even if the data is present in the topic.” “compact task seems to be blocked by the index task” “failing task with "runnerStatusCode":"WAITING"” “Ingestion task fails with RunTime Exception during BUILD_SEGMENTS phase” “the task is still running until the time limit specified and then is marked as FAILED” “it seems that the throughput does not cross 1M average” “its taking more than hour to ingest. When we triggered kill task, its taking forever” “tips or tricks on improving ingestion performance?” “Ingestion was throttled for [35,610] millis because persists were pending” Examples of Ingestion Execution problems
  • 14. Examples of Ingestion Specification problems “How to resolve for NULL values when they are coming from source table?” “Previous sequenceNumber [289] is no longer available for partition [0].” “Error on batch ingesting from different druid datasource” “how to do some data formatting while handling schema changes” “I am not seeing Druid doing any rollups” “regexp_extract function is causing nullpointerexceptions” “Anyone tried to hardcode the timeStamp?”
  • 15. Don’t Walk Alone Work with your end users to sketch out what your Druid-powered user interface is going to look like. DESIGN DEPLOY CREATE STABILISE OPERATE
  • 16. Tips for Druid Design Real-time data analysis starts with time as a key dimension. Comparisons make people think differently. Filters make one visual cover multiple contexts. Measures make one visual cover multiple indicators. Create data sources focused on speed. Create magic!
  • 17.
  • 18.
  • 19. DESIGN DEPLOY CREATE STABILISE OPERATE
  • 20. DESIGN DEPLOY CREATE STABILISE OPERATE Druid != Island Think about how and what you will deploy onto your infrastructure, especially Druid’s dependencies
  • 21. Production Specialised collectors Applications & APIs Machine & Human Data Environmental Sensors Stream & Bulk Repositories Delivery Real-time Analytics BI Reporting & Dashboards Buses, Queues & Databases Search & Filtering UIs Applications & APIs Consolidation Enrichment Transformation & Stripping Verification & Validation Filtering & Sorting Processing ENRICHMENT AND CLEANSING REDUCING IN THE SIZE OF THE INPUT Storage RETENTION FOR LATER FOR OPERATIONS & FOR RESEARCH Query Feature & Structure Discovery Segmentation & Classification Recommendation Prediction & Anomaly Detection Statistical Calculations EASY AND SPEEDY ANALYSIS BY STATS, LEARNING, SEARCH... Delivery GUARANTEED DELIVERY POSSIBLY STORING FOR A LIMITED PERIOD
  • 22. DESIGN DEPLOY CREATE STABILISE OPERATE Jump Right Er… no. Configure JRE properties well - especially heaps - and choose the right hardware Take time to read the tuning guide druid.apache.org/docs/latest/operations/basic-cluster-tuning.html druid.apache.org/docs/latest/configuration/index.html#jvm-configuration-best-practices
  • 23. Historical Runtime Properties druid.server.http.numThreads druid.processing.buffer.sizeBytes druid.processing.numMergeBuffers druid.processing.numThreads druid.server.maxSize druid.historical.cache.useCache druid.historical.cache.populateCache druid.cache.sizeInBytes Historical Java Properties -Xms -Xmx -XX:MaxDirectMemorySize Middle Manager Runtime Properties druid.worker.capacity druid.server.http.numThreads druid.indexer.fork.property.druid.processing.numMergeBuffers druid.indexer.fork.property.druid.processing.buffer.sizeBytes druid.indexer.fork.property.druid.processing.numThreads druid.processing.buffer.sizeBytes druid.processing.numMergeBuffers druid.processing.numThreads MiddleManager Java Properties -Xms -Xmx MiddleManager Peon Java Properties -Xms -Xmx -XX:MaxDirectMemorySize Broker Runtime Properties druid.server.http.numThreads druid.broker.http.numConnections druid.broker.cache.useCache druid.broker.cache.populateCache druid.processing.buffer.sizeBytes druid.processing.numThreads druid.processing.numMergeBuffers druid.broker.http.maxQueuedBytes Broker / Coordinator / Overlord Java Properties -Xms -Xmx -XX:MaxDirectMemorySize
  • 24. DESIGN DEPLOY CREATE STABILISE OPERATE Stay Connected Druid is a highly distributed, loosely coupled system on purpose. Care for your interprocess communication systems and paths: especially Zookeeper and Http druid.apache.org/docs/latest/dependencies/zookeeper.html druid.apache.org/docs/latest/configuration/index.html#zookeeper
  • 25. Know Your Team Get to know the core distributed collaborations of Apache Druid DESIGN DEPLOY CREATE STABILISE OPERATE
  • 27. DESIGN DEPLOY CREATE STABILISE OPERATE Love Your Log Get to know the logs. For ingestion, particularly the overlord, middle manager and its tasks. For what happens next, particularly the coordinator and historicals.
  • 28. DESIGN DEPLOY CREATE STABILISE OPERATE K.I.S.S. Be agile: set up a lab, start simple and start small, working up to perfection
  • 29. Create a target query list Understand which source data columns you will need at ingestion time (filtering, transformation, lookup) and which are used at query time (filtering, sorting, calculations, grouping, bucketing, aggregations) Set up your dimension spec and execute queries, recording query performance Explore what other queries (Time Series, Group By, Top N) you could do with the data Add more subtasks and monitor the payloads Add more data and check the lag Some Ingestion Spec Tips Use ingestion-time filter to eke out performance and storage efficiencies Use transforms to replace or create data closer to the queries that people will execute Use time granularity and roll-up to generate metrics and datasketches (set, quantile, and cardinality)
  • 30. DESIGN DEPLOY CREATE STABILISE OPERATE Digest the Specifics Learn ingestion specifications in detail through your own exploration and experimentation, from the docs, and from the community
  • 31. DESIGN DEPLOY CREATE STABILISE OPERATE Understand Segments Historicals serve all used segments, and deep storage stores them Query time relates directly to segment size: lots of small segments means lots of small query tasks Segments are tracked in master nodes and registered in the metadata DB
  • 32. Segment Tips & Tricks Filter rows and think carefully about what dimensions you need Use different segment granularities and row maximums to control the number of segments generated Apply time bucketting with query granularity and roll-up Think about tiering your historicals using drop and load rules Consider not just initial ingestion but on-going re-indexing Never forget compaction! Check local (maxBytesInMemory, maxRowsInMemory, intermediatePersistPeriod) and deep storage (maxRowsPerSegment, maxTotalRows, intermediateHandoffPeriod, taskDuration) persists
  • 33. DESIGN DEPLOY CREATE STABILISE OPERATE Ask Us Anything! Find other people in the community that have had the same issue as you
  • 34. DESIGN DEPLOY CREATE STABILISE OPERATE Learn from the best! Find other people in the community who have walked your walk!
  • 36. DESIGN DEPLOY CREATE STABILISE OPERATE Get Meta Collect metrics and understand how your work affects them
  • 37. Infrastructure Host Druid Service Instance Type Imply Version Druid Version Query Data Num Metrics Num Complex Metrics Interval Duration Filters? Remote Address Memory Used Maximum Memory Used Garbage Collections Total / Average GC Time Total CPU Time User CPU Time CPU Wait Time CPU System Use Avg Jetty Connections Min / Max Jetty Conns Infrastructure Query Cache Hit Rate Hits Misses Timeouts Errors Size Number of Entries Average Entry Size Evictions Query Patterns Query Count Average Query Time 98%ile Query Time Max Query Time Tasks Task Id Task Status Ingestion Events Processed Unparseable Events Thrown Away Events Output Rows Persists Total Back Pressure Message Gap Kafka Lag Avg Return Size Total CPU Time Subquery Count Avg Subquery Time Data Source Query Type Native / Query ID Successful? Identity Context https://druid.apache.org/docs/latest/operations/metrics.html Metrics & Measures
  • 38. Inform capacity planning Isolate potential execution bottlenecks Check and investigate cluster performance Flatten the learning curve for running Druid at scale Infrastructure Use & Experience Ingestion
  • 39. How can we all help each other? 1 Come with us, join us... Join ASF Slack and the Google User Group, say hi, give people (socially distanced) hugs - and link back to the docs 2 Tippy-Tappy-Typey-Type-Type Blog helpful tips and tricks and walkthroughs of your own ingestion integrations, specifications, and execution configurations. Contribute code and doc updates :-D 3 Make Pretty Slides Take part in Ask Me Anything, Town Hall, and Druid meetups about ingestion.
  • 40. Walkthroughs Tips & Tricks DDCSO Content
  • 41. Community Issues Tips & Tricks Call to Action Questions & Answers