9. Druid Ingestion
import BigDataApi as bda

# reference the source Hive table
tbl = bda.Table("hive/table_name")
# build the ingestion spec from the table and a JSON spec file
spec = (
    bda.druid.DruidSpec.from_table(tbl)
    .spec("ingestion_spec.json")
)
# create a job from the spec and target a Druid cluster
job = (
    bda.genie.DruidIndexerJob(spec)
    .cluster(kg.druid.clusters.DRUID_CLUSTER_NAME)
)
# submit the job
job.execute()
10. Druid Cluster @ Netflix
● r4.16xlarge instance type
● Druid version 0.12.2
● Hundreds of nodes
11. Multitenancy
● Single Tier
● Router
○ Ad hoc
○ Experimental - broker downtime is acceptable; used for query fine-tuning, etc.
○ Reporting - pre-defined queries /dashboards
12. Autoscale
● Favor segments in memory
● Autoscale up - cluster disk utilization beyond 80%
● Handle large data ingestion without worrying about the cluster falling over
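The scale-up rule above can be sketched in a few lines. This is only an illustration of a disk-utilization trigger, not the policy actually deployed: the `NodeDisk` type, the function names, and the assumption that new nodes match the average existing node size are all hypothetical. Only the 80% threshold comes from the slide.

```python
import math
from dataclasses import dataclass

# From the slide: scale up once cluster disk utilization goes beyond 80%.
SCALE_UP_THRESHOLD = 0.80


@dataclass
class NodeDisk:
    """Disk usage of one data node (hypothetical type for this sketch)."""
    used_bytes: int
    capacity_bytes: int


def cluster_disk_utilization(nodes):
    """Fraction of total cluster disk capacity currently in use."""
    capacity = sum(n.capacity_bytes for n in nodes)
    if capacity == 0:
        raise ValueError("cluster has no disk capacity")
    return sum(n.used_bytes for n in nodes) / capacity


def nodes_to_add(nodes, threshold=SCALE_UP_THRESHOLD):
    """Number of average-sized nodes needed to get back under the threshold.

    Assumption: added nodes have the same capacity as the current average.
    """
    used = sum(n.used_bytes for n in nodes)
    capacity = sum(n.capacity_bytes for n in nodes)
    if cluster_disk_utilization(nodes) <= threshold:
        return 0
    per_node = capacity / len(nodes)
    required_capacity = used / threshold
    return math.ceil((required_capacity - capacity) / per_node)
```

Sizing the scale-up from the required capacity, rather than adding one node at a time, is one way to absorb a large ingestion in a single step.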
13. Deployment Pipeline
● Spinnaker (https://www.spinnaker.io/)
● Clusters upgraded using red/black deployment
○ Jenkins jobs build the Druid tarball and Debian package
○ Deploy components with the new code line
○ Wait for segments to load
○ Switch DNS records
○ Scale down old cluster
● Rollback
○ Switch DNS back to the old cluster
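The upgrade sequence above can be sketched as a small orchestration function. This is a hedged outline only: it does not use Spinnaker or Jenkins APIs, and every callable passed in (`deploy_new`, `segments_loaded`, `switch_dns`, `scale_down_old`) is a hypothetical stand-in for the real pipeline stage.

```python
import time


def red_black_deploy(deploy_new, segments_loaded, switch_dns, scale_down_old,
                     max_polls=60, poll_seconds=30, wait=time.sleep):
    """Run the red/black steps in order and return the new cluster handle.

    All callables are placeholders for real pipeline stages (assumptions).
    """
    # Deploy components with the new code line alongside the old cluster.
    new_cluster = deploy_new()

    # Wait for segments to load on the new cluster before taking traffic.
    for _ in range(max_polls):
        if segments_loaded(new_cluster):
            break
        wait(poll_seconds)
    else:
        raise TimeoutError("segments did not finish loading on the new cluster")

    # Switch DNS records to the new cluster, then scale down the old one.
    switch_dns(new_cluster)
    scale_down_old()
    return new_cluster


def rollback(switch_dns, old_cluster):
    """Roll back by pointing DNS at the old cluster again."""
    switch_dns(old_cluster)
```

Because rollback is only a DNS switch, it works only while the old cluster still exists, so in practice the scale-down step would be delayed until the new cluster has proven healthy.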
15. Use Cases
● Dashboard backend
● Sub-second query times
○ User interactive slice and dice
○ Longer data retention vs Redshift
○ More dimensions vs Redshift
● Custom UI
22. Other use cases
● Payments analysis
● Algorithms comparison
● Security
● Quality of Experience (QoE)
23. Future work
● Real time ingestion
○ Tranquility or Kafka indexing
● Open source T-Digest based Histogram module
● Investigate tiering
● Revise the auto-scaling policy to account for EBS
26. “With this launch, consumers around the world
will be able to enjoy TV shows and movies
simultaneously -- no more waiting. With the help
of the Internet, we are putting power in
consumers’ hands to watch whenever, wherever
and on whatever device.”
27. Measure Everything Consistently
● 160 billion client-side data points daily
● 135+ million members
● 190 countries
● 300 million devices
● 4 major UI platforms: TVUI, Web, iOS, Android
37. Recap
● Ingesting consistent and highly dimensional data
● Analyzing data via custom web visualizations
● Summarizing responsibly via sketch strings
● Druid helps us provide the best customer experience
44. Druid 0.13.0
● Native parallel batch indexing (phase 1)
● Automatic compaction (phase 1)
● Ingestion statistics and errors via API
● SQL system tables: segments, tasks, servers
● SQL standard-compliant null handling option
● Additional aggregators (stringFirst/stringLast, new HllSketch)
● Support for multiple grouping specs in groupBy query
● Backpressure, compact result formats for large result sets
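As an illustration of the SQL system tables, the sketch below posts a query against `sys.segments` to Druid's SQL HTTP endpoint (`/druid/v2/sql`). The broker address `http://localhost:8082` and the helper names are assumptions for this example, and it presumes Druid SQL is enabled on the cluster.

```python
import json
import urllib.request

# Count segments per datasource using the sys.segments system table.
SEGMENTS_PER_DATASOURCE = (
    "SELECT datasource, COUNT(*) AS segment_count "
    "FROM sys.segments GROUP BY datasource"
)


def build_sql_request(broker_url, sql):
    """Build (but do not send) a POST request for Druid's SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(broker_url, sql):
    """Send the query and return the decoded JSON rows."""
    with urllib.request.urlopen(build_sql_request(broker_url, sql)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical local broker; adjust to a real broker or router address.
    for row in run_sql("http://localhost:8082", SEGMENTS_PER_DATASOURCE):
        print(row)
```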
47. Download
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started