Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Data Analytics with Druid

2 462 vues

Publié le

Data Analytics with Druid using Jupyter

Publié dans : Données & analyses
  • Soyez le premier à commenter

Data Analytics with Druid

  1. 1. DATA ANALYTICS WITH DRUID YOU SUN JEONG
  2. 2. DATA ANALYTICS WITH DRUID WHO AM I ? Senior Software Engineer of SK Telecom Commercial Products Big Data Discovery Solution (~’16) Hadoop DW (~’15) PaaS(CloudFoundry) (~’13) Iaas (OpenStack) (~’13) Mail to : jerryjung@apache.org 2
  3. 3. DATA ANALYTICS WITH DRUID FOOTPRINTS 2014 2015 
 - Hadoop DW 
 - Realtime NW Analytics 2016 
 - Big Data Discovery
 - Streaming Processing 3
  4. 4. DATA ANALYTICS WITH DRUID AGENDA ‣ History ‣ What is Druid? ‣ Druid Architecture ‣ Real-Time Ingestion Demo (15m) ‣ Cohort Analysis (15m) 4
  5. 5. DATA ANALYTICS WITH DRUID HISTORY ▸ Development started at Meta markets in 2011 ▸ Apache V2 in early 2015 ▸ 150+ contributors today ▸ https://github.com/druid-io 5
  6. 6. DATA ANALYTICS WITH DRUID DATA LAKE 6 https://www.linkedin.com/pulse/more-analytics-than-just-fishing-data-lake-john-poppelaars
  7. 7. DATA ANALYTICS WITH DRUID DW VS DATA LAKE http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html 7
  8. 8. DATA ANALYTICS WITH DRUID WHAT IS DRUID Distributed, 
 In-memory Multi-dimensional OLAP store 8
  9. 9. DATA ANALYTICS WITH DRUID PROBLEMS timestamp domain user gender clicked 2011-01-01T00:01:35Z bieber.com 4312345532 Female 1 2011-01-01T00:03:03Z bieber.com 3484920241 Female 0 2011-01-01T00:04:51Z ultra.com 9530174728 Male 1 2011-01-01T00:05:33Z ultra.com 4098310573 Male 1 2011-01-01T00:05:53Z ultra.com 5832057930 Female 0 2011-01-01T00:06:17Z ultra.com 5789283478 Female 1 2011-01-01T00:23:15Z bieber.com 4730093842 Female 0 2011-01-01T00:38:51Z ultra.com 3909846810 Male 1 2011-01-01T00:49:33Z bieber.com 4930097162 Female 1 2011-01-01T00:49:53Z ultra.com 0381837193 Female 0 timestamp impressions clicks 2011-01-01T00:00:00Z 10 6 timestamp domain user gender clicked 2011-01-01T00:01:35Z bieber.com 4312345532 Female 1 2011-01-01T00:03:03Z bieber.com 3484920241 Female 0 2011-01-01T00:04:51Z ultra.com 9530174728 Male 1 2011-01-01T00:05:33Z ultra.com 4098310573 Male 1 2011-01-01T00:05:53Z ultra.com 5832057930 Female 0 2011-01-01T00:06:17Z ultra.com 5789283478 Female 1 2011-01-01T00:23:15Z bieber.com 4730093842 Female 0 2011-01-01T00:38:51Z ultra.com 9530174728 Male 1 2011-01-01T00:49:33Z bieber.com 4930097162 Female 1 2011-01-01T00:49:53Z ultra.com 0381837193 Female 0 timestamp domain gender impressions clicks 2011-01-01T00:00:00Z bieber.com Female 4 2 2011-01-01T00:00:00Z ultra.com Female 3 1 2011-01-01T00:00:00Z ultra.com Male 3 2 9
  10. 10. DATA ANALYTICS WITH DRUID BIG DATA DISCOVERY ▸ Roll-up ▸ Summarizing over a dimension ▸ Drill-down ▸ Focusing (zooming in) ▸ Slicing and dicing ▸ Reducing dimensions (slice) ▸ Picking values of specific dimensions (dice) ▸ Pivoting ▸ Rotating multi-dimensional cube 10
  11. 11. DATA ANALYTICS WITH DRUID OLAP CUBE ▸ Slice and Dice 11
  12. 12. DATA ANALYTICS WITH DRUID IN-MEMORY 12
  13. 13. DATA ANALYTICS WITH DRUID COLUMNAR STORAGE 13
  14. 14. DATA ANALYTICS WITH DRUID DRUID TERMS ▸ Data ▸ Timestamp ▸ Dimension ▸ Metric ▸ Datasource ▸ Segment ▸ Granularity 14
  15. 15. DATA ANALYTICS WITH DRUID DRUID ARCHITECTURE REALTIME BROKER HISTORICAL 15
  16. 16. DATA ANALYTICS WITH DRUID ARCHITECTURE - BATCH INGESTION HDFS HISTORICAL NODE HISTORICAL NODE HISTORICAL NODE BROKER NODE Segments Queries 16
  17. 17. DATA ANALYTICS WITH DRUID ARCHITECTURE - STREAMING INGESTION REALTIME NODE HISTORICAL NODE HISTORICAL NODE HISTORICAL NODE BROKER NODE Segments Queries Streaming 17
  18. 18. DATA ANALYTICS WITH DRUID ARCHITECTURE - LAMBDA REALTIME NODE HISTORICAL NODE HISTORICAL NODE HISTORICAL NODE BROKER NODE Segments Queries Streaming HDFS 18
  19. 19. DATA ANALYTICS WITH DRUID GLUE ARCHITECTURE REAL TIME TASK HISTORICAL NODE HISTORICAL NODE HISTORICAL NODE BROKER NODE Segments Queries Streaming STREAM PROCESSOR
 (TRANQUILITY) Kafka Indexing Service 19
  20. 20. DATA ANALYTICS WITH DRUID REAL WORLD ARCHITECTURE DATA 
 NODE #1 DATA 
 NODE #N OVERLORD MIDDLE MANAGE
 #1 COORDI
 NATOR MYSQL HA 
 PROXY MEMCACHED
 #2 BROKER NODE
 #1 BROKER NODE
 #1 MEMCACHED
 #3 MEMCACHED
 #1 HISTORICAL NODE #1 HISTORICAL NODE #N MIDDLE MANAGE
 #N ZK1 ZK2 ZK3 20
  21. 21. DATA ANALYTICS WITH DRUID DRUID MONITORING 21 http://www.slideshare.net/CharlesAllen9/programmatic-bidding-data-streams-druid
  22. 22. DATA ANALYTICS WITH DRUID DRUID DATASOURCE 22
  23. 23. RDRUID DATA ANALYTICS WITH DRUID https://github.com/druid-io/RDruid 23
  24. 24. DATA ANALYTICS WITH DRUID PYDROID 24 https://github.com/druid-io/pydruid
  25. 25. DATA ANALYTICS WITH DRUID DEMO ▸ Jupyter Notebook(PyDruid) ▸ Mobile App User Events for 1 week 
 : 2 billion events ▸ Scenario 
 : Unique users
 Cohort Analysis 25
  26. 26. DEMO
  27. 27. DATA ANALYTICS WITH DRUID MAY THE FORCE BE WITH YOU 27
  28. 28. DATA ANALYTICS WITH DRUID REFERENCES ▸ Druid
 : http://www.popit.kr/tag/druid/ 
 (https://www.facebook.com/popitkr/)
 : http://druid.io/ ▸ Cohort Analysis
 : http://www.gregreda.com/2015/08/23/cohort-analysis- with-python/ ▸ Druid Meetup@Seoul
 : http://www.meetup.com/Druid-Seoul/ 28
  29. 29. DATA ANALYTICS WITH DRUID POPIT 29 https://www.facebook.com/popitkr/
  30. 30. Q&A THANK YOU DATA ANALYTICS WITH DRUID 30

×