Netflix Data Engineering @ Uber Engineering Meetup

People, Platform, Projects: these slides describe how Netflix works with Big Data. I share how our teams are organized, the roles we typically have on those teams, an overview of our Big Data Platform, and two example projects.

  1. NETFLIX DATA ENGINEERING & ANALYTICS. Blake Irvine, Data Engineering & Analytics, 2016.09.22
  2. TOPICS: People (who we are); Platform (how we work); Projects (what we do)
  3. PEOPLE: Who we are
  4. PEOPLE: Analytic groups @ NFLX: Data Engineering & Analytics; Science & Algorithms; FP&A
  5. PEOPLE: (Data) engineering groups @ NFLX: Data Engineering & Analytics; Cloud Platform Engineering; Product Engineering
  6. PEOPLE: Talented individuals
  7. PEOPLE: Aligned teams = maximum context: Marketing; Product; Playback; Content; FinOps
  8. PEOPLE: Roles + Freedom & Responsibility: Data Analysts; Viz Engineers; Analytics Engineers; Data Engineers
  9. PEOPLE: Role core functions. Roles: Data Analysts; Viz Engineers; Analytics Engineers; Data Engineers. Core functions: Pipelines; Dashboards; JS / Viz; Analytics
  10. QUESTION BREAK?
  11. PLATFORM: How we work
  12. PLATFORM: Another talented, aligned team
  13. PLATFORM: What is the Netflix data platform?
  14. PLATFORM: Simplified overview; Big Data Portal
  15. PLATFORM: Event pipeline
      〉 83 million members
      〉 1000+ devices
      〉 Up to 1 trillion daily events
      〉 ~1m traversal time
      〉 S3 warehouse is 60 PB
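A quick back-of-the-envelope check puts the event-pipeline scale in perspective. Averaged over a day, the upper figure of 1 trillion daily events works out to roughly 11.6 million events per second. This is an average only. Whenever traffic is uneven across the day, the peak rate runs above it.

$$
\frac{10^{12}\ \text{events/day}}{86\,400\ \text{s/day}} \approx 1.16 \times 10^{7}\ \text{events/s}
$$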
  16. PLATFORM: Query engines
      〉 Hadoop (EMR) is primarily batch
      〉 Presto is primarily ad hoc
      〉 Spark is emergent batch + ad hoc
  17. PLATFORM: Data stores
      〉 S3 is our source of truth
      〉 We copy forward to Redshift
      〉 Indexed data is loaded into Druid
      〉 Data access is via the Big Data API
  18. PLATFORM: Analytic tools
      〉 Big Data Portal
      〉 Tableau
      〉 JS / React / Node apps
  19. PLATFORM: Query engine usage
  20. PLATFORM: Query engine usage
  21. PLATFORM: Analytic tools usage
  22. PLATFORM: My team’s analytic tools usage
  23. QUESTION BREAK?
  24. PROJECTS: What we do
  25. PROJECTS: What we do
      〉 Big fast data
      〉 Efficient compute
  26. PROJECTS: Big fast data
      〉 Partner Ecosystem Dashboard
  27. PROJECTS: Partner example - Trip to Manhattan
  28. PROJECTS: Partner example - NYC Marriott
  29. PROJECTS: Partner example - Work & home again
  30. PROJECTS: Partner example - Multiple partners
  31. PROJECTS: Partner Ecosystem Dashboard
      〉 This dashboard’s dimensional grain is >1B rows
      〉 Current solution:
        〉 ETL in Pig
        〉 Prepare a Druid-indexed dataset with a Hadoop job
        〉 Load to 100 historical nodes
        〉 Queries typically take <2 seconds
      〉 Solution v1 was Tableau + Redshift SSD
        〉 Some views rendered in <10 seconds
        〉 But multiple views with filters took >1 minute
      〉 Future solution is...
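The slide does not describe what the Hadoop indexing job does. One relevant Druid feature is ingestion-time rollup: raw events are collapsed into one row per time bucket and dimension combination before indexing, so queries scan pre-aggregated rows instead of raw events. The slide does not say whether this dashboard relies on rollup. The following is a minimal Python sketch of the rollup idea only. It is not Druid's API and not Netflix's job. The field names (`ts`, `partner`, `device`, `view_seconds`), the partner values, and the one-hour granularity are all hypothetical.

```python
from collections import defaultdict


def rollup(events, dimensions, granularity_s=3600):
    """Collapse raw events into one row per (time bucket, dimension values).

    Hypothetical schema: each event has an epoch-seconds `ts`, the listed
    dimension fields, and a numeric `view_seconds` metric.
    """
    rows = defaultdict(lambda: {"count": 0, "view_seconds": 0.0})
    for event in events:
        # Truncate the timestamp to the start of its bucket (e.g. the hour)
        bucket = event["ts"] - event["ts"] % granularity_s
        key = (bucket, *(event[d] for d in dimensions))
        row = rows[key]
        row["count"] += 1                            # events folded into this row
        row["view_seconds"] += event["view_seconds"] # summed metric
    return dict(rows)


if __name__ == "__main__":
    raw = [  # hypothetical events; partner names are placeholders
        {"ts": 1474531200, "partner": "hotel-a", "device": "tv", "view_seconds": 1800.0},
        {"ts": 1474532000, "partner": "hotel-a", "device": "tv", "view_seconds": 600.0},
        {"ts": 1474535000, "partner": "isp-b", "device": "phone", "view_seconds": 300.0},
    ]
    for key, metrics in sorted(rollup(raw, ["partner", "device"]).items()):
        print(key, metrics)
```

Rollup trades away per-event detail in exchange for fewer rows. That trade fits dashboards that filter and group by a fixed set of dimensions. Coarser granularity or fewer dimensions shrinks the table further.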
  32. PROJECTS: Efficient compute
      〉 Counting things
      〉 Measuring time
  33. PROJECTS: Efficient compute - Counting
      〉 Easy: playback events, units of time, other discrete events
      〉 Hard: uniques, especially over windows or groups
      〉 Solution: estimates, i.e. HyperLogLog(++)
      〉 Where we’ll implement:
        〉 Distribution arrays in staging tables
        〉 In Druid data sources
        〉 Merge functions in the query layer, i.e. UDFs for Pig/Hive/Presto/JS
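Uniques are hard because exact distinct counts do not add up. Members active on Monday plus members active on Tuesday double-counts anyone active on both days. HyperLogLog sketches avoid this because two sketches merge by taking a register-wise maximum. That is the property the "merge functions in the query layer" bullet relies on. Below is a minimal Python sketch of plain HyperLogLog with a linear-counting small-range correction. It is not Netflix's implementation and not HLL++, which adds empirical bias correction and a sparse encoding. The precision `p=12`, the SHA-1-derived 64-bit hash, and the `member-<id>` keys are illustrative assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality sketch (illustrative, not HLL++)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                       # bits used to pick a register
        self.m = 1 << p                  # number of registers (4096 for p=12)
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Take 64 bits from SHA-1; any well-mixed 64-bit hash would do
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        idx = h >> (64 - self.p)                  # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Sketch of the union = register-wise max; needs equal precision
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant for m >= 128
        z = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    monday, tuesday = HyperLogLog(), HyperLogLog()
    for i in range(60_000):
        monday.add(f"member-{i}")
    for i in range(40_000, 100_000):
        tuesday.add(f"member-{i}")
    # 100,000 distinct members overall; adding per-day uniques would give 120,000
    print(round(monday.merge(tuesday).count()))
```

With `p=12`, the standard error is about 1.04/√4096 ≈ 1.6%, so the merged estimate should land within a few percent of the true 100,000. The register arrays are small and fixed-size (4,096 small integers here). That is what makes it plausible to store them in staging tables and re-merge them per window or group.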
  34. PROJECTS: Efficient compute - Measuring time
      〉 Easy: average, but not accurate
      〉 Hard: percentiles
      〉 Solution: estimates, most likely t-digest
      〉 Where we’ll implement:
        〉 Digests in staging tables
        〉 In Druid data sources
        〉 Merge functions in the query layer, i.e. UDFs for Pig/Hive/Presto/JS
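The slide names t-digest as the likely percentile estimator. Below is a minimal merging t-digest in Python, written for illustration only. It is neither Netflix's code nor a reference implementation. It makes several assumptions: the k1 (arcsine) scale function, compression `delta=100`, a buffer of `10*delta` unmerged points, and linear interpolation between centroid midpoints. The part that matters for the "merge functions in the query layer" bullet is `merge`. Two digests combine by pooling their centroids and recompressing, so digests stored per partition could in principle be combined at query time.

```python
import math


class TDigest:
    """Minimal merging t-digest with the k1 (arcsine) scale function.

    Illustrative sketch: delta=100 and the 10*delta buffer are assumptions.
    """

    def __init__(self, delta: float = 100.0) -> None:
        self.delta = delta        # compression: larger = more centroids, more accuracy
        self.centroids = []       # (mean, weight) pairs, sorted by mean after compression
        self.buffer = []          # incoming (value, weight) pairs not yet merged
        self.min = math.inf
        self.max = -math.inf

    def _k(self, q: float) -> float:
        # k1 scale: centroids stay small near q=0 and q=1, may grow near the median
        q = min(max(q, 0.0), 1.0)
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def add(self, x: float, w: float = 1.0) -> None:
        self.buffer.append((x, w))
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        if len(self.buffer) >= 10 * self.delta:
            self._compress()

    def merge(self, other: "TDigest") -> None:
        # Combining digests = pooling centroids, then recompressing
        other._compress()
        self.buffer.extend(other.centroids)
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        self._compress()

    def _compress(self) -> None:
        points = sorted(self.centroids + self.buffer)
        self.buffer = []
        if not points:
            return
        total = sum(w for _, w in points)
        merged = []
        mean, weight = points[0]
        q_left = 0.0                        # quantile at the open centroid's left edge
        k_limit = self._k(q_left) + 1.0     # a centroid may span at most 1 unit of k
        for x, w in points[1:]:
            if self._k(q_left + (weight + w) / total) <= k_limit:
                weight += w
                mean += (x - mean) * w / weight   # running weighted mean
            else:
                merged.append((mean, weight))
                q_left += weight / total
                k_limit = self._k(q_left) + 1.0
                mean, weight = x, w
        merged.append((mean, weight))
        self.centroids = merged

    def quantile(self, q: float) -> float:
        self._compress()
        if not self.centroids:
            raise ValueError("empty digest")
        total = sum(w for _, w in self.centroids)
        target = q * total
        cum = 0.0
        prev_pos, prev_val = 0.0, self.min  # anchor the minimum at rank 0
        for mean, weight in self.centroids:
            pos = cum + weight / 2          # each mean sits at its weight midpoint
            if target < pos:
                frac = (target - prev_pos) / (pos - prev_pos)
                return prev_val + frac * (mean - prev_val)
            prev_pos, prev_val = pos, mean
            cum += weight
        # Past the last midpoint: interpolate toward the maximum at rank `total`
        frac = (target - prev_pos) / (total - prev_pos)
        return prev_val + frac * (self.max - prev_val)
```

For example, digests built per hour could be merged to estimate a daily p95 without rereading raw events. The k1 scale keeps centroids smallest at the tails, which is where percentiles such as p95 and p99 need the most precision. This also illustrates the slide's point about averages: a single mean can hide a long tail that these percentiles expose.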
  35. THANK YOU. People (who we are); Platform (how we work); Projects (what we do)
  36. FINAL QUESTION BREAK?!
