SlideShare une entreprise Scribd logo
1  sur  50
Data-Driven @ Netflix
Michelle Ufford
Principal Architect
Data Engineering & Analytics
Michelle Ufford
Highlights
● Principal Architect at Netflix
Data Engineering & Analytics
● Prev. Engineering Manager at GoDaddy
Data Platform
● Microsoft Data Platform MVP
● 10+ years building web-scale analytics &
data engineering infrastructure
● advises on Big Data topics
Microsoft, Hortonworks, Teradata, etc.
Gratuitous picture of my kids
By the Numbers.
The business numbers.
86.7 million
members
1000+ devices
supported
125+ million
hours watched
launched
19 years ago
every. day.
Any device. Anywhere.*
* Well, almost anywhere.
The data numbers.
4 petabyte
DW reads
300 terabyte
DW writes
40 petabyte
data warehouse
700+ billion
events written
Data in Action.
Content.
What should we license?
Predicting Value for
Licensed Content.
Feature Engineering Predictive Models License Terms Content Efficiency
Predicting Value for
Licensed Content.
● past performance of similar content on Netflix
● broadcast & Box Office performance
● talent (writers, actors, directors, etc.)
● critic & user reviews
● awards & accolades
Feature Engineering Predictive Models License Terms Content Efficiency
Predicting Value for
Licensed Content.
Feature Engineering Predictive Models License Terms Content Efficiency
Predicting Value for
Licensed Content.
● terms (length, exclusivity, etc.)
● bid amount
● negotiations
Feature Engineering Predictive Models License Terms Content Efficiency
Predicting Value for
Licensed Content.
● value / cost
● if efficient, license
Feature Engineering Predictive Models License Terms Content Efficiency
“ last year our original content overall
was some of our most efficient content.
”
“ We are building a studio in the cloud
and pioneering new approaches to movie
production, optimizing pitches, production
schedules, subtitling, and digital asset
management for our Original content. ”
What should we license?create?
Product UX.
Data. Driven. Experience.
There are 86 million different
versions of Netflix.
billboard
rows, row order
titles, title order
title artwork
Public Relations.
Analytics of news.Analytics is news.
Content Delivery.
Monitoring a global service.
YouTube video of Vizceral demo
https://youtu.be/JctsPpgEsVs
Behind the Scenes.
data access
AWS
S3
Big Data Platform
Amazon
Redshift
data processing
fast storage data viz
METACA
T
data services
events data
operational data
elastic storage Apache Pig
Philosophy.
Freedom &
Responsibility.
Context,
Not Control.
Highly Aligned &
Loosely Coupled.
Big Data
Platform
Data Engineering & Analytics
MarketingProduct PlaybackContent Finance
105 talented engineers & analysts
data viz engineers
analytics engineers
data engineers
Big Data
Platform
analysts
Results,
Not Opinions.
Experimentation Platform
Batch &
Ad Hoc
Analysis
Questions?
Thank you
for attending!
Michelle Ufford
linkedin.com/in/mufford
@sqlfool
Data @
Netflix
@NetflixData
hadoopsie.com techblog.netflix.com
tinyurl.com/NetflixData

Contenu connexe

Tendances

Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...
Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...
Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...
DataStax
 

Tendances (20)

Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Tableau Conference 2018: Binging on Data - Enabling Analytics at NetflixTableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
 
Netflix
NetflixNetflix
Netflix
 
Group case study assignment
Group case study assignmentGroup case study assignment
Group case study assignment
 
Spark at Zillow
Spark at ZillowSpark at Zillow
Spark at Zillow
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data Problems
 
Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...
Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...
Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa...
 
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooksNotebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
 
Netflix Case Study
Netflix Case StudyNetflix Case Study
Netflix Case Study
 
How to Design Retail Recommendation Engines with Neo4j
How to Design Retail Recommendation Engines with Neo4jHow to Design Retail Recommendation Engines with Neo4j
How to Design Retail Recommendation Engines with Neo4j
 
Personalized Playlists at Spotify
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at Spotify
 
Netflix-Case Study-- When a Pioneer Has to Reinvent Itself
Netflix-Case Study-- When a Pioneer Has to Reinvent Itself Netflix-Case Study-- When a Pioneer Has to Reinvent Itself
Netflix-Case Study-- When a Pioneer Has to Reinvent Itself
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 
Company Presentation - Netflix
Company Presentation - NetflixCompany Presentation - Netflix
Company Presentation - Netflix
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
Netflix Case Study
Netflix Case StudyNetflix Case Study
Netflix Case Study
 
Airbyte - Series-B deck
Airbyte - Series-B deckAirbyte - Series-B deck
Airbyte - Series-B deck
 
Personalization at Netflix - Making Stories Travel
Personalization at Netflix -  Making Stories Travel Personalization at Netflix -  Making Stories Travel
Personalization at Netflix - Making Stories Travel
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
 

Similaire à Data-Driven @ Netflix

Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Databricks
 
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Ian Gomez
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Kai Wähner
 

Similaire à Data-Driven @ Netflix (20)

Data engineering at the interface of art and analytics: the why, what, and ho...
Data engineering at the interface of art and analytics: the why, what, and ho...Data engineering at the interface of art and analytics: the why, what, and ho...
Data engineering at the interface of art and analytics: the why, what, and ho...
 
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo JapanAI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
 
Getting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDBGetting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDB
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
 
[Public] 7 archetipi della tecnologia moderna [italy]
[Public] 7 archetipi della tecnologia moderna [italy][Public] 7 archetipi della tecnologia moderna [italy]
[Public] 7 archetipi della tecnologia moderna [italy]
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at Pinterest
 
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
Big Data LDN 2017: How Big Data Insights Become Easily Accessible With Workfl...
 
The Complexity to "Yes" in Analytics Software and the Possibilities with Dock...
The Complexity to "Yes" in Analytics Software and the Possibilities with Dock...The Complexity to "Yes" in Analytics Software and the Possibilities with Dock...
The Complexity to "Yes" in Analytics Software and the Possibilities with Dock...
 
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 
Applying R in BI and Real Time applications EARL London 2015
Applying R in BI and Real Time applications EARL London 2015Applying R in BI and Real Time applications EARL London 2015
Applying R in BI and Real Time applications EARL London 2015
 
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
 
Big Data LDN 2017: Big Impact with Big Data
Big Data LDN 2017: Big Impact with Big DataBig Data LDN 2017: Big Impact with Big Data
Big Data LDN 2017: Big Impact with Big Data
 
Applying the R Language to BI and Real Time Applications
Applying the R Language to BI and Real Time ApplicationsApplying the R Language to BI and Real Time Applications
Applying the R Language to BI and Real Time Applications
 
Maximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesMaximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and Practices
 
JASPERSOFT LIVE DEMO - NAM
JASPERSOFT LIVE DEMO - NAMJASPERSOFT LIVE DEMO - NAM
JASPERSOFT LIVE DEMO - NAM
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the Cloud
 
DATA @ NFLX (Tableau Conference 2014 Presentation)
DATA @ NFLX (Tableau Conference 2014 Presentation)DATA @ NFLX (Tableau Conference 2014 Presentation)
DATA @ NFLX (Tableau Conference 2014 Presentation)
 
How to Leverage Machine Learning (R, Hadoop, Spark, H2O) for Real Time Proces...
How to Leverage Machine Learning (R, Hadoop, Spark, H2O) for Real Time Proces...How to Leverage Machine Learning (R, Hadoop, Spark, H2O) for Real Time Proces...
How to Leverage Machine Learning (R, Hadoop, Spark, H2O) for Real Time Proces...
 

Dernier

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Data-Driven @ Netflix

Notes de l'éditeur

  1. Abstract: Netflix is the quintessential data-driven company. It’s 83 million members stream more than 125 million hours in over 190 countries every day and generate more than 700 billion events in the process. In this session, we’ll share how data is used to make informed decisions across the entire business — from content acquisition to content delivery, and everything in between. We’ll look at how Netflix successfully employs a scalable cloud-based data platform to support a constant deluge of data and a small army of data analysts, engineers, and scientists. We’ll discuss the advanced analytical capabilities that are enabled through modern data technologies. Lastly, we’ll explore some of the architectural & operational principals that enable Netflix to so effectively make use of its data.
  2. Obligatory “why should you listen to me talk?” slide
  3. Numbers as of Q3 2016
  4. During CES 2016 this January, ‘flipped the switch’ making Netflix available in 130+ new countries. Netflix is presently available in over 190 countries worldwide.
  5. What content should we license? How much should we bid? How should we value exclusivity? How should we measure content performance?
  6. Originals content: 2015 - 450 hours 2016 - 600 hours 2017 - 1000 hours
  7. Netflix website: circa 2012
  8. Netflix website: circa 2013
  9. Netflix website: circa 2014
  10. Netflix website: circa 2015
  11. Netflix website: circa 2016
  12. Vizceral Open-Source Project: https://github.com/netflix/vizceral http://techblog.netflix.com/2016/08/vizceral-open-source.html http://techblog.netflix.com/2015/10/flux-new-approach-to-system-intuition.html
  13. Genie – federated job execution engine Metacat – federated metadata service Kragle – python APIs
  14. 15m views on SlideShare
  15. Minimize rules Make smart choices Take ownership
  16. Avoid prescriptive requirements Give visibility
  17. Set context (strategy, goals) Communicate only as much as needed
  18. At Netflix, we use the scientific method We’re often right at predicting behavior – for people exactly like us Most people aren’t like us