Genie – Hadoop Platform as a Service at Netflix
Sriram Krishnan
Hadoop Summit, June 26, 2013
Netflix does Hadoop
Netflix does Hadoop at scale
Netflix does Hadoop at scale*
Netflix does Hadoop at scale in the cloud
S3 as the Cloud Data Warehouse
Cloud Data Warehouse
Multiple Hadoop Clusters
Cloud Data Warehouse
Hadoop (EMR) Clusters
Data Platform as a Service
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Large Ecosystem of Clients & Tools
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Why Genie?
• Simple API for job submission and management
• Accessible from the data center and the cloud
• Abstraction of physical details of back-end Hadoop clusters
What Genie is Not
• A workflow scheduler, such as Oozie
• A task scheduler, such as fair share or capacity schedulers
• An end-to-end resource management tool
Genie: Job Execution
• API to run Hadoop, Hive and Pig jobs
• Auto-magic submission of jobs to the right Hadoop cluster
• Abstracting away cluster details from clients
Genie: Resource Configuration
• API for management of cluster metadata
• Status: up, out of service, or terminated
• Site-specific Hadoop, Hive and Pig configurations
• Cluster naming/tagging for job submissions
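The cluster metadata listed above could be modeled roughly as follows. This is a minimal sketch, not Genie's actual data model: the class name `ClusterRecord`, its field names, the `accepts` helper, and the example S3 path are all assumptions made for illustration. Only the three status values and the idea of tagging clusters for job submission come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class ClusterStatus(Enum):
    # The three states named on the slide
    UP = "UP"
    OUT_OF_SERVICE = "OUT_OF_SERVICE"
    TERMINATED = "TERMINATED"


@dataclass
class ClusterRecord:
    """Hypothetical metadata record for one registered cluster."""
    name: str
    status: ClusterStatus = ClusterStatus.UP
    # Tags that job submissions are matched against
    schedules: Set[str] = field(default_factory=set)
    configurations: Set[str] = field(default_factory=set)
    # Site-specific config locations keyed by tool (paths are made up)
    config_files: Dict[str, str] = field(default_factory=dict)

    def accepts(self, schedule: str, configuration: str) -> bool:
        """True if the cluster is UP and carries both requested tags."""
        return (
            self.status is ClusterStatus.UP
            and schedule in self.schedules
            and configuration in self.configurations
        )


prod_sla = ClusterRecord(
    name="prod-sla",
    schedules={"sla"},
    configurations={"prod"},
    config_files={"hive": "s3://example-bucket/conf/prod/hive-site.xml"},
)
```

Keeping the tags as sets lets one cluster serve several schedules or configurations at once, which the later use cases rely on.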
[Architecture diagram. Components: Eureka Service; Genie (Karyon, Archaius, Eureka Client, Ribbon, Servo; Hadoop, Hive, Pig); end-user clients (Eureka Client, Ribbon); admin clients (Eureka Client, Python API). Flow labels: "Registers service", "Discovers service", "Invokes (submits job)", "Launches job", "Launches cluster(s)", "Registers cluster". Netflix OSS: http://netflix.github.com]
Genie: Job Execution
REST call parameters:
• Job Type: {hadoop, hive, pig}
• File dependencies (scripts, UDFs, etc.)
• Command-line arguments
• Schedule: {adhoc, sla}
• Configuration: {prod, test, unittest}
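As an illustration of the parameters above, a client might assemble a request body like the one below. This is a sketch only: the JSON key names, the `build_job_request` helper, and the example script path are assumptions, not the actual Genie wire format. Only the job type is validated here, because later slides also use a "bonus" schedule tag that is not in the list above.

```python
import json

# Job types listed on the slide
JOB_TYPES = {"hadoop", "hive", "pig"}


def build_job_request(job_type, file_dependencies, cmd_args, schedule, configuration):
    """Return a JSON request body (hypothetical shape) for a job submission."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    body = {
        "jobType": job_type,
        "fileDependencies": list(file_dependencies),  # scripts, UDFs, etc.
        "cmdArgs": cmd_args,
        # Tags matched against cluster metadata, not cluster names
        "schedule": schedule,
        "configuration": configuration,
    }
    return json.dumps(body, sort_keys=True)


request_body = build_job_request(
    job_type="hive",
    file_dependencies=["s3://example-bucket/scripts/daily_report.q"],
    cmd_args="-f daily_report.q",
    schedule="sla",
    configuration="prod",
)
```

Note that the request names no cluster. The client states what kind of job it is and which tags apply, and the service picks the cluster.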
Genie: Job Execution
Response: job ID*
* Used to query status, get outputs, and kill the job
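The role of the job ID can be sketched with an in-memory stand-in for the service. Everything here is hypothetical: the class `InMemoryJobService`, its method names, and the status strings "RUNNING" and "KILLED" are illustrative assumptions, and no real jobs are launched.

```python
import uuid


class InMemoryJobService:
    """Toy stand-in for a job service; not the Genie implementation."""

    def __init__(self):
        self._jobs = {}

    def submit(self, request):
        """Record the job and return its ID, the handle for all later calls."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"request": request, "status": "RUNNING", "stdout": ""}
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]["status"]

    def output(self, job_id):
        return self._jobs[job_id]["stdout"]

    def kill(self, job_id):
        """Mark a running job as killed; finished jobs are left unchanged."""
        job = self._jobs[job_id]
        if job["status"] == "RUNNING":
            job["status"] = "KILLED"
        return job["status"]


service = InMemoryJobService()
job_id = service.submit({"jobType": "pig", "schedule": "adhoc", "configuration": "test"})
status_before = service.status(job_id)
status_after = service.kill(job_id)
```

The client keeps only the ID. Status, outputs, and kill requests all go through it, so the client never needs to know which cluster ran the job.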
Genie Job Details
Job ID
Script to execute
Standard output and error
Pig logs
Job conf directory
Genie – Use Cases Enabled at Netflix
• Running nightly short-lived “bonus” clusters to augment ETL processing
• Re-routing traffic between clusters
• “Red/black” pushes for clusters
• Attaching stand-alone gateways to clusters
• Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
Nightly Short-lived Bonus Clusters
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Execution Service Configuration Service
{Schedule=bonus,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
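The bonus-cluster sequence above can be read as tag matching plus a status check. The sketch below assumes a simple "first UP cluster with matching tags" rule. The function `select_cluster`, the dictionary layout, and the cluster names are illustrative assumptions, not the actual selection logic.

```python
def select_cluster(clusters, schedule, configuration):
    """Return the name of the first UP cluster carrying both tags, else None."""
    for cluster in clusters:
        if (cluster["status"] == "UP"
                and schedule in cluster["schedules"]
                and configuration in cluster["configurations"]):
            return cluster["name"]
    return None


prod_sla = {"name": "prod-sla", "status": "UP",
            "schedules": {"sla"}, "configurations": {"prod"}}
bonus = {"name": "bonus", "status": "UP",
         "schedules": {"bonus"}, "configurations": {"prod"}}
clusters = [prod_sla, bonus]

# While the bonus cluster is up, {Schedule=bonus, Configuration=prod} lands on it
during_window = select_cluster(clusters, "bonus", "prod")

# Once it is marked OUT_OF_SERVICE it no longer receives new jobs
bonus["status"] = "OUT_OF_SERVICE"
after_window = select_cluster(clusters, "bonus", "prod")

# SLA jobs are unaffected throughout
sla_route = select_cluster(clusters, "sla", "prod")

# Final state on the slides: the bonus cluster is terminated
bonus["status"] = "TERMINATED"
```

In this model the bonus cluster never competes with the production cluster for SLA jobs, because the two carry different schedule tags.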
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc, sla
Configurations: prod, test
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
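The three rerouting states above amount to editing metadata while clients keep sending the same request. A minimal sketch, assuming a registry that routes to the first UP cluster with matching tags, follows. The class `ClusterRegistry` and its methods are hypothetical names introduced for this example.

```python
class ClusterRegistry:
    """Toy cluster registry; routing rule and method names are assumptions."""

    def __init__(self):
        self._clusters = {}  # insertion order is the routing order

    def register(self, name, schedules, configurations, status="UP"):
        self._clusters[name] = {
            "schedules": set(schedules),
            "configurations": set(configurations),
            "status": status,
        }

    def set_status(self, name, status):
        self._clusters[name]["status"] = status

    def set_schedules(self, name, schedules):
        self._clusters[name]["schedules"] = set(schedules)

    def route(self, schedule, configuration):
        """Return the first UP cluster carrying both tags, else None."""
        for name, meta in self._clusters.items():
            if (meta["status"] == "UP"
                    and schedule in meta["schedules"]
                    and configuration in meta["configurations"]):
                return name
        return None


registry = ClusterRegistry()
registry.register("adhoc", {"adhoc"}, {"prod", "test"})
registry.register("prod-sla", {"sla"}, {"prod"})

# State 1: SLA jobs go to the production SLA cluster
step1 = registry.route("sla", "prod")

# State 2: tag the ad-hoc cluster with "sla" and take production out of service
registry.set_schedules("adhoc", {"adhoc", "sla"})
registry.set_status("prod-sla", "OUT_OF_SERVICE")
step2 = registry.route("sla", "prod")

# State 3: restore the original tags and bring production back up
registry.set_schedules("adhoc", {"adhoc"})
registry.set_status("prod-sla", "UP")
step3 = registry.route("sla", "prod")
```

The request {Schedule=sla, Configuration=prod} is identical in all three states. Only the registry contents change.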
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
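A red/black push can be sketched the same way: a second cluster registers with the same tags, and the old one is retired through its status. The names `prod-sla-v1` and `prod-sla-v2`, the `route` function, and the matching rule are assumptions for illustration only.

```python
def route(clusters, schedule, configuration):
    """Return the first UP cluster carrying both tags, else None."""
    matches = [
        c["name"] for c in clusters
        if c["status"] == "UP"
        and schedule in c["schedules"]
        and configuration in c["configurations"]
    ]
    return matches[0] if matches else None


old = {"name": "prod-sla-v1", "status": "UP",
       "schedules": {"sla"}, "configurations": {"prod"}}
clusters = [old]
before = route(clusters, "sla", "prod")

# The new cluster registers with identical tags; the old one stops taking jobs
new = {"name": "prod-sla-v2", "status": "UP",
       "schedules": {"sla"}, "configurations": {"prod"}}
clusters.append(new)
old["status"] = "OUT_OF_SERVICE"
during = route(clusters, "sla", "prod")

# Once the push is accepted, the old cluster is terminated
old["status"] = "TERMINATED"
after = route(clusters, "sla", "prod")
```

In this model, flipping the two statuses back before the old cluster is terminated would restore the previous routing.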
Genie Usage at Netflix
• Usage statistics brought to you by “Sherlock”
• Pig job to gather Hadoop job statistics
• Tableau-based visualization
Genie Deployment in the Cloud
• Asgard is also part of Netflix OSS
• https://github.com/Netflix/asgard
Auto Scaling in the Cloud
Genie is now part of Netflix OSS!
• http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html
• Clone it on GitHub at:
• https://github.com/Netflix/genie
• Still “version 0” – work in progress!
• All contributions and feedback welcome!
• Come talk to us and check out live demos at the Netflix Booth
Watching Pigs Fly with the Netflix Hadoop Toolkit
• Sriram Krishnan
We’re hiring!
Thank you!
Home: http://www.netflix.com
Jobs: http://jobs.netflix.com
Tech Blog: http://techblog.netflix.com/

Genie - Hadoop Platform as a Service at Netflix

  • 1. 1 Genie – Hadoop Platform as a Service at Netflix Sriram Krishnan Hadoop Summit, June 26, 2013
  • 5. Netflix does Hadoop at scale in the cloud
  • 6. S3 as the Cloud Data Warehouse Cloud Data Warehouse
  • 7. Multiple Hadoop Clusters Cloud Data Warehouse Hadoop (EMR) Clusters
  • 8. Data Platform as a Service Cloud Data Warehouse Hadoop (EMR) Clusters Hadoop Platform as a Service Job Execution Resource Configuration & Management Metadata Service (Franklin)
  • 9. Large Ecosystem of Clients & Tools Cloud Data Warehouse Hadoop (EMR) Clusters Hadoop Platform as a Service Job Execution Resource Configuration & Management Metadata Service (Franklin)
  • 10. Why Genie?  Simple API for job submission and management  Accessible from the data center and the cloud  Abstraction of physical details of back-end Hadoop clusters
  • 11. What Genie is Not  A workflow scheduler, such as Oozie  A task scheduler, such as fair share or capacity schedulers  An end-to-end resource management tool
  • 12. Genie: Job Execution  API to run Hadoop, Hive and Pig jobs  Auto-magic submission of jobs to the right Hadoop cluster  Abstracting away cluster details from clients
  • 13. Genie: Resource Configuration  API for management of cluster metadata  Status: up, out of service, or terminated  Site-specific Hadoop, Hive and Pig configurations  Cluster naming/tagging for job submissions
  • 14. Eureka ServiceEureka Service Registers service ClientEureka Client Ribbon Discovers service Invokes (submits job) Launches job Discovers service Client Eureka Client Python API Launches cluster(s) Registers cluster End-users Admins Netflix OSS http://netflix.github.com Karyon Eureka Client Ribbon Servo Hadoop Hive Pig Karyon Archaius Ribbon Servo Hadoop Hive Pig Eureka Client
  • 15. Genie: Job Execution (REST call) • Job Type: {hadoop, hive, pig} • File dependencies (script, UDFs, etc.) • Command-line arguments • Schedule: {adhoc, sla} • Configuration: {prod, test, unittest}
  • 16. Genie: Job Execution • Response: job ID* • *Used to query status, get outputs, kill job
  • 17. Genie Job Details • Job ID • Script to execute • Standard output and error • Pig logs • Job conf directory
  • 18. Genie – Use Cases Enabled at Netflix • Running nightly short-lived “bonus” clusters to augment ETL processing • Re-routing traffic between clusters • “Red/black” pushes for clusters • Attaching stand-alone gateways to clusters • Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
  • 19. Nightly Short-lived Bonus Clusters • Execution Service, Configuration Service • Prod SLA Cluster: Schedule: sla; Configurations: prod
  • 20. Nightly Short-lived Bonus Clusters • Execution Service, Configuration Service • Request: {Schedule=bonus, Configuration=prod} • Bonus Cluster: Schedule: bonus; Configurations: prod • Prod SLA Cluster: Schedule: sla; Configurations: prod
  • 21. Nightly Short-lived Bonus Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Bonus Cluster: Schedule: bonus; Configurations: prod; Status: OUT_OF_SERVICE • Prod SLA Cluster: Schedule: sla; Configurations: prod
  • 22. Nightly Short-lived Bonus Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Bonus Cluster: Schedule: bonus; Configurations: prod; Status: TERMINATED • Prod SLA Cluster: Schedule: sla; Configurations: prod
  • 23. Rerouting Traffic Between Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Ad-hoc Cluster: Schedule: adhoc; Configurations: prod, test • Prod SLA Cluster: Schedule: sla; Configurations: prod
  • 24. Rerouting Traffic Between Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Ad-hoc Cluster: Schedule: adhoc, sla; Configurations: prod, test • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: OUT_OF_SERVICE
  • 25. Rerouting Traffic Between Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Ad-hoc Cluster: Schedule: adhoc; Configurations: prod, test • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
  • 26. “Red/Black” Pushes for Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
  • 27. “Red/Black” Pushes for Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: OUT_OF_SERVICE • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
  • 28. “Red/Black” Pushes for Clusters • Execution Service, Configuration Service • Request: {Schedule=sla, Configuration=prod} • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: TERMINATED • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
  • 29. Genie Usage at Netflix • Usage statistics brought to you by “Sherlock” • Pig job to gather Hadoop job statistics • Tableau-based visualization
  • 30. Genie Deployment in the Cloud • Asgard is also part of Netflix OSS • https://github.com/Netflix/asgard
  • 31. Auto Scaling in the Cloud
  • 32. Genie is now part of Netflix OSS! • http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html • Clone it on GitHub at: https://github.com/Netflix/genie • Still “version 0”: work in progress! • All contributions and feedback welcome! • Come talk to us and check out live demos at the Netflix Booth
  • 33. Watching Pigs Fly with the Netflix Hadoop Toolkit
  • 34. Sriram Krishnan • We’re hiring! • Thank you! • Home: http://www.netflix.com • Jobs: http://jobs.netflix.com • Tech Blog: http://techblog.netflix.com/
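
Slides 15 and 16 describe a job submission as a REST call carrying a job type, file dependencies, command-line arguments, a schedule and a configuration, with a job ID coming back. The sketch below only shows how a client might assemble and validate such a request body before sending it. The class name `JobRequest`, the JSON field names and the example S3 path are assumptions made for illustration; they are not taken from Genie's actual API, and no network call is made.

```python
import json
from dataclasses import dataclass, field, asdict

# Allowed values as listed on slide 15. "bonus" is added because the
# bonus-cluster slides (20-22) use it as a schedule value.
JOB_TYPES = {"hadoop", "hive", "pig"}
SCHEDULES = {"adhoc", "sla", "bonus"}
CONFIGURATIONS = {"prod", "test", "unittest"}


@dataclass
class JobRequest:
    """Hypothetical request body; field names are illustrative only."""
    job_type: str
    cmd_args: str
    schedule: str = "adhoc"
    configuration: str = "test"
    file_dependencies: list = field(default_factory=list)

    def validate(self) -> None:
        # Reject values outside the sets shown on the slides.
        if self.job_type not in JOB_TYPES:
            raise ValueError(f"unknown job type: {self.job_type}")
        if self.schedule not in SCHEDULES:
            raise ValueError(f"unknown schedule: {self.schedule}")
        if self.configuration not in CONFIGURATIONS:
            raise ValueError(f"unknown configuration: {self.configuration}")

    def to_json(self) -> str:
        # Validate first so a malformed request never leaves the client.
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


request = JobRequest(
    job_type="pig",
    cmd_args="-f script.pig",
    schedule="sla",
    configuration="prod",
    file_dependencies=["s3://example-bucket/script.pig"],  # made-up path
)
body = request.to_json()
```

Note that the request names no cluster: the client states only what kind of resource it needs, which is what lets the service pick the cluster on its behalf.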

Editor's notes

  1. Reference tech blog (http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html). Use cases: reporting, analytics, insights, algorithms (e.g. recommendations). But big deal: so does everyone in the room.
  2. What is scale? It means different things to different people.
  3. 80-100 billion events per day, 10s of TB of data (compressed). Totals ~2 PB (retention is a few months). Many clusters: 2000-2500 nodes at different times during the day. Again, big deal: there are many others in the room who do Hadoop at this scale (petabyte is the new terabyte).
  4. Our Hadoop processing is 100% in the (public) cloud. In our case, the public cloud is AWS. This is what differentiates our infrastructure from the rest. Hadoop in the cloud is different from Hadoop in the datacenter; in this talk, we will discuss our cloud-based Hadoop platform. We made certain architectural choices to make it easy for our end-users to run Hadoop jobs, and for us to manage Hadoop resources.
  5. S3 is the source of truth. S3 benefits: highly durable and available (11 9’s); bucket versioning; highly elastic (we grew our data warehouse organically from a few hundred terabytes to petabytes without having to provision any storage resources in advance). HDFS? Only for transient data and intermediate results for multi-stage jobs. S3 cons: performance, eventual consistency.
  6. Another benefit of S3: multiple clusters can read/process the same data. (Semi-)persistent SLA and ad-hoc clusters: ~800-1300 nodes. Multiple ad-hoc clusters to A/B test new releases/features. Nightly "bonus" clusters to supplement the SLA cluster. Operational assumption: clusters may go down at any time. If we lose a cluster, we just respin it. Clusters are interchangeable: storage is decoupled from the computational infrastructure.
  7. All end-users want to do is run jobs and access their data. As the platform team, our goal is to shield them from the back-end complexity. Genie: REST API for job execution/monitoring; repository/abstraction for clusters and metastores. Franklin (MDS): uses HiveServer to talk to the Hive metastore. In all honesty, very few people use this API directly.
  8. Next, we will focus on Genie for the rest of the talk. Other tools will be covered in the other Netflix talk, Watching Pigs Fly with the Netflix Hadoop Toolkit (Thu, 1:40 PM).
  9. EMR: Hadoop IaaS, and an API to run jobs on transient clusters; our clusters are semi-persistent, and job submissions don’t result in new clusters. Oozie: workflow tool, which only supports the Hadoop ecosystem; we have hybrid jobs (Teradata+Hadoop) being orchestrated by UC4, so we just needed a job submission API. Also no support for Hive when we started. Templeton: no multi-cluster, multi-user support; not quite ready for prime time.
  10. Genie is a resource “match-maker”. Next, we look at two key services that Genie provides.
  11. Unit of execution is a Hadoop/Hive/Pig job. Users provide scripts, dependencies and other metadata. Does no scheduling per se; only does “meta-scheduling”, or resource matching.
  12. Status defines whether a cluster is accepting jobs. Configurations are *-site.xml files and properties. Cluster name, schedule, etc. Next we look at the two classes of users supported by Genie, and the overall lifecycle.
  13. Two classes of users: admins and end-users. Admins spin up clusters and set cluster metadata. Users use the clusters once they have been registered. Genie is built on top of Netflix OSS.
  14. Genie figures out the resources to run jobs on; back-end resources are abstracted out. Asynchronous execution, since jobs may be long-running.
  15. Every job runs as a separate process using the Hadoop/Hive/Pig CLI. Avoids “jar hell” since it needs Hadoop jars. Jobs run in their own sandbox (working directory). Provides isolation between jobs, and between Genie and the jobs. Standard output/error of jobs easily available. Able to support multiple versions of Hadoop/Hive/Pig, and connect to multiple clusters.
  16. Configuration service helps us do crazy (cool) things. Will describe each of these in greater detail.
  17. New bonus clusters are launched each night, but clients are oblivious to actual host names/IPs. One way to do this: higher-SLA jobs first ask for the cluster by name.
  18. New bonus clusters are launched each night, but clients are oblivious to actual host names/IPs. One way to do this: higher-SLA jobs first ask for the cluster by name.
  19. If it doesn’t exist, revert back to the existing cluster. Why not just expand? Better isolation. Mixing and matching instance types is not ideal for Hadoop. Prod cluster uses m1.xlarges for slave nodes. Shrink has proven to be a problem. We want to do a hard shutdown when those instances are needed on awsprod.
  20. If it doesn’t exist, revert back to the existing cluster. Why not just expand? Better isolation. Mixing and matching instance types is not ideal for Hadoop. Prod cluster uses m1.xlarges for slave nodes. Shrink has proven to be a problem. We want to do a hard shutdown when those instances are needed on awsprod.
  21. We had to bounce the prod job tracker to enable priorities for “long-pole” jobs. Wanted to do it with minimal impact to SLA jobs.
  22. Must wait for all existing jobs to finish for minimal impact. Hadoop jobs are long-running; we don’t want to kill a 5-hour job nearing its finish.
  23. Prod cluster is back up after maintenance. Jobs that were scheduled on the query cluster will continue to run there until they finish. This is done from time to time; although not too often, we do red-black pushes…
  24. This is the initial state; we need to spin up a new cluster, e.g. to push a new feature.
  25. Spin up the new cluster, mark it as UP, mark the old cluster as OOS (OUT_OF_SERVICE).
  26. Old cluster goes from OUT_OF_SERVICE to TERMINATED.
  27. Our techblog shows the number of Hadoop jobs; this shows Genie jobs. Two query clusters: A/B testing the new fair share scheduler. Mention that we will be writing a techblog post about this soon, with more details.
  28. Set up desired instance counts across multiple AZs. Do “red-black” pushes using “sequential ASGs”. Loss of individual nodes will cause jobs running on those nodes to be lost.
  29. Auto-scaling policy set up to expand if number of running jobs > ~80%.
  30. Still biased towards running in the cloud and at Netflix, but we will generalize/improve it based on community feedback.
  31. Come listen to how we enable “Data Platform as a Service”; it is truly Lipstick on a Pig.
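
The notes describe Genie as a resource “match-maker”: a job asks for a schedule and a configuration, and the service routes it to a cluster that carries those tags and is accepting jobs. The sketch below illustrates that matching plus the bonus-cluster fallback from the notes (ask for the bonus cluster first, revert to the existing cluster if it is not available). It is a minimal model under stated assumptions: the `Cluster` class and both function names are hypothetical, and the first-match rule is a simplification, not Genie's actual selection logic.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster record holding the metadata shown on the slides."""
    name: str
    schedules: set
    configurations: set
    status: str = "UP"  # UP, OUT_OF_SERVICE or TERMINATED


def select_cluster(clusters, schedule, configuration):
    """Return the first UP cluster tagged with both values, else None."""
    for cluster in clusters:
        if (cluster.status == "UP"
                and schedule in cluster.schedules
                and configuration in cluster.configurations):
            return cluster
    return None


def select_with_fallback(clusters, schedules, configuration):
    """Try each schedule in order of preference, e.g. ["bonus", "sla"]."""
    for schedule in schedules:
        cluster = select_cluster(clusters, schedule, configuration)
        if cluster is not None:
            return cluster
    return None


prod = Cluster("prod_sla", {"sla"}, {"prod"})
bonus = Cluster("bonus", {"bonus"}, {"prod"})
adhoc = Cluster("adhoc", {"adhoc"}, {"prod", "test"})
clusters = [prod, bonus, adhoc]

# Nightly bonus cluster is up: high-SLA jobs land on it.
first = select_with_fallback(clusters, ["bonus", "sla"], "prod")

# Bonus cluster is taken out of service: the same request falls back to prod.
bonus.status = "OUT_OF_SERVICE"
second = select_with_fallback(clusters, ["bonus", "sla"], "prod")

# Rerouting for maintenance: tag the ad-hoc cluster with "sla" and mark
# prod OUT_OF_SERVICE; new SLA jobs are routed to the ad-hoc cluster.
adhoc.schedules.add("sla")
prod.status = "OUT_OF_SERVICE"
third = select_cluster(clusters, "sla", "prod")
```

Because clients only ever name tags, all three scenarios are handled by editing cluster metadata; no client has to learn a new host name.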