SlideShare une entreprise Scribd logo
1  sur  33
@atseitlin
Netflix Cloud Platform
Netflix's evolution in the cloud
Ariel Tseitlin
http://www.linkedin.com/in/atseitlin
@atseitlin
@atseitlin
About Netflix
Netflix is the world’s
leading Internet
television network with
nearly 38 million
members in 40
countries enjoying more
than one billion hours
of TV shows and movies
per month, including
original series[1]
[1] http://ir.netflix.com/
@atseitlin
Original Content
@atseitlin
Critical Acclaim
@atseitlin
A complex distributed system
@atseitlin
How Netflix Streaming Works
Customer Device
(PC, PS3, TV…)
Web Site or
Discovery API
User Data
Personalization
Streaming API
DRM
QoS Logging
OpenConnect
CDN Boxes
CDN
Management and
Steering
Content Encoding
Consumer
Electronics
AWS Cloud
Services
CDN Edge
Locations
Browse
Play
Watch
@atseitlin
Highly Available Architecture
Micro-services, redundancy,
resiliency
@atseitlin
Web Server Dependencies Flow
Home page business transaction
Start Here
memcached
Cassandra
Web service
S3 bucket
Personalization movie
group chooser
Each icon is
three to a few
hundred
instances
across three
AWS zones
@atseitlin
Component Micro-Services
Test With Chaos Monkey, Latency Monkey
@atseitlin
Three Balanced Availability Zones
Test with Chaos Gorilla
Cassandra and Evcache
Replicas
Zone A
Cassandra and Evcache
Replicas
Zone B
Cassandra and Evcache
Replicas
Zone C
Load Balancers
@atseitlin
Triple Replicated Persistence
Cassandra maintenance affects individual replicas
Cassandra and Evcache
Replicas
Zone A
Cassandra and Evcache
Replicas
Zone B
Cassandra and Evcache
Replicas
Zone C
Load Balancers
@atseitlin
Isolated Regions
Will someday test with Chaos Kong
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
US-East Load Balancers
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
EU-West Load Balancers
@atseitlin
Failure Modes and Effects
Failure Mode Probability Current Mitigation Plan
Application Failure High Automatic degraded response
AWS Region Failure Low Wait for region to recover
AWS Zone Failure Medium Continue to run on 2 out of 3 zones
Datacenter Failure Medium Migrate more functions to cloud
Data store failure Low Restore from S3 backups
S3 failure Low Restore from remote archive
Until we got really good at mitigating high and medium
probability failures, the ROI for mitigating regional
failures didn’t make sense. Getting there…
@atseitlin
Application Resilience
Run what you wrote
Rapid detection
Rapid Response
Fail often
@atseitlin
Run What You Wrote
• Make developers responsible for failures
– Then they learn and write code that doesn’t fail
• Use Incident Reviews to find gaps to fix
– Make sure its not about finding “who to blame”
• Keep timeouts short, fail fast
– Don’t let cascading timeouts stack up
@atseitlin
Rapid Detection
• If your pilot had no instument panel, would
you ever board fly on a plane?
– Never run your service blind
• Monitor services, not instances
– Make instance failure a non-event
• Don’t pay people to watch screens
– Instead pay them to build alerting
@atseitlin
Rapid Rollback
• Use a new Autoscale Group to push code
• Leave existing ASG in place, switch traffic
• If OK, auto-delete old ASG a few hours later
• If “whoops”, switch traffic back in seconds
@atseitlin
Asgard
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
@atseitlin
Made possible in the cloud
APIs, Elasticity, Efficiency
@atseitlin
APIs
• Control everything (start, terminate, scale)
• Inject failure
• Monitor & audit
• Automate operations
@atseitlin
Elasticity
• Capacity planning replaced with forecasting
• Dynamic load-based auto-scaling
• New data centers at the click of a button
@atseitlin
Efficiency
• ~10x trough to peak ratio. Fill trough with
batch workloads
• Optimize machine class for each service
• Highly available red/black deployments
@atseitlin
Coming soon to a cloud near you
Billing & Payments, Big Data &
Analytics, SaaS
@atseitlin
Billing & Payments
• PCI compliance
• Privacy & security
• Intermediate step of cache in the cloud
@atseitlin
Big Data & Analytics
• On deck for cloud migration
• ETL already in cloud with EMR (Hadoop)
• Many cloud alternatives but not yet as mature
as the old guard
@atseitlin
Corporate system moving to SaaS
• Email (Exchange->Google Apps)
• Expense Management (Concur->Workday)
• Document sharing (File Servers->Box)
• Goal is 100% SaaS
@atseitlin
@atseitlin
Open Source Projects
Github / Techblog
Apache Contributions
Techblog Post
Coming Soon
Priam
Cassandra as a Service
Astyanax
Cassandra client for Java
CassJMeter
Cassandra test suite
Cassandra
Multi-region EC2 datastore
support
Aegisthus
Hadoop ETL for Cassandra
Ice
Spend analytics
Governator
Library lifecycle and dependency
injection
Odin
Cloud orchestration
Blitz4j Async logging
Exhibitor
Zookeeper as a Service
Curator
Zookeeper Patterns
EVCache
Memcached as a Service
Eureka / Discovery
Service Directory
Archaius
Dynamics Properties Service
Edda
Config state with history
Denominator
Ribbon
REST Client + mid-tier LB
Karyon
Instrumented REST Base Serve
Servo and Autoscaling Scripts
Genie
Hadoop PaaS
Hystrix
Robust service pattern
RxJava Reactive Patterns
Asgard
AutoScaleGroup based AWS
console
Chaos Monkey
Robustness verification
Latency Monkey
Janitor Monkey
Bakeries / Aminotor
Legend
@atseitlin
@atseitlin
Our Current Catalog of Releases
Free code available at http://netflix.github.com
@atseitlin
We’re hiring!
• Simian Army
• Cloud Tools
• NetflixOSS
• Cloud Operations
• Reliability Engineering
• Many, many more
jobs.netflix.com
@atseitlin
Takeaways
Netflix has built and deployed a scalable global and highly available Platform as a
Service and opened sourced it (NetflixOSS)
The Cloud enables elasticity, efficiency and fine-grained control via APIs
Credit cards, Big Data, and rest of corporate systems are next to move to the Cloud
http://netflix.github.com
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/atseitlin
@atseitlin @NetflixOSS
@atseitlin
Thank you!
Any questions?
Ariel Tseitlin
http://www.linkedin.com/in/atseitlin
@atseitlin

Contenu connexe

Tendances

I Don't Test Often ...
I Don't Test Often ...I Don't Test Often ...
I Don't Test Often ...Gareth Bowles
 
Security as Code
Security as CodeSecurity as Code
Security as CodeEd Bellis
 
Chaos Engineering when you're not Netflix
Chaos Engineering when you're not NetflixChaos Engineering when you're not Netflix
Chaos Engineering when you're not NetflixMartez Reed
 
Chaos Engineering - Limiting Damage During Chaos Experiments
Chaos Engineering - Limiting Damage During Chaos ExperimentsChaos Engineering - Limiting Damage During Chaos Experiments
Chaos Engineering - Limiting Damage During Chaos ExperimentsNils Meder
 
Intro to OpenStack - Scott Sanchez and Niki Acosta
Intro to OpenStack - Scott Sanchez and Niki AcostaIntro to OpenStack - Scott Sanchez and Niki Acosta
Intro to OpenStack - Scott Sanchez and Niki AcostaScott Sanchez
 
From Concept to Clustered JAC (jira.atlassian.com) - Graham Carrick
From Concept to Clustered JAC (jira.atlassian.com) - Graham CarrickFrom Concept to Clustered JAC (jira.atlassian.com) - Graham Carrick
From Concept to Clustered JAC (jira.atlassian.com) - Graham CarrickAtlassian
 
Wed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedWed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedSalman Ahmed
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s DilemmaAmazon Web Services
 
Vertafore: Database Evaluation - Selecting Apache Cassandra
Vertafore: Database Evaluation - Selecting Apache CassandraVertafore: Database Evaluation - Selecting Apache Cassandra
Vertafore: Database Evaluation - Selecting Apache CassandraDataStax Academy
 
Mobile User Experience: Auto Drive through Performance Metrics
Mobile User Experience:Auto Drive through Performance MetricsMobile User Experience:Auto Drive through Performance Metrics
Mobile User Experience: Auto Drive through Performance MetricsAndreas Grabner
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSBilal Aybar
 
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...Aaron Rinehart
 
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!Andreas Grabner
 
Slam Dunk with Splunk and Stash Data Center
Slam Dunk with Splunk and Stash Data CenterSlam Dunk with Splunk and Stash Data Center
Slam Dunk with Splunk and Stash Data CenterAtlassian
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Jeremy Edberg
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...mircodotta
 
Staying Secure When Moving to the Cloud - Dave Millier
Staying Secure When Moving to the Cloud - Dave MillierStaying Secure When Moving to the Cloud - Dave Millier
Staying Secure When Moving to the Cloud - Dave MillierTriNimbus
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksAppDynamics
 

Tendances (20)

I Don't Test Often ...
I Don't Test Often ...I Don't Test Often ...
I Don't Test Often ...
 
Security as Code
Security as CodeSecurity as Code
Security as Code
 
Chaos Engineering when you're not Netflix
Chaos Engineering when you're not NetflixChaos Engineering when you're not Netflix
Chaos Engineering when you're not Netflix
 
Chaos Engineering - Limiting Damage During Chaos Experiments
Chaos Engineering - Limiting Damage During Chaos ExperimentsChaos Engineering - Limiting Damage During Chaos Experiments
Chaos Engineering - Limiting Damage During Chaos Experiments
 
Introduction to Chaos Engineering
Introduction to Chaos EngineeringIntroduction to Chaos Engineering
Introduction to Chaos Engineering
 
Intro to OpenStack - Scott Sanchez and Niki Acosta
Intro to OpenStack - Scott Sanchez and Niki AcostaIntro to OpenStack - Scott Sanchez and Niki Acosta
Intro to OpenStack - Scott Sanchez and Niki Acosta
 
From Concept to Clustered JAC (jira.atlassian.com) - Graham Carrick
From Concept to Clustered JAC (jira.atlassian.com) - Graham CarrickFrom Concept to Clustered JAC (jira.atlassian.com) - Graham Carrick
From Concept to Clustered JAC (jira.atlassian.com) - Graham Carrick
 
Wed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedWed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmed
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
 
Vertafore: Database Evaluation - Selecting Apache Cassandra
Vertafore: Database Evaluation - Selecting Apache CassandraVertafore: Database Evaluation - Selecting Apache Cassandra
Vertafore: Database Evaluation - Selecting Apache Cassandra
 
Mobile User Experience: Auto Drive through Performance Metrics
Mobile User Experience:Auto Drive through Performance MetricsMobile User Experience:Auto Drive through Performance Metrics
Mobile User Experience: Auto Drive through Performance Metrics
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWS
 
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...
 
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
 
Slam Dunk with Splunk and Stash Data Center
Slam Dunk with Splunk and Stash Data CenterSlam Dunk with Splunk and Stash Data Center
Slam Dunk with Splunk and Stash Data Center
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Soft-Sha...
 
(R)evolutionize APM
(R)evolutionize APM(R)evolutionize APM
(R)evolutionize APM
 
Staying Secure When Moving to the Cloud - Dave Millier
Staying Secure When Moving to the Cloud - Dave MillierStaying Secure When Moving to the Cloud - Dave Millier
Staying Secure When Moving to the Cloud - Dave Millier
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & Tricks
 

En vedette

W D N A L C A T E L
W D N  A L C A T E LW D N  A L C A T E L
W D N A L C A T E LFxx
 
A jelszó hasznalatáról háziasszonyoknak
A jelszó hasznalatáról háziasszonyoknakA jelszó hasznalatáról háziasszonyoknak
A jelszó hasznalatáról háziasszonyoknakpappet
 
Rules Of Procedure
Rules Of ProcedureRules Of Procedure
Rules Of ProcedurePeitung Wang
 
11 Myths Pinoys Have About Making Money
11 Myths Pinoys Have About Making Money11 Myths Pinoys Have About Making Money
11 Myths Pinoys Have About Making MoneyHannah Almira Amora
 
Newspapers Photoshoot
Newspapers PhotoshootNewspapers Photoshoot
Newspapers Photoshootstefaniegomez
 
Futrinka utca Egyesület - 2. verzió
Futrinka utca Egyesület - 2. verzióFutrinka utca Egyesület - 2. verzió
Futrinka utca Egyesület - 2. verzióguestfc9287
 
Crowdfunding voor winkelstraat cooperatie Jan Eef
Crowdfunding voor winkelstraat cooperatie Jan Eef Crowdfunding voor winkelstraat cooperatie Jan Eef
Crowdfunding voor winkelstraat cooperatie Jan Eef Ronald Kleverlaan
 
Django Girls 2015 - CSS
Django Girls 2015 - CSSDjango Girls 2015 - CSS
Django Girls 2015 - CSSHsuan Fu Lien
 
Envisioning Augmented Reality: Smart Technology for the Future
Envisioning Augmented Reality: Smart Technology for the FutureEnvisioning Augmented Reality: Smart Technology for the Future
Envisioning Augmented Reality: Smart Technology for the FutureDr Poonsri Vate-U-Lan
 
C O N E C T A R L A P T O P N A R E D E T E L E M A R
C O N E C T A R  L A P  T O P  N A  R E D E  T E L E M A RC O N E C T A R  L A P  T O P  N A  R E D E  T E L E M A R
C O N E C T A R L A P T O P N A R E D E T E L E M A RFxx
 
Who Were The Colonists Video United Streaming
Who Were The Colonists Video United StreamingWho Were The Colonists Video United Streaming
Who Were The Colonists Video United Streamingawltech
 
The finanal project
The finanal projectThe finanal project
The finanal projectPeitung Wang
 
Crowdfunding strategie sessie IVN
Crowdfunding strategie sessie IVNCrowdfunding strategie sessie IVN
Crowdfunding strategie sessie IVNRonald Kleverlaan
 
A Dynamic Voting Wiki Model Cw 8 3a
A Dynamic Voting Wiki Model Cw 8 3aA Dynamic Voting Wiki Model Cw 8 3a
A Dynamic Voting Wiki Model Cw 8 3aConnie White
 
Crowdfunding - A new way of financing SMEs in Colombia - #ForosMipyme
Crowdfunding  - A new way of financing SMEs in Colombia - #ForosMipymeCrowdfunding  - A new way of financing SMEs in Colombia - #ForosMipyme
Crowdfunding - A new way of financing SMEs in Colombia - #ForosMipymeRonald Kleverlaan
 
Career 101 - Career Tips They Didn't Teach You in College
Career 101 - Career Tips They Didn't Teach You in CollegeCareer 101 - Career Tips They Didn't Teach You in College
Career 101 - Career Tips They Didn't Teach You in CollegeHannah Almira Amora
 

En vedette (20)

W D N A L C A T E L
W D N  A L C A T E LW D N  A L C A T E L
W D N A L C A T E L
 
A jelszó hasznalatáról háziasszonyoknak
A jelszó hasznalatáról háziasszonyoknakA jelszó hasznalatáról háziasszonyoknak
A jelszó hasznalatáról háziasszonyoknak
 
PhD_eLearning_stylebook_Jan_2013
PhD_eLearning_stylebook_Jan_2013PhD_eLearning_stylebook_Jan_2013
PhD_eLearning_stylebook_Jan_2013
 
Rules Of Procedure
Rules Of ProcedureRules Of Procedure
Rules Of Procedure
 
11 Myths Pinoys Have About Making Money
11 Myths Pinoys Have About Making Money11 Myths Pinoys Have About Making Money
11 Myths Pinoys Have About Making Money
 
Newspapers Photoshoot
Newspapers PhotoshootNewspapers Photoshoot
Newspapers Photoshoot
 
Futrinka utca Egyesület - 2. verzió
Futrinka utca Egyesület - 2. verzióFutrinka utca Egyesület - 2. verzió
Futrinka utca Egyesület - 2. verzió
 
wtfux?
wtfux?wtfux?
wtfux?
 
Crowdfunding voor winkelstraat cooperatie Jan Eef
Crowdfunding voor winkelstraat cooperatie Jan Eef Crowdfunding voor winkelstraat cooperatie Jan Eef
Crowdfunding voor winkelstraat cooperatie Jan Eef
 
Django Girls 2015 - CSS
Django Girls 2015 - CSSDjango Girls 2015 - CSS
Django Girls 2015 - CSS
 
Envisioning Augmented Reality: Smart Technology for the Future
Envisioning Augmented Reality: Smart Technology for the FutureEnvisioning Augmented Reality: Smart Technology for the Future
Envisioning Augmented Reality: Smart Technology for the Future
 
C O N E C T A R L A P T O P N A R E D E T E L E M A R
C O N E C T A R  L A P  T O P  N A  R E D E  T E L E M A RC O N E C T A R  L A P  T O P  N A  R E D E  T E L E M A R
C O N E C T A R L A P T O P N A R E D E T E L E M A R
 
Who Were The Colonists Video United Streaming
Who Were The Colonists Video United StreamingWho Were The Colonists Video United Streaming
Who Were The Colonists Video United Streaming
 
El Aguila
El AguilaEl Aguila
El Aguila
 
The finanal project
The finanal projectThe finanal project
The finanal project
 
Crowdfunding strategie sessie IVN
Crowdfunding strategie sessie IVNCrowdfunding strategie sessie IVN
Crowdfunding strategie sessie IVN
 
A Dynamic Voting Wiki Model Cw 8 3a
A Dynamic Voting Wiki Model Cw 8 3aA Dynamic Voting Wiki Model Cw 8 3a
A Dynamic Voting Wiki Model Cw 8 3a
 
Crowdfunding - A new way of financing SMEs in Colombia - #ForosMipyme
Crowdfunding  - A new way of financing SMEs in Colombia - #ForosMipymeCrowdfunding  - A new way of financing SMEs in Colombia - #ForosMipyme
Crowdfunding - A new way of financing SMEs in Colombia - #ForosMipyme
 
Career 101 - Career Tips They Didn't Teach You in College
Career 101 - Career Tips They Didn't Teach You in CollegeCareer 101 - Career Tips They Didn't Teach You in College
Career 101 - Career Tips They Didn't Teach You in College
 
MGIMO, Moscow - Second lecture, 29/11/10
MGIMO, Moscow - Second lecture, 29/11/10MGIMO, Moscow - Second lecture, 29/11/10
MGIMO, Moscow - Second lecture, 29/11/10
 

Similaire à MassTLC Cloud Summit Keynote

Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013MassTLC
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWSAWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWSAmazon Web Services
 
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)Amazon Web Services
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformSudhir Tonse
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Amazon Web Services
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Introducing to serverless computing and AWS lambda - Israel Clouds Meetup
Introducing to serverless computing and AWS lambda - Israel Clouds MeetupIntroducing to serverless computing and AWS lambda - Israel Clouds Meetup
Introducing to serverless computing and AWS lambda - Israel Clouds MeetupBoaz Ziniman
 
From AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy PiecesFrom AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy PiecesAmazon Web Services
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAmazon Web Services
 
Building scalable OTT workflows on AWS - Serverless Video Workflows
Building scalable OTT workflows on AWS - Serverless Video WorkflowsBuilding scalable OTT workflows on AWS - Serverless Video Workflows
Building scalable OTT workflows on AWS - Serverless Video WorkflowsAmazon Web Services
 
Microservices and serverless for MegaStartups - DLD TLV 2017
Microservices and serverless for MegaStartups - DLD TLV 2017Microservices and serverless for MegaStartups - DLD TLV 2017
Microservices and serverless for MegaStartups - DLD TLV 2017Boaz Ziniman
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Sourceaspyker
 
Cloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersCloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersEitan Sela
 
Introduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web ServicesIntroduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web ServicesAmazon Web Services
 
Netflix0SS Services on Docker
Netflix0SS Services on DockerNetflix0SS Services on Docker
Netflix0SS Services on DockerDocker, Inc.
 

Similaire à MassTLC Cloud Summit Keynote (20)

Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWSAWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
 
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud Platform
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
Introducing to serverless computing and AWS lambda - Israel Clouds Meetup
Introducing to serverless computing and AWS lambda - Israel Clouds MeetupIntroducing to serverless computing and AWS lambda - Israel Clouds Meetup
Introducing to serverless computing and AWS lambda - Israel Clouds Meetup
 
From AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy PiecesFrom AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy Pieces
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWS
 
Building scalable OTT workflows on AWS - Serverless Video Workflows
Building scalable OTT workflows on AWS - Serverless Video WorkflowsBuilding scalable OTT workflows on AWS - Serverless Video Workflows
Building scalable OTT workflows on AWS - Serverless Video Workflows
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
Microservices and serverless for MegaStartups - DLD TLV 2017
Microservices and serverless for MegaStartups - DLD TLV 2017Microservices and serverless for MegaStartups - DLD TLV 2017
Microservices and serverless for MegaStartups - DLD TLV 2017
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Source
 
Cloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersCloud & Native Cloud for Managers
Cloud & Native Cloud for Managers
 
Introduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web ServicesIntroduction to Cloud Computing with Amazon Web Services
Introduction to Cloud Computing with Amazon Web Services
 
Netflix0SS Services on Docker
Netflix0SS Services on DockerNetflix0SS Services on Docker
Netflix0SS Services on Docker
 

Dernier

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Dernier (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

MassTLC Cloud Summit Keynote

  • 1. @atseitlin Netflix Cloud Platform Netflix's evolution in the cloud Ariel Tseitlin http://www.linkedin.com/in/atseitlin @atseitlin
  • 2. @atseitlin About Netflix Netflix is the world’s leading Internet television network with nearly 38 million members in 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series[1] [1] http://ir.netflix.com/
  • 6. @atseitlin How Netflix Streaming Works Customer Device (PC, PS3, TV…) Web Site or Discovery API User Data Personalization Streaming API DRM QoS Logging OpenConnect CDN Boxes CDN Management and Steering Content Encoding Consumer Electronics AWS Cloud Services CDN Edge Locations Browse Play Watch
  • 8. @atseitlin Web Server Dependencies Flow Home page business transaction Start Here memcached Cassandra Web service S3 bucket Personalization movie group chooser Each icon is three to a few hundred instances across three AWS zones
  • 9. @atseitlin Component Micro-Services Test With Chaos Monkey, Latency Monkey
  • 10. @atseitlin Three Balanced Availability Zones Test with Chaos Gorilla Cassandra and Evcache Replicas Zone A Cassandra and Evcache Replicas Zone B Cassandra and Evcache Replicas Zone C Load Balancers
  • 11. @atseitlin Triple Replicated Persistence Cassandra maintenance affects individual replicas Cassandra and Evcache Replicas Zone A Cassandra and Evcache Replicas Zone B Cassandra and Evcache Replicas Zone C Load Balancers
  • 12. @atseitlin Isolated Regions Will someday test with Chaos Kong Cassandra Replicas Zone A Cassandra Replicas Zone B Cassandra Replicas Zone C US-East Load Balancers Cassandra Replicas Zone A Cassandra Replicas Zone B Cassandra Replicas Zone C EU-West Load Balancers
  • 13. @atseitlin Failure Modes and Effects Failure Mode Probability Current Mitigation Plan Application Failure High Automatic degraded response AWS Region Failure Low Wait for region to recover AWS Zone Failure Medium Continue to run on 2 out of 3 zones Datacenter Failure Medium Migrate more functions to cloud Data store failure Low Restore from S3 backups S3 failure Low Restore from remote archive Until we got really good at mitigating high and medium probability failures, the ROI for mitigating regional failures didn’t make sense. Getting there…
  • 14. @atseitlin Application Resilience Run what you wrote Rapid detection Rapid Response Fail often
  • 15. @atseitlin Run What You Wrote • Make developers responsible for failures – Then they learn and write code that doesn’t fail • Use Incident Reviews to find gaps to fix – Make sure its not about finding “who to blame” • Keep timeouts short, fail fast – Don’t let cascading timeouts stack up
  • 16. @atseitlin Rapid Detection • If your pilot had no instument panel, would you ever board fly on a plane? – Never run your service blind • Monitor services, not instances – Make instance failure a non-event • Don’t pay people to watch screens – Instead pay them to build alerting
  • 17. @atseitlin Rapid Rollback • Use a new Autoscale Group to push code • Leave existing ASG in place, switch traffic • If OK, auto-delete old ASG a few hours later • If “whoops”, switch traffic back in seconds
  • 19. @atseitlin Made possible in the cloud APIs, Elasticity, Efficiency
  • 20. @atseitlin APIs • Control everything (start, terminate, scale) • Inject failure • Monitor & audit • Automate operations
  • 21. @atseitlin Elasticity • Capacity planning replaced with forecasting • Dynamic load-based auto-scaling • New data centers at the click of a button
  • 22. @atseitlin Efficiency • ~10x trough to peak ratio. Fill trough with batch workloads • Optimize machine class for each service • Highly available red/black deployments
  • 23. @atseitlin Coming soon to a cloud near you Billing & Payments, Big Data & Analytics, SaaS
  • 24. @atseitlin Billing & Payments • PCI compliance • Privacy & security • Intermediate step of cache in the cloud
  • 25. @atseitlin Big Data & Analytics • On deck for cloud migration • ETL already in cloud with EMR (Hadoop) • Many cloud alternatives but not yet as mature as the old guard
  • 26. @atseitlin Corporate system moving to SaaS • Email (Exchange->Google Apps) • Expense Management (Concur->Workday) • Document sharing (File Servers->Box) • Goal is 100% SaaS
  • 28. @atseitlin Open Source Projects Github / Techblog Apache Contributions Techblog Post Coming Soon Priam Cassandra as a Service Astyanax Cassandra client for Java CassJMeter Cassandra test suite Cassandra Multi-region EC2 datastore support Aegisthus Hadoop ETL for Cassandra Ice Spend analytics Governator Library lifecycle and dependency injection Odin Cloud orchestration Blitz4j Async logging Exhibitor Zookeeper as a Service Curator Zookeeper Patterns EVCache Memcached as a Service Eureka / Discovery Service Directory Archaius Dynamics Properties Service Edda Config state with history Denominator Ribbon REST Client + mid-tier LB Karyon Instrumented REST Base Serve Servo and Autoscaling Scripts Genie Hadoop PaaS Hystrix Robust service pattern RxJava Reactive Patterns Asgard AutoScaleGroup based AWS console Chaos Monkey Robustness verification Latency Monkey Janitor Monkey Bakeries / Aminotor Legend
  • 30. @atseitlin Our Current Catalog of Releases Free code available at http://netflix.github.com
  • 31. @atseitlin We’re hiring! • Simian Army • Cloud Tools • NetflixOSS • Cloud Operations • Reliability Engineering • Many, many more jobs.netflix.com
  • 32. @atseitlin Takeaways Netflix has built and deployed a scalable global and highly available Platform as a Service and opened sourced it (NetflixOSS) The Cloud enables elasticity, efficiency and fine-grained control via APIs Credit cards, Big Data, and rest of corporate systems are next to move to the Cloud http://netflix.github.com http://techblog.netflix.com http://slideshare.net/Netflix http://www.linkedin.com/in/atseitlin @atseitlin @NetflixOSS
  • 33. @atseitlin Thank you! Any questions? Ariel Tseitlin http://www.linkedin.com/in/atseitlin @atseitlin