SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
Netflix
 Founded in 1997
 World's leading internet subscription service for enjoying movies and
TV programs
 More than 48 million members in more than 40 countries enjoying
more than one billion hours of TV shows and movies per month
 Watch it anytime anywhere on all your connected devices
 From 2 to 60 billion requests a day to their api in 2 years
 12 billion outbound requests to api dependencies
 A complex distributed system
2
Amazon Web Services
 AWS
• Officially launched in 2006
• Offers a broad set of global compute, storage, database, analytics,
application, and deployment services
• Accessible by HTTP via REST or SOAP
• Data centers localized in 8 different world regions
• Has Nasa, Netflix and the CIA (AWS private replica) as customers
 What it provides
• Elastic Compute Cloud (EC2), resizable compute capacity in the cloud
• Elastic Block Store (EBS), block level storage volumes used by EC2 instances
• Elastic Load Balancing, automatic incoming application traffic distribution across
multiple EC2 instances
3
Architecture picture
4
B
C
A
ASG 2
B
C
A
ASG 1
Availability Zone 2
Region A B
C
A
ASG 2
B
C
A
ASG 1
Availability Zone 1
B
C
A
ASG 2
B
C
A
ASG 1
Availability Zone 3
Transition to AWS
 Dorothy, you’re not in Kansas anymore
• Prepare to unlearn a lot of what you know
• Be much more structured about “over the wire” interactions
 Co-tenancy is hard
• Build your system to expect and accommodate failure at any level
 The best way to avoid failure is to fail constantly
• Design each distributed system to expect and tolerate failure
from other systems on which it depends
• Constantly test your ability to succeed despite failure
 Learn with real scale, not toy models
• Try doing it at full scale with real data
• Validate your design choices, with real scale comes trouble
 Commit yourself
• It is hard to start
• Learn from your mistakes
5
The Simian Army
 Availability and Resiliency as a Service
 A set of tools (scheduled agents) that deliberately shuts down
services, slows down performances, checks conformity, …
And tests the ability to survive them
• Chaos Monkey * (Chaos Gorilla and Chaos Kong)
• Latency Monkey
• Conformity Monkey *
• Security Monkey
• Doctor Monkey
• Janitor Monkey *
• Howler Monkey
• More to come, …
6
The Chaos Monkey
 How
• Service running on AWS and seeking out Auto Scaling groups and terminating
instances per group
• Flexible enough design to work on other cloud providers or instances grouping
an can be enhanced to support that
• Has a configurable schedule, running by default on non-holiday weekdays
between 9am and 3pm
• Gorilla monkey simulates the outage of an entire Availability Zone
• Kong Monkey simulates the outage of an entire Region
 Why
• Prepare to fail to ensure you can tolerate instance failure
• Learn from new unpredicted issues that may occur
• Check services automatic rebalance without user visual impact
7
The Latency Monkey
 How
• Service running on AWS and inducing artificial delay on the RESTful
communication layer and measuring upstream services response
• With large delays, a node or even an entire service downtime can be simulated
without physically bringing the instances down
 Why
• Simulate service degradation
• Test that services respond appropriately
• Test the ability to survive an entire service downtime
• Test the fault tolerance of a new service by simulating the failure of its
dependencies without affecting the rest of the system
8
The Conformity Monkey
 How
• Service running on AWS and finding instances that don’t comply to predefined
best practices
• Marks non compliant instances, shuts them down and notifies corresponding
owners
• Check is performed every hour by default
• Notification is sent only once per day at noon time
 Why
• Non compliant instances, like not belonging to and Auto Scaling Group are
trouble waiting to happen
• Anticipate and give the owners a chance to relaunch them properly
9
The Security Monkey
 How
• Service running on AWS and an extension of the Conformity Monkey
• Finds security violations or vulnerabilities
 Why
• Track improperly configured security groups to terminate offending instances
• Ensure that all their SSL and DRM certificates are valid and are not coming up
for renewal
10
The Doctor Monkey
 How
• Service running on AWS and taping into health checks running on each instance
• Monitors other external health signs like cpu load or memory usage
• Remove unhealthy detected instances from service
 Why
• Give time to service owners to root cause the problem
• Eventually terminate the detected instances
11
The Janitor Monkey
 How (mark, notify, delete)
• Service running on AWS and searching for unused resources
• Mark resources as cleanup candidates
• Schedule resources disposal time
• Cleanup deadline is defined in the rule that allows to mark the resource
• Notify the owners of the marked resources
• Notification time is 2 business days before the cleanup deadline by default
• During this period the owner can decide to cleanup or retain the resource
• Dispose resources once deadline is met
 Why
• Ensure that the cloud environment is running free of clutter and waste.
• Save costs on operations (The more and the longer you use, the more you pay)
• Free up engineering time, no need to manage unused resources anymore
12
The Howler Monkey
 How
• Service running on AWS and monitoring whether a workload meets
AWS possible limitations and reports it
 Why
• Maintain healthy operations by ensuring that AWS limitations are respected
• Save costs on operations
13
Netflix Open Source Software
 Build your own robust and highly available platform
 Release PaaS components git by git to incentive others
• Sources at github.com/netflix
• Intros and techniques at http://techblog.netflix.com
• Blog post or new code every few weeks
 Motivations
• Give back to Apache licensed OSS community
• Motivate, retain, hire top engineers
• "Peer pressure" code cleanup, external contributions
 Users and contributors
• IBM
• Waze
• Yahoo
• Eucalyptus (Scalable cloud software)
• Yammer (Private social network), …
14
References
 http://ir.netflix.com/
 http://aws.amazon.com/
 https://github.com/Netflix/SimianArmy/wiki
 http://www.zdnet.com/netflix-how-we-got-a-grip-on-awss-cloud-
3040095277/
 http://techblog.netflix.com/2012/07/chaos-monkey-released-into-
wild.html
 http://gigaom.com/2013/07/21/ibm-high-fives-netflix-open-source-
tools/
 http://sssslide.com/www.slideshare.net/adrianco/netflix-and-open-
source
15
Find out more
• On https://techblog.betclicgroup.com/
We want our Sports betting, Poker, Horse
racing and Casino & Games brands to be easy
to use for every gamer around the world.
Code with us to make that happen.
Look at all the challenges we offer HERE
We are hiring !
Check our Employer Page
Follow us on LinkedIn
About Us
• Betclic Everest Group, one of the world leaders in online
gaming, has a unique portfolio comprising various
complementary international brands: Betclic, Everest, Bet-at-
home.com, Expekt, Monte-Carlo Casino…
• Through our brands, Betclic Everest Group places expertise,
technological know-how and security at the heart of our
strategy to deliver an on-line gaming offer attuned to the
passion of our players. We want our brands to be easy to use
for every gamer around the world. We’re building our
company to make that happen.
• Active in 100 countries with more than 12 million customers
worldwide, the Group is committed to promoting secure and
responsible gaming and is a member of several international
professional associations including the EGBA (European
Gaming and Betting Association) and the ESSA (European
Sports Security Association).

Contenu connexe

Tendances

Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightAmazon Web Services
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for DatabricksDatabricks
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Amazon QuickSight First Call Deck
Amazon QuickSight First Call DeckAmazon QuickSight First Call Deck
Amazon QuickSight First Call DeckAmazon Web Services
 
Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka confluent
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...HostedbyConfluent
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
AWS Well-Architected Framework: Operational Excellence Pillar
AWS Well-Architected Framework: Operational Excellence PillarAWS Well-Architected Framework: Operational Excellence Pillar
AWS Well-Architected Framework: Operational Excellence PillarJonathan LaCour
 
Mainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaMainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaKai Wähner
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Amazon Web Services
 
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...Databricks
 

Tendances (20)

Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Amazon Aurora: Under the Hood
Amazon Aurora: Under the HoodAmazon Aurora: Under the Hood
Amazon Aurora: Under the Hood
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
 
kafka
kafkakafka
kafka
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Amazon QuickSight First Call Deck
Amazon QuickSight First Call DeckAmazon QuickSight First Call Deck
Amazon QuickSight First Call Deck
 
Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
AWS Well-Architected Framework: Operational Excellence Pillar
AWS Well-Architected Framework: Operational Excellence PillarAWS Well-Architected Framework: Operational Excellence Pillar
AWS Well-Architected Framework: Operational Excellence Pillar
 
Mainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaMainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache Kafka
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
 
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
 
Scaled Agile Framework SAFe 4.0
Scaled Agile Framework SAFe 4.0Scaled Agile Framework SAFe 4.0
Scaled Agile Framework SAFe 4.0
 

En vedette

Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overviewRyan Hodgin
 
Intro to Netflix's Chaos Monkey
Intro to Netflix's Chaos MonkeyIntro to Netflix's Chaos Monkey
Intro to Netflix's Chaos MonkeyMichael Whitehead
 
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012Amazon Web Services
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Jeremy Edberg
 
Release the Monkeys ! Testing in the Wild at Netflix
Release the Monkeys !  Testing in the Wild at NetflixRelease the Monkeys !  Testing in the Wild at Netflix
Release the Monkeys ! Testing in the Wild at NetflixGareth Bowles
 
Architecture for the cloud deployment case study future
Architecture for the cloud deployment case study futureArchitecture for the cloud deployment case study future
Architecture for the cloud deployment case study futureLen Bass
 
Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013Jay Zarfoss
 
From Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixFrom Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixDianne Marsh
 
Antifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyAntifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyWilliam Yang
 
PagerDuty | OSCON 2016 Failure Testing
PagerDuty | OSCON 2016 Failure TestingPagerDuty | OSCON 2016 Failure Testing
PagerDuty | OSCON 2016 Failure TestingPagerDuty
 
Jfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App DeploymentJfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App DeploymentAxel Fontaine
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Belmiro Moreira
 
Saturn 2014. Engineering Velocity: Continuous Delivery at Netflix
Saturn 2014. Engineering Velocity: Continuous Delivery at NetflixSaturn 2014. Engineering Velocity: Continuous Delivery at Netflix
Saturn 2014. Engineering Velocity: Continuous Delivery at NetflixDianne Marsh
 
Chaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
Chaos Testing with F# and Azure by Rachel Reese at Codemotion DubaiChaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
Chaos Testing with F# and Azure by Rachel Reese at Codemotion DubaiCodemotion Dubai
 
Phoenix project
Phoenix project Phoenix project
Phoenix project essbaih
 
Dev ops and safety critical systems
Dev ops and safety critical systemsDev ops and safety critical systems
Dev ops and safety critical systemsLen Bass
 
Refactoring for Software Architecture Smells - International Workshop on Refa...
Refactoring for Software Architecture Smells - International Workshop on Refa...Refactoring for Software Architecture Smells - International Workshop on Refa...
Refactoring for Software Architecture Smells - International Workshop on Refa...Ganesh Samarthyam
 
#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security Services#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security ServicesAlert Logic
 
Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016João Cravo
 

En vedette (20)

Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overview
 
Intro to Netflix's Chaos Monkey
Intro to Netflix's Chaos MonkeyIntro to Netflix's Chaos Monkey
Intro to Netflix's Chaos Monkey
 
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
ARC301 Intro to Chaos Monkey & the Simian Army - AWS re: Invent 2012
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)
 
Release the Monkeys ! Testing in the Wild at Netflix
Release the Monkeys !  Testing in the Wild at NetflixRelease the Monkeys !  Testing in the Wild at Netflix
Release the Monkeys ! Testing in the Wild at Netflix
 
Architecture for the cloud deployment case study future
Architecture for the cloud deployment case study futureArchitecture for the cloud deployment case study future
Architecture for the cloud deployment case study future
 
Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013Cloud Security At Netflix, October 2013
Cloud Security At Netflix, October 2013
 
From Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at NetflixFrom Code to the Monkeys: Continuous Delivery at Netflix
From Code to the Monkeys: Continuous Delivery at Netflix
 
Antifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyAntifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A Study
 
PagerDuty | OSCON 2016 Failure Testing
PagerDuty | OSCON 2016 Failure TestingPagerDuty | OSCON 2016 Failure Testing
PagerDuty | OSCON 2016 Failure Testing
 
Jfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App DeploymentJfokus 2015 - Immutable Server generation: the new App Deployment
Jfokus 2015 - Immutable Server generation: the new App Deployment
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
Saturn 2014. Engineering Velocity: Continuous Delivery at Netflix
Saturn 2014. Engineering Velocity: Continuous Delivery at NetflixSaturn 2014. Engineering Velocity: Continuous Delivery at Netflix
Saturn 2014. Engineering Velocity: Continuous Delivery at Netflix
 
Chaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
Chaos Testing with F# and Azure by Rachel Reese at Codemotion DubaiChaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
Chaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
 
Phoenix project
Phoenix project Phoenix project
Phoenix project
 
Dev ops and safety critical systems
Dev ops and safety critical systemsDev ops and safety critical systems
Dev ops and safety critical systems
 
Refactoring for Software Architecture Smells - International Workshop on Refa...
Refactoring for Software Architecture Smells - International Workshop on Refa...Refactoring for Software Architecture Smells - International Workshop on Refa...
Refactoring for Software Architecture Smells - International Workshop on Refa...
 
#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security Services#ALSummit: Alert Logic & AWS - AWS Security Services
#ALSummit: Alert Logic & AWS - AWS Security Services
 
Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016Continuous Delivery: Playing with Immutable servers @commitporto 2016
Continuous Delivery: Playing with Immutable servers @commitporto 2016
 
presentation-chaos-monkey
presentation-chaos-monkeypresentation-chaos-monkey
presentation-chaos-monkey
 

Similaire à Mini-Training: Netflix Simian Army

Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithMarkus Eisele
 
NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013aspyker
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers SoftwareDevOps Chicago
 
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Amazon Web Services
 
.NET microservices with Azure Service Fabric
.NET microservices with Azure Service Fabric.NET microservices with Azure Service Fabric
.NET microservices with Azure Service FabricDavide Benvegnù
 
AWS Meetup - Nordstrom Data Lab and the AWS Cloud
AWS Meetup - Nordstrom Data Lab and the AWS CloudAWS Meetup - Nordstrom Data Lab and the AWS Cloud
AWS Meetup - Nordstrom Data Lab and the AWS CloudNordstromDataLab
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesAmazon Web Services
 
Security in the cloud Workshop HSTC 2014
Security in the cloud Workshop HSTC 2014Security in the cloud Workshop HSTC 2014
Security in the cloud Workshop HSTC 2014Akash Mahajan
 
Cloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSSCloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSSaspyker
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceChristian Beedgen
 
The Need of Cloud-Native Application
The Need of Cloud-Native ApplicationThe Need of Cloud-Native Application
The Need of Cloud-Native ApplicationEmiliano Pecis
 
Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Amazon Web Services
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesAmazon Web Services
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanAmazon Web Services
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesAmazon Web Services
 

Similaire à Mini-Training: Netflix Simian Army (20)

Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
 
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
 
.NET microservices with Azure Service Fabric
.NET microservices with Azure Service Fabric.NET microservices with Azure Service Fabric
.NET microservices with Azure Service Fabric
 
AWS Meetup - Nordstrom Data Lab and the AWS Cloud
AWS Meetup - Nordstrom Data Lab and the AWS CloudAWS Meetup - Nordstrom Data Lab and the AWS Cloud
AWS Meetup - Nordstrom Data Lab and the AWS Cloud
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
 
Security in the cloud Workshop HSTC 2014
Security in the cloud Workshop HSTC 2014Security in the cloud Workshop HSTC 2014
Security in the cloud Workshop HSTC 2014
 
Cloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSSCloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSS
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
How Easy to Automate Application Deployment on AWS
How Easy to Automate Application Deployment on AWSHow Easy to Automate Application Deployment on AWS
How Easy to Automate Application Deployment on AWS
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics Service
 
The Need of Cloud-Native Application
The Need of Cloud-Native ApplicationThe Need of Cloud-Native Application
The Need of Cloud-Native Application
 
Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless Architectures
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute Services
 
Application Delivery Patterns
Application Delivery PatternsApplication Delivery Patterns
Application Delivery Patterns
 

Plus de Betclic Everest Group Tech Team

Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedBetclic Everest Group Tech Team
 

Plus de Betclic Everest Group Tech Team (20)

Mini training - Reactive Extensions (Rx)
Mini training - Reactive Extensions (Rx)Mini training - Reactive Extensions (Rx)
Mini training - Reactive Extensions (Rx)
 
Mini training - Moving to xUnit.net
Mini training - Moving to xUnit.netMini training - Moving to xUnit.net
Mini training - Moving to xUnit.net
 
Mini training - Introduction to Microsoft Azure Storage
Mini training - Introduction to Microsoft Azure StorageMini training - Introduction to Microsoft Azure Storage
Mini training - Introduction to Microsoft Azure Storage
 
Akka.Net
Akka.NetAkka.Net
Akka.Net
 
Mini training- Scenario Driven Design
Mini training- Scenario Driven DesignMini training- Scenario Driven Design
Mini training- Scenario Driven Design
 
Email Management in Outlook
Email Management in OutlookEmail Management in Outlook
Email Management in Outlook
 
Mini-Training: SSO with Windows Identity Foundation
Mini-Training: SSO with Windows Identity FoundationMini-Training: SSO with Windows Identity Foundation
Mini-Training: SSO with Windows Identity Foundation
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
 
Mini-Training: Docker
Mini-Training: DockerMini-Training: Docker
Mini-Training: Docker
 
Mini Training Flyway
Mini Training FlywayMini Training Flyway
Mini Training Flyway
 
Mini-Training: NDepend
Mini-Training: NDependMini-Training: NDepend
Mini-Training: NDepend
 
Management 3.0 Workout
Management 3.0 WorkoutManagement 3.0 Workout
Management 3.0 Workout
 
Lean for Business
Lean for BusinessLean for Business
Lean for Business
 
Short-Training asp.net vNext
Short-Training asp.net vNextShort-Training asp.net vNext
Short-Training asp.net vNext
 
Training – Going Async
Training – Going AsyncTraining – Going Async
Training – Going Async
 
Mini-Training: Mobile UX Trends
Mini-Training: Mobile UX TrendsMini-Training: Mobile UX Trends
Mini-Training: Mobile UX Trends
 
Training: MVVM Pattern
Training: MVVM PatternTraining: MVVM Pattern
Training: MVVM Pattern
 
Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation Demystified
 
Mini-training: Let’s Git It!
Mini-training: Let’s Git It!Mini-training: Let’s Git It!
Mini-training: Let’s Git It!
 
AngularJS Best Practices
AngularJS Best PracticesAngularJS Best Practices
AngularJS Best Practices
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Dernier (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Mini-Training: Netflix Simian Army

  • 1.
  • 2. Netflix  Founded in 1997  World's leading internet subscription service for enjoying movies and TV programs  More than 48 million members in more than 40 countries enjoying more than one billion hours of TV shows and movies per month  Watch it anytime anywhere on all your connected devices  From 2 to 60 billion requests a day to their api in 2 years  12 billion outbound requests to api dependencies  A complex distributed system 2
  • 3. Amazon Web Services  AWS • Officially launched in 2006 • Offers a broad set of global compute, storage, database, analytics, application, and deployment services • Accessible by HTTP via REST or SOAP • Data centers localized in 8 different world regions • Has Nasa, Netflix and the CIA (AWS private replica) as customers  What it provides • Elastic Compute Cloud (EC2), resizable compute capacity in the cloud • Elastic Block Store (EBS), block level storage volumes used by EC2 instances • Elastic Load Balancing, automatic incoming application traffic distribution across multiple EC2 instances 3
  • 4. Architecture picture 4 B C A ASG 2 B C A ASG 1 Availability Zone 2 Region A B C A ASG 2 B C A ASG 1 Availability Zone 1 B C A ASG 2 B C A ASG 1 Availability Zone 3
  • 5. Transition to AWS  Dorothy, you’re not in Kansas anymore • Prepare to unlearn a lot of what you know • Be much more structured about “over the wire” interactions  Co-tenancy is hard • Build your system to expect and accommodate failure at any level  The best way to avoid failure is to fail constantly • Design each distributed system to expect and tolerate failure from other systems on which it depends • Constantly test your ability to succeed despite failure  Learn with real scale, not toy models • Try doing it at full scale with real data • Validate your design choices, with real scale comes trouble  Commit yourself • It is hard to start • Learn from your mistakes 5
  • 6. The Simian Army  Availability and Resiliency as a Service  A set of tools (scheduled agents) that deliberately shuts down services, slows down performances, checks conformity, … And tests the ability to survive them • Chaos Monkey * (Chaos Gorilla and Chaos Kong) • Latency Monkey • Conformity Monkey * • Security Monkey • Doctor Monkey • Janitor Monkey * • Howler Monkey • More to come, … 6
  • 7. The Chaos Monkey  How • Service running on AWS and seeking out Auto Scaling groups and terminating instances per group • Flexible enough design to work on other cloud providers or instances grouping an can be enhanced to support that • Has a configurable schedule, running by default on non-holiday weekdays between 9am and 3pm • Gorilla monkey simulates the outage of an entire Availability Zone • Kong Monkey simulates the outage of an entire Region  Why • Prepare to fail to ensure you can tolerate instance failure • Learn from new unpredicted issues that may occur • Check services automatic rebalance without user visual impact 7
  • 8. The Latency Monkey  How • Service running on AWS and inducing artificial delay on the RESTful communication layer and measuring upstream services response • With large delays, a node or even an entire service downtime can be simulated without physically bringing the instances down  Why • Simulate service degradation • Test that services respond appropriately • Test the ability to survive an entire service downtime • Test the fault tolerance of a new service by simulating the failure of its dependencies without affecting the rest of the system 8
  • 9. The Conformity Monkey  How • Service running on AWS and finding instances that don’t comply to predefined best practices • Marks non compliant instances, shuts them down and notifies corresponding owners • Check is performed every hour by default • Notification is sent only once per day at noon time  Why • Non compliant instances, like not belonging to and Auto Scaling Group are trouble waiting to happen • Anticipate and give the owners a chance to relaunch them properly 9
  • 10. The Security Monkey  How • Service running on AWS and an extension of the Conformity Monkey • Finds security violations or vulnerabilities  Why • Track improperly configured security groups to terminate offending instances • Ensure that all their SSL and DRM certificates are valid and are not coming up for renewal 10
  • 11. The Doctor Monkey  How • Service running on AWS and taping into health checks running on each instance • Monitors other external health signs like cpu load or memory usage • Remove unhealthy detected instances from service  Why • Give time to service owners to root cause the problem • Eventually terminate the detected instances 11
  • 12. The Janitor Monkey  How (mark, notify, delete) • Service running on AWS and searching for unused resources • Mark resources as cleanup candidates • Schedule resources disposal time • Cleanup deadline is defined in the rule that allows to mark the resource • Notify the owners of the marked resources • Notification time is 2 business days before the cleanup deadline by default • During this period the owner can decide to cleanup or retain the resource • Dispose resources once deadline is met  Why • Ensure that the cloud environment is running free of clutter and waste. • Save costs on operations (The more and the longer you use, the more you pay) • Free up engineering time, no need to manage unused resources anymore 12
  • 13. The Howler Monkey  How • Service running on AWS and monitoring whether a workload meets AWS possible limitations and reports it  Why • Maintain healthy operations by ensuring that AWS limitations are respected • Save costs on operations 13
  • 14. Netflix Open Source Software  Build your own robust and highly available platform  Release PaaS components git by git to incentive others • Sources at github.com/netflix • Intros and techniques at http://techblog.netflix.com • Blog post or new code every few weeks  Motivations • Give back to Apache licensed OSS community • Motivate, retain, hire top engineers • "Peer pressure" code cleanup, external contributions  Users and contributors • IBM • Waze • Yahoo • Eucalyptus (Scalable cloud software) • Yammer (Private social network), … 14
  • 15. References  http://ir.netflix.com/  http://aws.amazon.com/  https://github.com/Netflix/SimianArmy/wiki  http://www.zdnet.com/netflix-how-we-got-a-grip-on-awss-cloud- 3040095277/  http://techblog.netflix.com/2012/07/chaos-monkey-released-into- wild.html  http://gigaom.com/2013/07/21/ibm-high-fives-netflix-open-source- tools/  http://sssslide.com/www.slideshare.net/adrianco/netflix-and-open- source 15
  • 16. Find out more • On https://techblog.betclicgroup.com/
  • 17. We want our Sports betting, Poker, Horse racing and Casino & Games brands to be easy to use for every gamer around the world. Code with us to make that happen. Look at all the challenges we offer HERE We are hiring ! Check our Employer Page Follow us on LinkedIn
  • 18. About Us • Betclic Everest Group, one of the world leaders in online gaming, has a unique portfolio comprising various complementary international brands: Betclic, Everest, Bet-at- home.com, Expekt, Monte-Carlo Casino… • Through our brands, Betclic Everest Group places expertise, technological know-how and security at the heart of our strategy to deliver an on-line gaming offer attuned to the passion of our players. We want our brands to be easy to use for every gamer around the world. We’re building our company to make that happen. • Active in 100 countries with more than 12 million customers worldwide, the Group is committed to promoting secure and responsible gaming and is a member of several international professional associations including the EGBA (European Gaming and Betting Association) and the ESSA (European Sports Security Association).