ONE MAN OPS
      Reliability & Scale in AWS while letting you sleep through the night
                                                         Jos Boumans - @jiboumans
http://www.fwallpaper.net/picture_pics-Sleepy-cat.html
ONE OF A KIND
   My own category
RIPE NCC
Engineering manager for RIPE Database
                                        http://www.ripe.net/db
CANONICAL
                    Engineering manager for Ubuntu Server 10.04 & 10.10

http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775          http://www.ubuntu.com/business/server/overview
KRUX
VP of Operations & Infrastructure

                                    http://www.krux.com/
GOOD GUYS OF DATA PRIVACY
LOTS OF TRAFFIC
http://www.americapictures.net/buenos-aires-traffic-city-night-argentina.html
AVERAGE REQUESTS* / SEC
[Chart comparing Twitter, Wikipedia and Krux; axis from 0 to 10,000]
*Twitter: New tweets. Wikipedia: Articles read. Krux: New data points.
https://twitter.com/tps_watcher
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
MONTHLY UNIQUE USERS
[Chart; axis from 0 to 500,000,000]
http://www.mediabistro.com/alltwitter/twitter-active-total-users_b17655
http://technorati.com/technology/article/wikipedias-nonprofit-parent-raises-20-million/
WE CHOSE 'THE CLOUD'
http://previewnetworks.com/blog/
THERE ARE DOWNSIDES
http://modernsavage.hubpages.com/hub/10-springfield-shopper-headlines
FOCUS ON AWS
               http://aws.amazon.com/
APRIL 21, 2011
                                                                                                                    http://aws.amazon.com/message/65648/
http://businessnerds.wordpress.com/2011/05/28/so-far-so-good…-the-review/   http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
... SOME OUTAGES ...
... SKIPPED FOR BREVITY ...
JUNE 14, 2012
http://www.laczik.org/BMW/repair/E38_wiring_harness/E38_wiring_harness.html   http://blog.pagerduty.com/2012/06/outage-post-mortem-june-14/
JUNE 29, 2012
http://www.fanpop.com/spots/thunderstorm/images/25416163/title/thunderstorms-wallpaper   http://aws.amazon.com/message/67457/
AWS OUTAGE = YOUR OUTAGE
http://it.mario.wikia.com/wiki/Lakitu
THE RULES HAVE CHANGED
                                                        You're not in Kansas anymore

http://entreatmenot.blogspot.com/2011/04/shattered-dreams.html
NETWORK WILL PARTITION
                                                              And it will happen often

http://thevinylvillain.blogspot.com/2010_04_01_archive.html
DISK IO WILL FLUCTUATE
                                                     On a good day, it's mediocre

http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
IP ADDRESSES WILL CHANGE
                     IP lease is 8 hours
                    DNS TTL is 60 seconds
www.fantom-xp.com
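
If addresses change and DNS records live for about 60 seconds, a client that resolves a hostname once at startup will eventually talk to the wrong machine. A minimal sketch of one way to respect that, assuming a hypothetical `TtlResolver` helper (not something from the talk) that re-resolves after a bounded time:

```python
import socket
import time


class TtlResolver:
    """Cache hostname lookups for a bounded time instead of forever."""

    def __init__(self, ttl_seconds=60.0, lookup=socket.gethostbyname,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._lookup = lookup      # injectable so it can be tested offline
        self._clock = clock
        self._cache = {}           # hostname -> (address, expiry time)

    def resolve(self, hostname):
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]
        # Entry missing or expired: ask DNS again and remember the answer.
        address = self._lookup(hostname)
        self._cache[hostname] = (address, now + self._ttl)
        return address
```

The 60 second default mirrors the TTL on the slide; connection pools and language runtimes often have their own DNS caching that would need the same treatment.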
INSTANCES WILL DIE
                                  And it will always be your Database Master

http://room57.deviantart.com/art/Hangman-188353196
HUMANS MAKE MISTAKES
     Including your humans
EMBRACE FAILURE
                                Hardware will fail. Humans will make errors.
                                   Nature will produce thunderstorms.
http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
ADJUST YOUR STRATEGY
                                                      Don't bring a knife to a gun fight

http://www.flickr.com/photos/statlerhotel/6628770499/sizes/l/in/photostream/
DATA STORES
                                                     Some work better than others

http://gustavhoiland.com/2010/03/10/stacked-boxes/
RDBMS
  CouchDB
                                                   BigTable Based
Dynamo Based
                                                 Master / Slave based




               CAP THEOREM
       Your choice: sacrifice availability or consistency.
                       Orange is a lie.
MYSQL / ORACLE VS RDS
  See: Network partitioning & instances dying
BIGTABLE BASED STORES
            HBase, Accumulo, Hypertable
 Still suffer when network partitioning happens
                                                  http://www.cloudera.com/cdh4/
DYNAMO BASED STORES
                                                         Cassandra, Riak, DynamoDB

http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html   http://aws.amazon.com/dynamodb/faqs/
GO HOSTED?
                                 CouchDB, MongoDB, Riak, Cassandra, HBase
                                          Your Latency May Vary
http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html
CLIENT SIDE STORAGE
                                          Keep a copy of your users' data locally

http://www.wired.com/gadgetlab/2012/03/badass-gadget-ammo-lunch-box/       http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html
FILE STORES
                                                                   EBS vs Instance Store

http://homedezine.blogspot.com/2011/04/day-my-cat-removed-carpet-photo-studio.html
SIMPLE STORAGE SERVICE
                                                        S3: Arguably AWS' best feature

http://www.iwallpaper.us/gold-star-fo-christmas-wallpaper-140/
TRAFFIC SHAPING
                                                Control every part of the request

http://www.visualphotos.com/image/2x4154765/man_standing_with_traffic_cones_in_shape_of_u-turn
STAY LOCAL IF YOU CAN
                 Going off box exposes you to risks you need to mitigate

http://southshorewoman.com/issue/june-2010/article/local-character
CACHE WHAT YOU CAN
                                  HTTP Responses, DB Queries, User content
                                         Browsers have caches too!
http://theoatmeal.com/blog/charity_money
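
As a rough illustration of two of the layers named above, the sketch below memoizes a query result in process memory and builds a `Cache-Control` header so browsers can cache a response. `segment_name` is a hypothetical stand-in for a real database query, and the function names are assumptions for this example only.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def segment_name(segment_id):
    """Stand-in for a database query; repeated calls are served from memory."""
    return f"segment-{segment_id}"


def cache_headers(max_age_seconds, shared=True):
    """Build a Cache-Control header telling browsers (and, if shared, proxies
    and CDNs) how long they may reuse the response."""
    scope = "public" if shared else "private"
    return {"Cache-Control": f"{scope}, max-age={int(max_age_seconds)}"}
```

`lru_cache` never expires entries on its own, so it only suits data that rarely changes or a process that restarts regularly.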
USE ELASTIC LOAD BALANCERS
                                                They will save you more than once

http://wallpapers5.com/wallpaper/Balance-Green-Tree-Frog/
USE GLOBAL LOAD BALANCING
  Fail over to the closest data center on region failure
SHOUT OUT: DYN
DNS for Bit.ly, Quora, Twitter, Wikia, etc
USE A CDN
                                        Critical items should always be available

http://kadanthuponanimidangal.blogspot.com/2010/12/blog-post_6992.html
MEASURE EVERYTHING
                Find outliers, deviants & trends before they cause trouble

http://www.themoviedb.org/movie/629-the-usual-suspects
GRAPHITE, STATSD & COLLECTD
                       Use StatsD & collectd for application/system metrics
                           Use Graphite to store, aggregate & visualize
                                                                                                                    http://hostedgraphite.com/
http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html   http://jiboumans.wordpress.com/2012/07/02/measure-all-the-things/
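
For readers who have not used StatsD, a minimal sketch of emitting an application metric follows. StatsD accepts plain-text lines of the form `name:value|type` over UDP (`c` for counters, `ms` for timers, `g` for gauges); the helper names and the `localhost:8125` default are assumptions for illustration.

```python
import socket


def statsd_packet(name, value, metric_type):
    """Render one metric in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(packet, host="localhost", port=8125):
    """Fire-and-forget over UDP, so a missing collector never blocks the app."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    finally:
        sock.close()
```

A call such as `send_metric(statsd_packet("signups", 1, "c"))` would increment a counter; because the transport is UDP, lost packets are silently dropped, which is usually an acceptable trade for never slowing down the request path.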
GRAPH EVENTS
         Deployments, outages, CDN reconfigurations, failed builds, etc
          Anything that's important to the health of your ecosystem
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
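
One simple way to record such events, sketched under the assumption that they are written straight to Graphite's plaintext listener (carbon, port 2003 by default), is to emit a data point of 1 at the moment the event happens. The metric path below is a made-up example.

```python
import time


def event_line(path, timestamp=None, value=1):
    """Format one data point for Graphite's plaintext protocol:
    '<metric path> <value> <unix timestamp>' followed by a newline."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"
```

Writing `event_line("events.deploy.web")` to the carbon socket from a deploy script marks the deployment; Graphite's `drawAsInfinite()` function can then render those points as vertical lines on top of any other graph.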
COMPARE WEEK TO WEEK
                          Overlay week to week graphs using timeShift()
                         Quickly identifies trends and deviations from trends
http://obfuscurity.com/2012/04/Unhelpful-Graphite-Tip-10
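
A sketch of how such an overlay could be requested from Graphite's render API: the current series plus one `timeShift()` copy per earlier week. The base URL and metric name used with it are placeholders, and `week_over_week_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def week_over_week_url(base_url, metric, weeks=1, window="-1d"):
    """Build a render URL overlaying `metric` on itself shifted back by
    1..weeks whole weeks, over the given time window."""
    targets = [metric]
    for n in range(1, weeks + 1):
        # An unsigned shift such as "7d" moves the series back in time.
        targets.append(f'timeShift({metric},"{7 * n}d")')
    query = [("target", t) for t in targets]
    query += [("from", window), ("format", "json")]
    return f"{base_url}/render?{urlencode(query)}"
```

Dropping the `format` parameter returns the rendered graph image instead of JSON data.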
FORECASTING
                                 Use Holt-Winters confidence bands
                        Verify that your metrics are within normal tolerance
https://github.com/ripienaar/graphite-graph-dsl/wiki/Creating-Holt-Winters-Forecasts
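
The tolerance check itself is small once a forecast exists. The sketch below assumes the forecast and deviation values come from elsewhere (for example a Holt-Winters model such as the one behind Graphite's `holtWintersConfidenceBands()`); it does not compute the forecast, and `aberrations` is a hypothetical helper name.

```python
def aberrations(values, forecasts, deviations, delta=3):
    """For each point, return how far it falls outside the confidence band
    forecast +/- delta * deviation; 0 means it is within tolerance."""
    result = []
    for value, forecast, deviation in zip(values, forecasts, deviations):
        upper = forecast + delta * deviation
        lower = forecast - delta * deviation
        if value > upper:
            result.append(value - upper)   # positive: above the band
        elif value < lower:
            result.append(value - lower)   # negative: below the band
        else:
            result.append(0)
    return result
```

Any non-zero entry is a candidate for an alert or at least a closer look.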
FIND INDIVIDUAL OUTLIERS
                                                      Absolute numbers mean very little
                                                       Use mean & standard deviation
http://en.wikipedia.org/wiki/File:Black_sheep-1.jpg
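
A minimal sketch of the mean and standard deviation approach, assuming one sample per host for the same metric. The threshold of 2 standard deviations is an arbitrary starting point, not a figure from the talk.

```python
from statistics import mean, pstdev


def outliers(samples, threshold=2.0):
    """Return the keys whose value lies more than `threshold` standard
    deviations away from the mean of all values."""
    values = list(samples.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every host reports the same value
    return sorted(k for k, v in samples.items()
                  if abs(v - mu) / sigma > threshold)
```

With very few hosts a single bad value inflates the standard deviation enough to hide itself, so this works better across a fleet than across a pair of machines.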
ALERT ON TRENDS
                                Once you go over a threshold, it's too late
                              Alert on unwanted trends and preemptively fix
http://sub-second.blogspot.com/2012/06/reporting-response-times-percentile.html   http://aphyr.github.com/riemann/
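
One way to alert on a trend rather than a threshold is to fit a straight line through recent samples and estimate when it will cross the limit. This is a sketch using ordinary least squares; the function name and the idea of paging when the estimate drops below some lead time are assumptions for illustration.

```python
def seconds_until_threshold(points, threshold):
    """points: list of (unix_timestamp, value) pairs, oldest first.
    Returns the projected number of seconds after the last sample at which a
    least-squares line through the points reaches `threshold`, or None when
    the trend is flat or falling."""
    n = len(points)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    spread = sum((t - mean_t) ** 2 for t, _ in points)
    if spread == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in points) / spread
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - points[-1][0])
```

For example, disk usage sampled every minute could trigger a warning when the projected time to 90% full drops below a few hours.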
MEASURE WITHOUT RETROFIT
                                          LogFormat "http.beacon:%D|ms" stats
                                         CustomLog "|nc -u localhost 8125" stats
http://absinthemindedhero.blogspot.com/2012/03/victory-nonetheless.html   http://jiboumans.wordpress.com/2012/07/02/measure-all-the-things/
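
One detail worth checking when reusing this configuration: Apache's mod_log_config documents `%D` as the time taken to serve the request in microseconds, while the StatsD `ms` type denotes a timer in milliseconds. If millisecond values are wanted, one option is a small filter between Apache and the UDP socket. `to_millis` below is a hypothetical helper, not part of the original setup.

```python
def to_millis(line):
    """Convert 'name:<microseconds>|ms' into 'name:<milliseconds>|ms'."""
    name, _, rest = line.strip().partition(":")
    micros, _, metric_type = rest.partition("|")
    return f"{name}:{int(micros) / 1000}|{metric_type}"
```

Alternatively, the raw values can be kept and the graphs simply labelled as microseconds.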
SHOUT OUT: NEW RELIC
         Python, Ruby, .NET, Java, PHP support
In-depth profiling of your app for performance & errors.
CONFIGURATION MANAGEMENT
                                                             Unique snowflakes are bad

http://www.torange.us/Plants/Conifers/spruce-needles-in-hoarfrost-424.html
PUPPET VS CHEF
      Yes.

                         http://puppetlabs.com/
                 http://www.opscode.com/chef
INFRASTRUCTURE AS CODE
                                            Use different environments
                                            Measure and report on it
http://americansingercanary.com/green.htm
SHOUT OUT: UBUNTU
                                      Ubuntu + cloud-init + boto = awesome*
                                                                         *I am biased

http://www.123rf.com/photo_4871141_food-pyramid-isolated-on-white.html                  https://github.com/krux/ops-tools
DEV = PRODUCTION
                          "I dunno, it worked on my laptop"
                                 Instead, use vagrant
http://vagrantup.com/
ROLL YOUR OWN AMIS
                                                Instantly boot up new deployments
                                                     Reduce Time to Respond
http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html   http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
CONFIDENT DEPLOYS
                                                   That human error could be yours

http://www.etsy.com/listing/37178125/stormtrooper-regrets-those-were-the
CONTINUOUS INTEGRATION
      Ours: GitHub + Jenkins + FPM + apt::s3
   From commit to deployable in one command                         http://github.com/
                                                                 http://jenkins-ci.org/
                                                   https://github.com/thekad/apt-s3
                                          https://github.com/jordansissel/fpm/wiki/
ONE CLICK DEPLOYMENTS
                                        Deployments should not be exciting.
                                      Don't create a checklist; automate & track
http://www.thegreenhead.com/2012/07/one-click-butter-cutter.php                    https://checkmarkable.com/
DARK LAUNCHES
               Exercise the code without impacting the user experience
                                                                          http://www.kissmetrics.com/
http://www.layoutsparks.com/pictures/moon-23                   https://github.com/yahoo/boomerang/
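
A dark launch can be as simple as running the new code path for a slice of users and throwing the result away. The sketch below is one possible shape, with hypothetical names throughout: users are bucketed deterministically, the existing code always produces the response, and failures in the new code are reported but never reach the user.

```python
import zlib


def in_dark_launch(user_id, percent):
    """Deterministically place a user in the first `percent` of 100 buckets."""
    return zlib.crc32(user_id.encode("utf-8")) % 100 < percent


def handle(user_id, current, candidate, percent=10, on_error=None):
    """Serve the response from `current`; for users in the dark launch, also
    exercise `candidate` and discard whatever it returns or raises."""
    response = current(user_id)
    if in_dark_launch(user_id, percent):
        try:
            candidate(user_id)  # result deliberately ignored
        except Exception as exc:
            if on_error is not None:
                on_error(exc)   # e.g. count it as a metric
    return response
```

In a real service the candidate call would usually run asynchronously so that it cannot add latency either.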
SHADOW TRAFFIC
                                                    Test new code against live traffic

http://doppelthingers.tumblr.com/post/12839979386/traffic-light-shadow-hangman-and-possibly-his   https://gist.github.com/3125323
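
As an in-process illustration of the idea (not the contents of the linked gist, which may work quite differently, for instance at the network level), the sketch below mirrors each live request to a candidate handler and records whether its answer matches production. All names are hypothetical.

```python
def shadow(primary, candidate, record):
    """Wrap `primary` so every request is also sent to `candidate`.
    The caller always receives the primary response; `record` is called with
    (request, responses_matched, exception_or_None)."""
    def handler(request):
        response = primary(request)
        try:
            shadow_response = candidate(request)
            record(request, response == shadow_response, None)
        except Exception as exc:
            record(request, False, exc)
        return response
    return handler
```

This only suits read-only or idempotent requests; anything that writes data needs the candidate pointed at its own isolated data store.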
SLEEP TIGHT
                                           Slides at: www.Slideshare.net/jiboumans
                                                 We're hiring: www.krux.com
http://raafay-awan.blogspot.com/2011/08/cats-cutest-of-creatures.html

Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 

Reliability & Scale in AWS while letting you sleep through the night

  • 1. ONE MAN OPS Reliability & Scale in AWS while letting you sleep through the night Jos Boumans - @jiboumans http://www.fwallpaper.net/picture_pics-Sleepy-cat.html
  • 2. ONE OF A KIND My own category
  • 3. RIPE NCC Engineering manager for RIPE Database http://www.ripe.net/db
  • 4. CANONICAL Engineering manager for Ubuntu Server 10.04 & 10.10 http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775 http://www.ubuntu.com/business/server/overview
  • 5. KRUX VP of Operations & Infrastructure http://www.krux.com/
  • 6. GOOD GUYS OF DATA PRIVACY
  • 8. AVERAGE REQUESTS* / SEC (bar chart, axis 0 to 10,000) *Twitter: New tweets; Wikipedia: Articles read; Krux: New data points https://twitter.com/tps_watcher http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
  • 9. MONTHLY UNIQUE USERS (bar chart, axis 0 to 500,000,000) http://www.mediabistro.com/alltwitter/twitter-active-total-users_b17655 http://technorati.com/technology/article/wikipedias-nonprofit-parent-raises-20-million/
  • 10. WE CHOSE 'THE CLOUD' http://previewnetworks.com/blog/
  • 12. FOCUS ON AWS http://aws.amazon.com/
  • 13. APRIL 21, 2011 http://aws.amazon.com/message/65648/ http://businessnerds.wordpress.com/2011/05/28/so-far-so-good…-the-review/ http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
  • 14. ... SOME OUTAGES ... ... SKIPPED FOR BREVITY ...
  • 15. JUNE 14, 2012 http://www.laczik.org/BMW/repair/E38_wiring_harness/E38_wiring_harness.html http://blog.pagerduty.com/2012/06/outage-post-mortem-june-14/
  • 17. AWS OUTAGE = YOUR OUTAGE http://it.mario.wikia.com/wiki/Lakitu
  • 18. THE RULES HAVE CHANGED You're not in Kansas anymore http://entreatmenot.blogspot.com/2011/04/shattered-dreams.html
  • 19. NETWORK WILL PARTITION And it will happen often http://thevinylvillain.blogspot.com/2010_04_01_archive.html
  • 20. DISK IO WILL FLUCTUATE On a good day, it's mediocre http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
  • 21. IP ADDRESSES WILL CHANGE IP lease is 8 hours; DNS TTL is 60 seconds www.fantom-xp.com
  • 22. INSTANCES WILL DIE And it will always be your Database Master http://room57.deviantart.com/art/Hangman-188353196
  • 23. HUMANS MAKE MISTAKES Including your humans
  • 24. EMBRACE FAILURE Hardware will fail. Humans will make errors. Nature will produce thunderstorms. http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
  • 25. ADJUST YOUR STRATEGY Don't bring a knife to a gun fight http://www.flickr.com/photos/statlerhotel/6628770499/sizes/l/in/photostream/
  • 26. DATA STORES Some work better than others http://gustavhoiland.com/2010/03/10/stacked-boxes/
  • 27. CAP THEOREM Your choice: sacrifice availability or consistency. Orange is a lie. (Diagram labels: RDBMS, CouchDB, BigTable Based, Dynamo Based, Master / Slave based)
  • 28. MYSQL / ORACLE VS RDS See: Network partitioning & instances dying
  • 29. BIGTABLE BASED STORES HBase, Accumulo, Hypertable Still suffer when network partitioning happens http://www.cloudera.com/cdh4/
  • 30. DYNAMO BASED STORES Cassandra, Riak, DynamoDB http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html http://aws.amazon.com/dynamodb/faqs/
  • 31. GO HOSTED? CouchDB, MongoDB, Riak, Cassandra, HBase Your Latency May Vary http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html
  • 32. CLIENT SIDE STORAGE Keep a copy of your users' data locally http://www.wired.com/gadgetlab/2012/03/badass-gadget-ammo-lunch-box/ http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html
  • 33. FILE STORES EBS vs Instance Store http://homedezine.blogspot.com/2011/04/day-my-cat-removed-carpet-photo-studio.html
  • 34. SIMPLE STORAGE SERVICE S3: Arguably AWS' best feature http://www.iwallpaper.us/gold-star-fo-christmas-wallpaper-140/
  • 35. TRAFFIC SHAPING Control every part of the request http://www.visualphotos.com/image/2x4154765/man_standing_with_traffic_cones_in_shape_of_u-turn
  • 36. STAY LOCAL IF YOU CAN Going off box exposes you to risks you need to mitigate http://southshorewoman.com/issue/june-2010/article/local-character
  • 37. CACHE WHAT YOU CAN HTTP Responses, DB Queries, User content Browsers have caches too! http://theoatmeal.com/blog/charity_money
  • 38. USE ELASTIC LOAD BALANCERS They will save you more than once http://wallpapers5.com/wallpaper/Balance-Green-Tree-Frog/
  • 39. USE GLOBAL LOAD BALANCING Fail over to the closest data center on region failure
  • 40. SHOUT OUT: DYN DNS for Bit.ly, Quora, Twitter, Wikia, etc
  • 41. USE A CDN Critical items should always be available http://kadanthuponanimidangal.blogspot.com/2010/12/blog-post_6992.html
  • 42. MEASURE EVERYTHING Find outliers, deviants & trends before they cause trouble http://www.themoviedb.org/movie/629-the-usual-suspects
  • 43. GRAPHITE, STATSD & COLLECTD Use Statsd & Collectd for application/system metrics Use graphite to store, aggregate & visualize http://hostedgraphite.com/ http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html http://jiboumans.wordpress.com/2012/07/02/measure-all-the-things/
  • 44. GRAPH EVENTS Deployments, outages, CDN reconfigurations, failed builds, etc. Anything that's important to the health of your ecosystem http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
  • 45. COMPARE WEEK TO WEEK Overlay week to week graphs using timeShift() Quickly identifies trends and deviations from trends http://obfuscurity.com/2012/04/Unhelpful-Graphite-Tip-10
  • 46. FORECASTING Use Holt-Winters confidence bands Verify that your metrics are within normal tolerance https://github.com/ripienaar/graphite-graph-dsl/wiki/Creating-Holt-Winters-Forecasts
  • 47. FIND INDIVIDUAL OUTLIERS Absolute numbers mean very little Use mean & standard deviation http://en.wikipedia.org/wiki/File:Black_sheep-1.jpg
  • 48. ALERT ON TRENDS Once you go over a threshold, it's too late Alert on unwanted trends and preemptively fix http://sub-second.blogspot.com/2012/06/reporting-response-times-percentile.html http://aphyr.github.com/riemann/
  • 49. MEASURE WITHOUT RETROFIT LogFormat "http.beacon:%D|ms" stats CustomLog "|nc -u localhost 8125" stats http://absinthemindedhero.blogspot.com/2012/03/victory-nonetheless.html http://jiboumans.wordpress.com/2012/07/02/measure-all-the-things/
  • 50. SHOUT OUT: NEW RELIC Python, Ruby, .NET, Java, PHP support In depth profiling of your app for performance & errors.
  • 51. CONFIGURATION MANAGEMENT Unique snowflakes are bad http://www.torange.us/Plants/Conifers/spruce-needles-in-hoarfrost-424.html
  • 52. PUPPET VS CHEF Yes. http://puppetlabs.com/ http://www.opscode.com/chef
  • 53. INFRASTRUCTURE AS CODE Use different environments Measure and report on it http://americansingercanary.com/green.htm
  • 54. SHOUT OUT: UBUNTU Ubuntu + cloud-init + boto = awesome* *I am biased http://www.123rf.com/photo_4871141_food-pyramid-isolated-on-white.html https://github.com/krux/ops-tools
  • 55. DEV = PRODUCTION "I dunno, it worked on my laptop" Instead, use vagrant http://vagrantup.com/
  • 56. ROLL YOUR OWN AMIS Instantly boot up new deployments Reduce Time to Respond http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
  • 57. CONFIDENT DEPLOYS That human error could be yours http://www.etsy.com/listing/37178125/stormtrooper-regrets-those-were-the
  • 58. CONTINUOUS INTEGRATION Ours: Github + Jenkins + FPM + apt::s3 From commit to deployable in one command http://github.com/ http://jenkins-ci.org/ https://github.com/thekad/apt-s3 https://github.com/jordansissel/fpm/wiki/
  • 59. ONE CLICK DEPLOYMENTS Deployments should not be exciting. Don't create a checklist; automate & track http://www.thegreenhead.com/2012/07/one-click-butter-cutter.php https://checkmarkable.com/
  • 60. DARK LAUNCHES Exercise the code without impacting the user experience http://www.kissmetrics.com/ http://www.layoutsparks.com/pictures/moon-23 https://github.com/yahoo/boomerang/
  • 61. SHADOW TRAFFIC Test new code against live traffic http://doppelthingers.tumblr.com/post/12839979386/traffic-light-shadow-hangman-and-possibly-his https://gist.github.com/3125323
  • 62. SLEEP TIGHT Slides at: www.Slideshare.net/jiboumans We're hiring: www.krux.com http://raafay-awan.blogspot.com/2011/08/cats-cutest-of-creatures.html
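
Slide 47 suggests finding individual outliers with the mean and standard deviation rather than absolute numbers. The slides do not show how this was done at Krux, so the following is only a minimal sketch of the idea: the function name `find_outliers`, the host names, the latency values, and the threshold of 2 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_outliers(samples, threshold=2.0):
    """Return (name, value) pairs lying more than `threshold`
    standard deviations away from the mean of all samples.

    `samples` maps a (hypothetical) host name to its latest metric value.
    """
    values = list(samples.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All hosts report the same value, so nothing stands out.
        return []
    return [
        (name, value)
        for name, value in samples.items()
        if abs(value - mu) > threshold * sigma
    ]


# Illustrative data only: response times in milliseconds per host.
latencies_ms = {
    "web1": 101, "web2": 99, "web3": 100,
    "web4": 102, "web5": 98, "web6": 180,
}

if __name__ == "__main__":
    for host, value in find_outliers(latencies_ms):
        print(f"{host} looks like an outlier: {value} ms")
```

With small groups of hosts a single bad value also inflates the standard deviation, so the threshold may need tuning per metric; a median-based measure is a common alternative, though the slides do not mention it.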
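
Slide 49 pipes an Apache log line of the form `http.beacon:<value>|ms` to `nc -u localhost 8125`, which is the StatsD timer format sent over UDP. For code that can be changed, the same datagram can be sent directly. This is a rough sketch, not the author's tooling: the helper names `format_timer` and `send_timer` are made up for illustration, while the metric name and port are taken from the slide.

```python
import socket


def format_timer(name, millis):
    """Build a StatsD timer datagram: '<name>:<value>|ms'."""
    return f"{name}:{millis}|ms"


def send_timer(name, millis, host="localhost", port=8125):
    """Send one timer sample to a StatsD daemon over UDP.

    UDP is fire-and-forget: if nothing is listening, the sample is
    simply lost and the caller is not blocked.
    """
    payload = format_timer(name, millis).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Same metric name as on the slide; 42 is an arbitrary sample value.
    send_timer("http.beacon", 42)
```

Because the transport is UDP, a missing or overloaded metrics daemon does not slow the application down, which fits the "measure everything" advice on slide 42.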