
AWS UK UG #8: Not everything that happens in Vegas stays in Vegas


  1. Not everything that happens in Vegas stays in Vegas
  2. DevOps, or “getting devs to be on call for what they ship” :-)
  3. Netflix development priorities
     1. Speed of innovation
     2. Availability
     3. Running costs
        a. “It’ll cost what it ends up costing”
     In practice, they found that holding to the first two ended up costing far less than otherwise expected.
  4. Riot Games + League of Legends
     Cloud == ideal for MMOs. Solve launch issues.
     ● Chef gets used a lot here.
       ○ They talked about their evolution with it and the lessons learned.
     ● What sucked?
       ○ 25-minute bootstrap runs
       ○ External dependencies (including S3)
       ○ Duplicating application deployment recipes
     ● Golden masters and immutable servers simplify your life drastically.
     ● “If you’re doing Chef without Berkshelf you’re doing it wrong”
     ● Make it easy to throw up new things.
  5. Testing in production
     Netflix, Riot, Kickstarter - they all do this. At scale.
     Netflix:
     ● 10s to 100s of code pushes per day
     ● 1000s to 100,000s of config changes per day
       ○ They tune their A/B testing constantly.
     Of course, they also have the instrumentation to react to this.
  6. How are other people doing DevOps?
     Good news - we’re at the “more sophisticated” end of the spectrum. Every “cloud native” was doing this.
     Things other people did better:
     ● “Golden master” AMIs
     ● Immutable instances
     ● Absolute ownership of vertical slices
     ● Config management (Chef/Puppet) featured prominently
     ● Extensive monitoring + logs + visibility == “table stakes”
       ○ for developers!
     ● Easy to throw up new things
     ● Run many small, simple, collaborating things
     Who? Riot Games, Netflix, change.org, Kickstarter
  7. Logging aggregation is important
  8. Logging aggregation is important
     Lots of third-party companies offer centralized logging services; there's a huge appetite for logging and monitoring.
     ● http://logentries.com/
     ● http://www.loggly.com/
     ● http://papertrailapp.com/
     ● https://www.splunkstorm.com/tour
     ● http://www.datadoghq.com/
     ● DIY - Lumberjacking slides
  9. DEMO: Monitoring & Logging
     https://app.datadoghq.com/infrastructure
     ● Tagged metrics, awesome metric discoverability
     ● CloudWatch integration
       ○ I never knew I could see ELB metrics :-)
     ● Alarms are integrated
     ● You can template dashboards
     https://papertrailapp.com/
     ● Can search, save searches, and alert on searches
     ● No alerts on patterns
     ● Archive to S3 / push to Redshift
     Logging aggregation is FOR DEVELOPERS!!! It saves lots of time when you’re on call.
  10. Loggly Session
     Benefit of logging as a service:
     ● When your infrastructure is in trouble, you do not want your log analytics system on the same infrastructure.
     AWS services that Loggly could use:
     ● Kafka + Storm vs Kinesis
     ● Elasticsearch vs CloudSearch
     Predictive Analytics using Storm, Hadoop, R and AWS
     http://www.youtube.com/watch?v=6Sl3eBmDheE
  11. Loggly Session
     ● Provisioned IOPS solve all issues :)
     ● ELBs do not perform well with an extremely high volume of requests.
     ● DNS round robin is a very good basic load-balancing solution.
     ● Cassandra works very well for application data.
     ● Cassandra does not work well as a queue system; it is hard to track the order of events.
     ● Keep the architecture simple.
  12. Large Scale Load Testing on AWS
  13. Many types of load
     ● Load testing
       ○ (Running a marathon) Predict future load and plan in advance
     ● Stress testing
       ○ Break things (figure out limits), make mitigation plans
     ● Resilience testing
       ○ Figure out how many parts of the architecture you can lose and still operate
     ● Performance testing
       ○ How do latency and throughput change as the load increases?
  14. Phased rollout and measurement
     ● Load testing is necessary but not sufficient.
       ○ Deploy to an alpha cluster.
       ○ The release cycle is important: phased deployment, starting with one box, then monitor and ramp up.
       ○ Monitor performance and behaviour; look at 99% of the traffic, not at the average.
     ● Netflix records 1.2 billion metrics per day
       ○ 5-minute SLA
  15. Gameday
  16. Gameday
     We took part in the AWS Gameday: http://www.awsgameday.com/whatisgameday.html
     Inspired by the 2012 Obama For America DevOps and Amazon.com ops teams.
     ● Build an autoscaling application
     ● Exchange administrative IAM credentials with the other team
     ● Break your opponent's systems
     ● Restore your system
     ● Lessons learned
  17. Who is interested if we wanted to run this?
     It needs a full day, ~6 hours. Weekday? Weekend?
  18. Twitter: @petemounce
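
Slide 14 advises looking at 99% of the traffic and not at the average. A minimal sketch of why that matters, assuming the advice refers to a 99th-percentile measurement (the slides do not say so explicitly). The latency numbers and the `percentile` helper are hypothetical, made up for illustration and not taken from any of the talks:

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds:
# 98 fast requests and 2 very slow ones.
latencies_ms = [20] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms, p99={p99_ms} ms")
```

With this made-up data the mean comes out under 60 ms while the 99th percentile is 2000 ms, so an average-only dashboard would hide the slow requests that some users actually experience.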
