Adobe has quickly scaled from nothing to a huge presence in the AWS cloud.
This is the story from the trenches: how we screwed up, learned and evolved our use of Chef to get us to where we are today. Taming Chef to work in the AWS cloud while trying to build a platform at a large scale was not as easy as we originally expected, and we’re constantly trying to make it better. We’ll share some tips and tricks from our experience.
Zero to Production in Crazy Time: Adobe’s Transformation
1. Zero to Prod in Crazy Time
John Martinez | Adobe Cloud Services
2. About Me
• Currently working as a Cloud Operations Engineer at Adobe
• I get to figure out new stuff, and make really old stuff work in AWS
• 20+ years doing UNIX/Linux work
• Learned about cloud computing at Netflix
• Working at Adobe feeds my habit - photography
5. How We Got Started
• Creative Cloud went live in late April 2012
• AWS from the start
• We needed to do SOMETHING
• Yes, it was really that scientific of a decision
• Chef vs. Puppet
• That learning curve
6. #EPICFAIL #1
• Not socializing the need for Chef to the dev team
• Once sold, keep momentum going
• The “let’s make this more complicated than it needs to be” syndrome
• Start with easy stuff first, then graduate
• Ops guy admits: the dev people know how to use software engineering methods for creating and maintaining infrastructure code: USE IT
7. Tweaking Knobs
• EC2 AMIs: bake or configure?
• Baking positive: fast boot times
• Baking negative: too static
• Configure positive: very dynamic
• Configure negative: can take forever to boot
• We settled on a mostly dynamic configuration, with some static baking
• knife-ec2 is great, but what about autoscale?
• The CloudFormation connection
8. #EPICFAIL #2
• Get Chef, don’t actually use it
• Back to that learning curve (Hint: Training)
• Issue with compressed timelines and small staff
• In the heat of deploying prod, doing stupid things
• Losing track of what got deployed where
• Who’s doing what?
• Not sleeping sucks
9. Out of the Rubble
• Now that we’re live: refactor time (a.k.a. Fix all the broken stuff)
• Chef development for reals
• OMG: WINDOWS?!?!
• Not a lot of expertise in-house or outside
• Ops guy admits: learned to love dev tools like Jenkins and Git
10. It’s Alive!
• Did it gradually over time
• Started with simple recipes, graduated to more complicated ones
• Using Environments to deploy the right thing in the right place
• It’s AWS, stupid: you SHOULD kill your instances
• CloudFormation to AutoScale to Chef Client
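The CloudFormation to AutoScale to Chef Client chain above can be sketched as a template builder. This is a minimal illustration, not the template we actually ran: the resource names (AppLaunchConfig, AppGroup), the AMI ID, the instance type and the /etc/chef/first-boot.json path are all hypothetical placeholders. The launch configuration carries the user-data script, and every instance the Auto Scale Group starts runs chef-client on first boot.

```python
import json


def build_stack_template(ami_id, instance_type, user_data, min_size=1, max_size=2):
    """Return a CloudFormation template (as a dict) with a launch
    configuration and an Auto Scaling group that boots from it."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Hypothetical logical name; holds the boot-time script.
            "AppLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                    "UserData": {"Fn::Base64": user_data},
                },
            },
            # Hypothetical logical name; replaces killed instances.
            "AppGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "AppLaunchConfig"},
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder script: run chef-client in a named environment with a
    # first-boot JSON file (the path is an assumption).
    script = "#!/bin/bash\nchef-client -E production -j /etc/chef/first-boot.json\n"
    template = build_stack_template("ami-12345678", "m1.small", script)
    print(json.dumps(template, indent=2))
```

Because the group rebuilds whatever it loses, killing an instance just triggers another run of the same bootstrap.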
11. It’s Alive (v1)
[Diagram]
• Components: EC2 Instances, S3 Bucket (validator key), CloudFormation, Auto Scale Group, Hosted Chef
• 0. Manual: Editor (vi), Perforce, cfn-create-stack
• 1. knife upload: Cookbooks, Environment, Roles, Data bags
• 2, 3: unlabeled steps in the diagram
• 4. Chef Client: Bootstrap, Data Bag Key, Recipes
12. More Automation (v2)
[Diagram]
• Components: EC2 Instances, S3 Bucket (validator key), CloudFormation, Auto Scale Group, Hosted Chef
• 0. Automated: Git, Jenkins, Jenkins CFN
• 1. knife upload: Cookbooks, Environment, Roles, Data bags
• 2, 3: unlabeled steps in the diagram
• 4. Chef Client: Bootstrap, Data Bag Key, Recipes
13. On Bootstrapping EC2 Instances
• Biggest issue with Chef in AWS: straying from knife-ec2
• Read the bootstrap document and reverse engineer it
• http://wiki.opscode.com/display/chef/Client+Bootstrap+Fast+Start+Guide
• http://wiki.opscode.com/display/chef/EC2+Bootstrap+Fast+Start+Guide
• user-data is your friend
• Use it for node identity
• Resist the devil: don’t send any API keys or passwords or embarrassing things via user-data!!!
• Windows works this way, too, but learn PowerShell
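One way to hold the line on "identity only, no secrets" is to build user-data through a small helper that refuses anything secret-looking. The sketch below is an assumption about how that could be done, not our production code: the key names in the example and the deny-list pattern are hypothetical, and the check only looks at key names, so it is a guard rail rather than a guarantee. The 16 KB figure is the EC2 limit on raw user-data.

```python
import json
import re

# EC2 limits raw user-data to 16 KB.
MAX_USER_DATA_BYTES = 16 * 1024

# Hypothetical deny-list of key names; extend it to suit your own habits.
SECRET_HINT = re.compile(r"(secret|password|passwd|token|api_?key|private)", re.IGNORECASE)


def build_identity_user_data(identity):
    """Serialize node identity to JSON, refusing secret-looking keys."""
    for key in identity:
        if SECRET_HINT.search(key):
            raise ValueError("refusing to put %r in user-data" % key)
    payload = json.dumps(identity, sort_keys=True)
    if len(payload.encode("utf-8")) > MAX_USER_DATA_BYTES:
        raise ValueError("user-data exceeds the 16 KB EC2 limit")
    return payload


if __name__ == "__main__":
    # Only tells the node who it is; keys and passwords stay out.
    print(build_identity_user_data({
        "chef_environment": "production",
        "role": "web",
        "node_name_prefix": "cc-web",
    }))
```

The instance reads this back at boot and uses it to pick its Chef environment and run list; anything sensitive is fetched some other way.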
15. #EPICFAIL #3
• Failing to architect for failure (double BAM)
• Even though we built a hot AWS architecture, we still got bit
• What does it mean when Hosted Chef is down for us?
• Talk to Opscode...really, talk to them, they want to help
16. How We’re Trying to Improve
• Mostly around availability
• Augment Hosted Chef with Private Chef
• Mostly around security
• Use the tools at your disposal
• IAM policies for EC2 roles and S3 bucket security
• Mostly around performance
• Refactoring AWS-related code to use AWS SDK for Ruby
• AMI factory from base Amazon Linux or Ubuntu AMIs (bonus points for Windows)
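For the "IAM policies for EC2 roles and S3 bucket security" point above, the idea is that an instance role should be able to read the validator key and nothing else. A minimal sketch of such a policy document, built in Python, follows. The bucket and object names are hypothetical, and this is an illustration of the shape of the policy rather than the one we deployed.

```python
import json


def validator_key_read_policy(bucket, key):
    """Return an IAM policy document that allows reading one S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Read-only, and only for the single object named below.
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::%s/%s" % (bucket, key),
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and object names.
    policy = validator_key_read_policy("example-chef-bootstrap", "validation.pem")
    print(json.dumps(policy, indent=2))
```

Attached to the role in the instance profile, a policy like this lets the boot script pull the key without any credentials appearing in user-data.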
17. The End
• Operational scripts, template examples and other bits
• https://github.com/Adobe-CloudOps
• Contact me:
• @johnmartinez
• martinez@adobe.com
• Questions? Suggestions? Come talk to me after!