2. • Scalable web capacity
• Scalable load balancer capacity
• Scalable service capacity
• Scalable, repeatable, self-service Elasticsearch and Cassandra
• Provisioning a new prod datacenter cut from 12+ months to 2.
• Auto scaling, using spot capacity
• Operations and Infrastructure teams:
• More efficient
• More agile
• Providing more business value
Key Results
6. • Wrap Terraform up to make it harder to screw up
• Makefile - easiest path - make plan / make apply
• Use allowed_account_ids in providers (AWS specifically)
Lesson 1:
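A minimal sketch of the allowed_account_ids guard, written as Python that emits Terraform JSON config. The helper name, region and account ID are placeholders for illustration, not anything from Yelp's setup.

```python
import json


def aws_provider(region, allowed_account_ids):
    """Build a Terraform JSON provider block that pins the AWS account."""
    if not allowed_account_ids:
        # Hypothetical safety check: never emit an unpinned provider.
        raise ValueError("refusing to emit a provider with no account pinning")
    return {
        "provider": {
            "aws": {
                "region": region,
                # Terraform refuses to run when the credentials in use
                # belong to an account that is not in this list.
                "allowed_account_ids": list(allowed_account_ids),
            }
        }
    }


if __name__ == "__main__":
    # Placeholder account ID; write the output to e.g. provider.tf.json.
    print(json.dumps(aws_provider("us-west-1", ["123456789012"]), indent=2))
```

Generating this block from one place means every directory that make plan / make apply runs in carries the same guard.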
9. • My small contributions to the community
• github.com/terraform-community-modules/tf_aws_ubuntu_ami
• github.com/terraform-community-modules/tf_aws_availability_zones
• Modules you can reuse to stop having to hard code IDs
• make + getvariables.rb pattern
• We have internal versions at Yelp (usually to bake in our variables.tf.json)
I hate magic numbers
10. • Standup update from a coworker:
• Yesterday: “Learned Go”
• Today: “Implemented yelpaws_instance”
• Adds “ubuntu” and “region” + “account” variables to aws_instance
• Looks up the AMI to use automatically
• Only on initial launch; Puppet converges machines after that!
• https://github.com/Yelp/terraform-ami_fromhttp
yelpaws_instance
12. • Modules
• Don’t put your modules in a ../modules folder in the same repo.
• Make them separate repositories, and lock SHAs/tags to avoid surprises!
• Don’t deeply nest modules - pass a module everything it needs
• Code
• type/region-environment layout
• vpc/uswest1-prod/subnets.tf
• web_frontend/uswest1-prod/webs.tf
• terraform.tfvars
Code layout
13. • Build your VPC, subnets, etc. with Terraform
• Export as remote state
• Pull in elsewhere - eliminate magic numbers
• Much nicer solution than getvariables.rb
Remote state
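A rough sketch of pulling in remote state, again as Python emitting Terraform JSON. It assumes the pre-0.7 form, where terraform_remote_state is a resource and outputs are read via `.output.<name>` (later releases moved this to a data source). The bucket, key and output names are made up.

```python
import json


def remote_state(name, bucket, key, region):
    """Terraform JSON for reading another configuration's state from S3."""
    return {
        "resource": {
            "terraform_remote_state": {
                name: {
                    "backend": "s3",
                    "config": {"bucket": bucket, "key": key, "region": region},
                }
            }
        }
    }


def state_output(name, output):
    """Interpolation string that refers to one output of that remote state."""
    return "${terraform_remote_state.%s.output.%s}" % (name, output)


if __name__ == "__main__":
    # Hypothetical bucket and key; the VPC config is assumed to export subnet_id.
    cfg = remote_state("vpc", "example-tfstate", "vpc/uswest1-prod.tfstate", "us-west-1")
    print(json.dumps(cfg, indent=2))
    print(state_output("vpc", "subnet_id"))
```

The consuming config uses the interpolation string wherever it would otherwise hard code a subnet or VPC ID.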
15. • nsone is an awesome DNS service!
• They have a fantastic API
• I wrote my own Terraform provider!
• github.com/bobtfish/terraform-provider-nsone
• Tie together resources from multiple regions using remote state!
nsone
18. • Puppet code:
class { 'role::elasticsearch_cluster':
  cluster_name => 'reviews',
}
• Hiera lookups:
puppet/modules/elasticsearch_cluster/data/cluster/reviews.yaml
• Can locate the ‘data’ directory somewhere else
puppet data as modules
19. • Spot fleet Terraform provider in use internally
• ‘Coming soon’ to github
Spot fleet
20. • puppet/modules/elasticsearch_clusters/data/cluster/reviews.yaml
• Move the cluster data folder out of puppet
• Add YAML for mapping of region/environment/number of nodes
• Generate terraform config (as JSON)
• Simple config
• Directly creating ASGs
• No modules
• Easy to debug!
• Automated cluster provisioning! (Just add Jenkins)
Managing Elasticsearch/Cassandra etc
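One way the mapping-to-config step could look. A plain dict stands in for the YAML, and the cluster names, node counts, resource names and the launch configuration reference are all invented. The ASG fields are illustrative rather than a complete definition.

```python
import json

# Stand-in for the YAML mapping of cluster -> region/environment -> node count.
CLUSTERS = {
    "reviews": {"uswest1-prod": 3, "useast1-prod": 5},
}


def asg_resources(clusters):
    """Flatten the mapping into one aws_autoscaling_group per placement."""
    asgs = {}
    for cluster, placements in sorted(clusters.items()):
        for placement, nodes in sorted(placements.items()):
            name = "es-%s-%s" % (cluster, placement)
            asgs[name] = {
                "name": name,
                "min_size": nodes,
                "max_size": nodes,
                "desired_capacity": nodes,
                # Assumed to be generated elsewhere in the same config.
                "launch_configuration": "${aws_launch_configuration.%s.name}" % name,
            }
    return {"resource": {"aws_autoscaling_group": asgs}}


if __name__ == "__main__":
    print(json.dumps(asg_resources(CLUSTERS), indent=2, sort_keys=True))
```

The output is flat resources with no modules, so what Terraform plans maps one-to-one onto lines in the mapping.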
21. • Bad abstraction for contextual information
• Which db server is the master? Does it have ‘master’ in its FQDN?
• If it does, what happens when you promote another machine?
• Need key => value for cattle not pets
• Customize your monitoring system to actually tell you what’s wrong!
• ‘The master DB has crashed’ vs ‘A db has crashed’
• ‘10-46-11-54 is dead’ vs ‘zookeeper::10-46-11-54 is dead’
Hostnames
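A toy illustration of the key => value idea: the alert text comes from metadata attached to the host, not from parsing its name. The metadata store and the function name are hypothetical.

```python
def alert(host, metadata):
    """Build an alert line, prefixing the service when it is known."""
    tags = metadata.get(host, {})
    service = tags.get("service")
    label = "%s::%s" % (service, host) if service else host
    return "%s is dead" % label


if __name__ == "__main__":
    # Hypothetical key => value store, e.g. fed from instance tags.
    metadata = {"10-46-11-54": {"service": "zookeeper"}}
    print(alert("10-46-11-54", metadata))
    print(alert("10-46-11-54", {}))
```

Promoting another machine then only means changing a value in the store; no hostname has to change.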
22. • Smartstack
• Nerve (on host, monitors services)
• Synapse (runs haproxy on lo:0)
• Hacheck (cache healthcheck results to rate limit)
• qdisc_tools (seamless haproxy reloads)
• yocalhost: 169.254.255.254
• Reachable from the machine
• Reachable from inside Docker
• Each service has a fixed port
Service discovery
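A small sketch of what a client sees under this scheme: every service sits behind the same local address on its own fixed port. The service name and port number below are invented for illustration.

```python
YOCALHOST = "169.254.255.254"

# Hypothetical registry of fixed ports, one per service.
SERVICE_PORTS = {"reviews_search": 20001}


def service_url(service, path="/"):
    """URL for a service via the local haproxy, same on the host or in Docker."""
    port = SERVICE_PORTS[service]  # KeyError for an unregistered service
    return "http://%s:%d%s" % (YOCALHOST, port, path)


if __name__ == "__main__":
    print(service_url("reviews_search", "/status"))
```

Clients never resolve backend hosts themselves; haproxy behind that address handles routing to healthy instances.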
23. • Terraform is really, really young.
• It has some serious issues and limitations currently
The bad news
24. • It’s moving really fast
• None of the changes it needs would fundamentally alter the model
The good news
25. • Unfortunately, provider aliases don’t work in terraform modules
• We want to provision all ‘prod’ ES clusters in one shot
• So we just generate raw terraform resources, without using a module
• Works, but it’d be nice to have more separation
• ‘Make all the Elasticsearch clusters’
• ‘Make an individual Elasticsearch cluster’
• Should be separate concerns IMO
Multi region
26. • “Terraform is really hard to debug”
• Modules make this 10x worse.
• TF_LOG=1 is useful for provider authors.
• NOT useful for Terraform users
Debugging
27. output "thing_ids" {
  value = "${join(",", aws_instance.foo.*.id)}"
}
${split(",", module.foo.thing_ids)}
const stringListDelim = `B780FFEC-B661-4EB8-9236-A01737AD98B6`
Data structures
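Why the join/split trick is fragile, shown with Python stand-ins (this is not Terraform's implementation). A list passed around as a delimited string only survives if no element contains the delimiter, which is presumably why the constant on the slide is a UUID.

```python
# The delimiter constant shown on the slide.
UNLIKELY_DELIM = "B780FFEC-B661-4EB8-9236-A01737AD98B6"


def roundtrip(items, delim):
    """Encode a list as one string with join, then decode it with split."""
    encoded = delim.join(items)
    return encoded.split(delim)


if __name__ == "__main__":
    ids = ["i-0a1", "i-0b2"]  # made-up instance IDs
    print(roundtrip(ids, ","))

    tricky = ["a,b", "c"]  # one element contains the delimiter
    print(roundtrip(tricky, ","))
    print(roundtrip(tricky, UNLIKELY_DELIM))
```

With "," the second list comes back with three elements instead of two; with the UUID delimiter it comes back intact.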
28. • Lots of corner cases where they don’t work.
• Some cases where they work sometimes
28
Counts and Interpolation
29. • Don’t try to put your domain logic into Terraform!
• Write some (simple!) classes for your domain
• Make them serialize out to Terraform resources in JSON
• Done!
KISS
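A sketch of the "simple classes that serialize to JSON" approach. The class, its fields, the default instance type and the AMI ID are all invented; a real domain model would carry whatever your own resources need.

```python
import json
from dataclasses import dataclass


@dataclass
class EsNode:
    """One Elasticsearch node in the domain model (hypothetical fields)."""
    cluster: str
    index: int
    ami: str
    instance_type: str = "m4.large"

    @property
    def resource_name(self):
        return "%s_%d" % (self.cluster, self.index)

    def to_terraform(self):
        """Arguments for a single aws_instance resource."""
        return {
            "ami": self.ami,
            "instance_type": self.instance_type,
            "tags": {"Name": self.resource_name, "cluster": self.cluster},
        }


def render(nodes):
    """Serialize the nodes as Terraform JSON (for a *.tf.json file)."""
    instances = {n.resource_name: n.to_terraform() for n in nodes}
    return json.dumps({"resource": {"aws_instance": instances}},
                      indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render([EsNode("reviews", i, "ami-12345678") for i in range(2)]))
```

All the branching and looping lives in ordinary code that can be unit tested, and Terraform only ever sees flat resources.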
30. • 0.7 will fix some of my biggest complaints
• Ability to move state
• Enables refactoring existing resources into modules
• Complex data structure support
• No more split() join()
Terraform 0.7