2. About me
• Senior software engineer and architect
• Founder & CEO @ LogSentinel
• Blog: techblog.bozho.net
• Twitter: @bozhobg
• Stack Overflow top 50
3. Why?
• Why high availability?
• Why scalability?
• To account for increased load
• If you have decent HA, you’re likely scalable
• Don’t overdesign
• Why AWS (or any cloud provider)?
4. AWS
• IaaS (Infrastructure as a service) originally (EC2)
• Virtual machines
• Load balancers
• Security groups
• PaaS services on top
• Multiple regions – US, EU, Asia, etc.
• Each region has multiple availability zones (roughly equal to “data centers”)
• Cross-availability zone is easy
• Cross-region is harder
• Similar to Azure, Google Cloud, etc.
5. Rule of thumb: stateless applications
• No persistent state on the application nodes
• Caches and temporary files are okay
• Distributed vs local cache
• Session state: distributed vs no session state (e.g. JWT)
• Makes the application layer horizontally scalable
• Application nodes are disposable
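Stateless session handling can be as simple as a signed token the application verifies without any server-side session store. A minimal sketch in the spirit of JWT (not a full JWT implementation; the secret name and claim shape are illustrative), using only the Python standard library:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret-across-all-nodes"  # hypothetical; load from configuration in practice

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(claims: dict) -> str:
    """Sign the claims so any application node can verify them without shared state."""
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    sig = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"

def verify_token(token: str):
    """Return the claims if the signature checks out, else None."""
    payload, sig = token.rsplit(".", 1)
    expected = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # tampered, or signed with a different key
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Because every node holds the same secret, any node can serve any request – which is exactly what makes the application layer horizontally scalable.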
6. Executing only once in a cluster
• Sometimes you need to execute a scheduled piece of code only once in a cluster
• Database-backed schedule job management
• Distributed locks (Hazelcast)
• Using queues (SQS, Amazon MQ, RabbitMQ)?
7. Scaling
• Autoscaling groups
• Groups of virtual machines (instances) with identical configuration
• Scale-up - configure criteria for launching new virtual machines – e.g. “more than 5 minutes of CPU utilization over 80%”
• Scale-down – configure criteria for destroying virtual machines
• Allows for handling spikes, or gradual increase of load
• Spot instances
• Cheap instances you “bid” for. Can be reclaimed at any time
• Useful for heavy background processes.
• Useful for test environments.
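The scale-up criterion above (“more than 5 minutes of CPU utilization over 80%”) is evaluated by CloudWatch alarms in practice, but the logic itself is simple. A sketch of the decision over a window of (timestamp, CPU%) samples – the threshold and window are the example values from the slide:

```python
def should_scale_up(samples, threshold=80.0, window=300):
    """samples: list of (unix_ts, cpu_percent) tuples, newest last.
    True only if every sample in the trailing `window` seconds exceeds
    `threshold` AND the samples actually span the full window."""
    if not samples:
        return False
    latest = samples[-1][0]
    recent = [(t, c) for t, c in samples if t >= latest - window]
    # Without data covering the whole window we cannot claim "5 minutes over 80%".
    covers_window = recent[0][0] <= latest - window
    return covers_window and all(c > threshold for _, c in recent)
```

A single hot sample does not trigger scaling – the window requirement is what prevents reacting to momentary spikes.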
8. Data stores
• Managed
• RDBMS (AWS RDS) – MySQL, MariaDB, Postgres, Oracle, MS SQL
• Search engines – Elasticsearch
• Caches – ElastiCache (Redis and Memcached)
• Custom:
• Amazon Aurora
• CloudSearch
• S3, SimpleDB, Dynamo
• Own installation: spin up VMs, install anything you like (e.g. Cassandra, HBase, own Postgres, own Elasticsearch, own caching solution)
9. Scaling data stores
• The custom ones are automatically scaled (S3, SimpleDB)
• The managed ones are scaled by configuration
• Own deployments are scaled via auto-scaling groups
• Data sharding vs replication with consistent hashing
• Resharding is not trivial
• Replication with consistent hashing can handle scaling up automatically *
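The point of consistent hashing is that adding a node remaps only the keys that fall into the new node’s arc – everything else stays where it is, which is why scaling up can be handled without resharding the whole dataset. A minimal ring sketch (virtual-node count and hash choice are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring; adding a node only remaps keys in its arc."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for n in nodes:
            self.add(n)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Virtual nodes spread each physical node around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Contrast with naive `hash(key) % n` sharding, where changing `n` remaps almost every key – that is exactly the non-trivial resharding the slide warns about.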
10. Elastic load balancer
• AWS-provided software load balancer
• Points to specified target machines or group of machines (roughly ASGs)
• Configurable: protocols, ports, healthcheck, monitoring metrics
• TLS termination
• AWS-managed certificates
• Load balancer in front of application nodes
• Load balancer in front of data store nodes
• vs application-level load-balancing (configuration vs fetching db nodes dynamically)
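Application-level load balancing trades the ELB’s static configuration for the application fetching its target list dynamically (e.g. the live database nodes from a registry table). A round-robin sketch – the class name and the fetch mechanism are illustrative:

```python
import itertools

class AppLevelBalancer:
    """Application-level load balancing: the app fetches the current node
    list itself (e.g. from the database or a discovery endpoint) instead
    of pointing at a fixed load balancer address."""

    def __init__(self, fetch_nodes):
        self._fetch_nodes = fetch_nodes  # callable returning the live node list
        self._nodes = []
        self._cycle = iter(())

    def refresh(self):
        # A real version would call this periodically to pick up node changes.
        self._nodes = list(self._fetch_nodes())
        self._cycle = itertools.cycle(self._nodes)

    def next_node(self):
        if not self._nodes:
            self.refresh()
        return next(self._cycle)
```

Usage: `AppLevelBalancer(lambda: ["db1:5432", "db2:5432"]).next_node()` – the trade-off versus an ELB is that the application now owns node discovery and health checking.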
11. Things to automate
• Hardware and network resources (CloudFormation)
• Application and database configuration (OpsWorks: Puppet, Chef, S3+bash, Capistrano)
• Instances
• launch configurations + bash
• Docker containers + bash (Elastic Container Service vs Fargate, Kubernetes)
• Why automate?
• because autoscaling benefits from automated instance creation
12. Scripted stacks
• You can create all instances, load balancers, auto-scaling groups, launch configurations, security groups, domains, Elasticsearch domains, etc. manually
• But CloudFormation is way better
• JSON or YAML
• CloudFormation manages stack updates
• Stack parameters (instance types, number of nodes, domains used, s3 buckets, etc.)
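CloudFormation templates are JSON or YAML, and JSON templates can be generated programmatically. A minimal hypothetical template – one stack parameter and one EC2 instance – built as a Python dict (the logical names and the AMI id are placeholders, not from the talk):

```python
import json

# Hypothetical minimal CloudFormation template: one parameter, one resource.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "InstanceType": {
            "Type": "String",
            "Default": "t3.micro",
            "Description": "EC2 instance type for the app nodes",
        }
    },
    "Resources": {
        "AppInstance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                # Ref pulls in the stack parameter at deploy time.
                "InstanceType": {"Ref": "InstanceType"},
                "ImageId": "ami-00000000",  # placeholder AMI id
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

Parameters like the instance type are what make the same template replicable across customers, environments, and disaster-recovery stacks.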
16. Why CloudFormation?
• Replicable stacks
• Used for different customers
• Used for different environments
• Used for disaster recovery
• Having a clear documentation of your entire infrastructure
• DevOps friendly
• Not that hard to learn
• Drawbacks: slow change-and-test cycles, proprietary
• Alternatives: Terraform
• Tries to abstract stack creation independent of provider, but you still depend on proprietary concepts like ELB, security groups, etc.
17. Configuration provisioning
• OpsWorks – hosted Puppet or Chef
• Capistrano – “login to all machines and do x, y, z”
• S3 – simple, no learning curve
• Instance launch configuration includes files to fetch from S3 (app.properties, db.properties, cassandra.conf, mysql.conf, etc.)
• CloudFormation can write dynamic values to conf files (e.g. ELB address)
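Writing dynamic values into conf files is plain template substitution: the properties file fetched from S3 has placeholders, and the stack fills them in at launch. A sketch with Python’s `string.Template` – the property names and placeholder values are illustrative:

```python
from string import Template

# Hypothetical app.properties template fetched from S3; a boot script (or
# CloudFormation) substitutes stack-specific values such as the ELB address.
APP_PROPERTIES = Template("""\
db.url=jdbc:mysql://$db_host:3306/app
loadbalancer.address=$elb_address
""")

def render_properties(elb_address: str, db_host: str) -> str:
    """Produce the final properties file contents for one stack."""
    return APP_PROPERTIES.substitute(elb_address=elb_address, db_host=db_host)
```

This keeps the S3-hosted files stack-agnostic; only the substituted values differ per environment.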
19. Automated instance setup
• Elastic Container Service
• Deploy Docker containers on EC2 instances
• Fargate abstracts away management of the underlying EC2 instances
• Kubernetes – vendor-independent
• But don’t rush into using Kubernetes (or Docker, for that matter).
• Packer – creates images
• Manual
• Launch configuration to fetch and execute setup.sh
• Allows for easy zero downtime blue-green deployment
• Instance setup changed? Destroy it and launch a new one
• Simple. Simple is good.
20. Blue-green deployment
• Two S3 “folders” – blue and green
• Shared database
• Two autoscaling groups – blue (currently active) and green (currently passive)
• Upload new release artifact (e.g. fat jar) to s3://setup-bucket/green
• Activate the green ASG (increase its desired number of instances)
• Wait for nodes to launch
• Execute acceptance tests
• Switch DNS record (Route53) from blue ELB to green ELB
• Turquoise (intermediate deployment in case of breaking database changes)
• Can be automated via script that uses AWS CLI or APIs
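The steps above can be sketched as a small orchestration function. The AWS actions (scaling an ASG, switching the Route53 record) are stubbed as callables here so the control flow is runnable; in a real script they would be AWS CLI or API calls, and the rollback behavior shown is an assumption, not from the talk:

```python
# Sketch of the blue-green flow: activate the passive ASG, run acceptance
# tests, and only then switch DNS. AWS actions are injected as callables.

def blue_green_deploy(state, scale_asg, switch_dns, acceptance_tests):
    """state: {'active': 'blue', 'passive': 'green'}. Returns True if
    traffic was switched to the (formerly) passive color."""
    passive = state["passive"]
    scale_asg(passive, desired=2)          # activate the passive ASG
    if not acceptance_tests(passive):      # test before taking any traffic
        scale_asg(passive, desired=0)      # assumed rollback: scale back down
        return False
    switch_dns(to=passive)                 # Route53: point DNS at the passive ELB
    state["active"], state["passive"] = passive, state["active"]
    return True
```

Because DNS only switches after the acceptance tests pass, a bad release never receives production traffic – the blue ASG keeps serving it throughout.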
21. Other useful services
• IAM – user and role management (each instance knows its role, no need for passwords)
• S3 – distributed storage / key-value store / universally applicable
• CloudTrail – audit trail of all infrastructure changes
• CloudWatch – monitoring of resources
• KMS – key management
• Glacier – cold storage
• Lambda – “serverless” a.k.a. function execution
22. General best practices
• Security groups
• Only open ports that you need
• Bastion host – entry point to the stack via SSH
• VPC (virtual private cloud)
• your own virtual network, private address space, subnets (e.g. per availability zone), etc.
• Multi-factor authentication
23. Conclusion
• Scalability is a function of your application first and infrastructure second
• AWS is pretty straightforward to learn
• You can have scalable, scripted infrastructure without big investments
• New services appear often – check them out
• Vendor lock-in is almost inevitable
• But concepts are (almost) identical across cloud providers
• If something can be done easily without an AWS-specific service, prefer that
• Bash is inevitable