Cloud Custodian is a rules engine for managing infrastructure resources across AWS accounts using YAML policies. Policies filter resources and take actions on them, such as tagging, encrypting, or deleting. It integrates with AWS Lambda and Amazon CloudWatch. The example policies encrypt S3 buckets, require encryption on new objects, and switch log sinks to an encrypted format. The tool addresses the problems of one-off scripts by providing a standardized way to author, deploy, test, and manage infrastructure policies at scale across accounts.
5. A sea of policies
- Fleet-wide savings policies
  - Off-hours stops for dev environments
  - Garbage-collect EBS, ELB, etc.
  - Detect over-provisioned resources
- Numerous security policies
  - Encrypt all the things
  - Access control
  - SSL ciphers
- Numerous compliance policies
  - Tag compliance / chargeback
  - Current images
  - Backups
6. Fleet Management
Across lots of federated accounts.
Natural tendency
- One-off scripts
But
- How are they implemented
- How are they deployed
- How are they configured
- How are they managed
- Who owns them
Software engineering
- How are they tested
- Are they reviewed
Who knows?
7. Cloud Custodian
• A rules engine for infrastructure management.
• A YAML DSL for policies that query resources or subscribe to events, apply filters, and take actions.
• Integrated Lambda provisioning and event sources.
• Outputs to Amazon S3, Amazon CloudWatch Logs, and Amazon CloudWatch Metrics.
Open source @ https://github.com/capitalone/cloud-custodian
- name: require-rds-encrypt-and-non-public
  resource: rds
  mode:
    type: cloudtrail
    events:
      - CreateDBInstance
  filters:
    - or:
        # RDS reports encryption at rest as StorageEncrypted.
        - StorageEncrypted: false
        - PubliclyAvailable: true
  actions:
    - type: delete
      skip-snapshot: true
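The policy above reads as query, filter, act. As a rough illustration of that loop, here is a minimal Python sketch that evaluates the same or-filter over in-memory RDS instance records. It is not Cloud Custodian's implementation: the records are hand-written dicts shaped like DescribeDBInstances output (which reports encryption as StorageEncrypted), and the hypothetical delete_without_snapshot action only records instance ids instead of calling AWS.

```python
# Sketch of the query -> filter -> act loop for require-rds-encrypt-and-non-public.
# Assumption: resources are hand-written dicts, not live RDS API results.
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]
Predicate = Callable[[Resource], bool]


def equals(key: str, expected: Any) -> Predicate:
    """Like a simple `Key: value` filter: exact equality on one field."""
    return lambda r: r.get(key) == expected


def any_of(*preds: Predicate) -> Predicate:
    """Like an `or:` block: true if any child filter matches."""
    return lambda r: any(p(r) for p in preds)


def run_policy(resources: List[Resource], matches: Predicate,
               action: Callable[[Resource], None]) -> List[Resource]:
    """Keep the resources that match, then apply the action to each one."""
    matched = [r for r in resources if matches(r)]
    for r in matched:
        action(r)
    return matched


deleted: List[str] = []


def delete_without_snapshot(r: Resource) -> None:
    # Hypothetical stand-in: a real action would call the RDS API.
    deleted.append(r["DBInstanceIdentifier"])


instances = [
    {"DBInstanceIdentifier": "ok-db", "StorageEncrypted": True, "PubliclyAvailable": False},
    {"DBInstanceIdentifier": "plain-db", "StorageEncrypted": False, "PubliclyAvailable": False},
    {"DBInstanceIdentifier": "public-db", "StorageEncrypted": True, "PubliclyAvailable": True},
]

rds_filter = any_of(equals("StorageEncrypted", False), equals("PubliclyAvailable", True))
matched = run_policy(instances, rds_filter, delete_without_snapshot)
print(deleted)  # ['plain-db', 'public-db']
```

Keeping filters as plain predicates makes the or-block a simple combinator; an and-block would be the same shape with all() in place of any().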
8. Amazon CloudWatch Events
Features
● Powerful infrastructure observation capabilities
● Enables “real-time” rules enforcement and reaction, with wide coverage of AWS product APIs
Sources
● All CloudTrail events (P99 delivery within 90 s, as of April 2016)
● EC2 instance state changes (600 ms)
● ASG instance membership changes (600 ms)
● Periodic scheduling (custom)
● Custom events
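To make the event-driven mode concrete, here is a small Python sketch of how a CloudWatch Events rule pattern selects events. It implements only the simplest matching rules (each pattern field lists allowed values, nested objects match recursively, fields the pattern does not mention are ignored) and skips prefix, numeric, and anything-but matching. The event payloads are trimmed, hand-written examples, not captured events.

```python
# Simplified CloudWatch Events pattern matching (exact-value lists and nesting only).
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if every field named in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching sub-object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Leaf: the pattern lists allowed values; an array in the event
            # matches if any of its elements is allowed.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


# Pattern for CloudTrail-delivered CreateDBInstance calls (cloudtrail mode).
rds_create = {
    "source": ["aws.rds"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"eventSource": ["rds.amazonaws.com"], "eventName": ["CreateDBInstance"]},
}

# Pattern for EC2 instance state changes.
ec2_stopped = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

# Trimmed example events; real events carry many more fields.
create_event = {
    "source": "aws.rds",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "rds.amazonaws.com", "eventName": "CreateDBInstance"},
}
state_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0example", "state": "stopped"},
}

print(matches(rds_create, create_event), matches(rds_create, state_event))  # True False
```

Only events that pass the pattern reach the policy's Lambda function, which then applies the policy's own filters to the affected resources.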
9. Cloud Custodian
Resource-type policies (EC2 instance, AMI, Auto Scaling group, S3 bucket, ELB, etc.)
Filter resources
Invoke actions on the filtered set
Output resource JSON to S3 and metrics to CloudWatch
Vocabularies of actions and filters for policy construction
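One way to picture vocabularies of actions and filters is a per-resource-type registry that maps the type names used in policy YAML to code. The Python sketch below assumes that design purely for illustration: the Registry class, the VOCABULARIES table, and the filter name attached are hypothetical, and policies are plain dicts of the kind a YAML loader would produce.

```python
# Hypothetical per-resource-type vocabularies of filters and actions.
from typing import Any, Callable, Dict, List, Tuple

Resource = Dict[str, Any]


class Registry:
    """Maps vocabulary names used in policy YAML to Python callables."""

    def __init__(self) -> None:
        self._entries: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._entries[name] = fn
            return fn
        return decorator

    def get(self, name: str) -> Callable[..., Any]:
        if name not in self._entries:
            raise KeyError(f"unknown vocabulary entry: {name}")
        return self._entries[name]


ebs_filters, ebs_actions = Registry(), Registry()
VOCABULARIES: Dict[str, Tuple[Registry, Registry]] = {"ebs": (ebs_filters, ebs_actions)}


@ebs_filters.register("attached")  # hypothetical filter name
def attached(resource: Resource) -> bool:
    return bool(resource.get("Attachments"))


@ebs_actions.register("tag")
def tag(resource: Resource, key: str, value: str) -> None:
    resource.setdefault("Tags", []).append({"Key": key, "Value": value})


def params(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Everything except 'type' is passed through as keyword arguments."""
    return {k: v for k, v in spec.items() if k != "type"}


def run(policy: Dict[str, Any], resources: List[Resource]) -> List[Resource]:
    """Resolve each filter/action by name for the policy's resource type."""
    filters, actions = VOCABULARIES[policy["resource"]]
    matched = resources
    for spec in policy.get("filters", []):
        check = filters.get(spec["type"])
        matched = [r for r in matched if check(r, **params(spec))]
    for spec in policy.get("actions", []):
        act = actions.get(spec["type"])
        for r in matched:
            act(r, **params(spec))
    return matched


volumes = [
    {"VolumeId": "vol-1", "Attachments": [{"InstanceId": "i-1"}]},
    {"VolumeId": "vol-2", "Attachments": []},
]
policy = {
    "name": "tag-attached-volumes",
    "resource": "ebs",
    "filters": [{"type": "attached"}],
    "actions": [{"type": "tag", "key": "Reviewed", "value": "yes"}],
}
print([v["VolumeId"] for v in run(policy, volumes)])  # ['vol-1']
```

An unknown type name fails at lookup time, so a typo in a policy surfaces as an error instead of silently matching nothing.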
- name: ebs-copy-instance-tags
  resource: ebs
  filters:
    - type: value
      key: "Attachments[0].Device"
      value: not-null
  actions:
    - type: copy-instance-tags
      tags:
        - App
        - Env
        - Owner
        - Name
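A minimal Python sketch of what the copy-instance-tags step has to work out for each attached volume: find the instance from the volume's first attachment and pick out the requested tag keys. The helper names and the in-memory instance map are hypothetical, and skipping tags the volume already carries with the same value is a choice of this sketch (so a second run changes nothing), not a description of Cloud Custodian's code.

```python
# Sketch of the tag copy for one volume; names and data shapes are assumptions.
from typing import Any, Dict, List, Optional

Resource = Dict[str, Any]


def tag_map(tags: Optional[List[Dict[str, str]]]) -> Dict[str, str]:
    """Convert the AWS [{'Key': ..., 'Value': ...}] list into a dict."""
    return {t["Key"]: t["Value"] for t in tags or []}


def tags_to_copy(volume: Resource, instances_by_id: Dict[str, Resource],
                 keys: List[str]) -> Dict[str, str]:
    """Tags from the attached instance that the volume lacks or has different."""
    attachments = volume.get("Attachments") or []
    if not attachments:
        return {}  # the not-null filter would have excluded this volume
    instance = instances_by_id.get(attachments[0]["InstanceId"])
    if instance is None:
        return {}
    source = tag_map(instance.get("Tags"))
    current = tag_map(volume.get("Tags"))
    return {k: source[k] for k in keys if k in source and current.get(k) != source[k]}


instances = {
    "i-1": {"InstanceId": "i-1", "Tags": [
        {"Key": "App", "Value": "billing"}, {"Key": "Env", "Value": "dev"},
        {"Key": "Owner", "Value": "alice"}, {"Key": "CostCenter", "Value": "42"}]},
}
volume = {"VolumeId": "vol-1", "Tags": [{"Key": "Env", "Value": "dev"}],
          "Attachments": [{"InstanceId": "i-1", "Device": "/dev/xvda"}]}

print(tags_to_copy(volume, instances, ["App", "Env", "Owner", "Name"]))
# {'App': 'billing', 'Owner': 'alice'}
```

Env is skipped because the volume already matches, Name because the instance has no such tag, and CostCenter because it was not requested.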
10. Filtering resources
Generic value filter
- JMESPath expressions on the resource’s JSON representation
- Many match operators (in, not-in, gte, regex, etc.) plus special values (absent, not-null)
Arbitrary nesting of filters with ‘or’ and ‘and’ blocks.
Simple key/value pairs are equality matches and accept the same value expressions.
- type: value
  # Ignore tag keys that start with
  # 'aws:' as they don't count towards the limit.
  key: "[length(Tags[?!starts_with(Key,'aws:')])][0]"
  op: less-than
  value: 10
- or:
    - "tag:App": absent
    - "tag:Env": absent
- and:
    - Encrypted: false
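The filter semantics above can be approximated in a few dozen lines of Python. This is not Cloud Custodian's value filter: to stay dependency-free it swaps JMESPath for a tiny dotted-path lookup (Attachments.0.Device rather than Attachments[0].Device), supports only a handful of operators, and treats a missing or empty value as null for not-null. Without JMESPath, the tag-count expression is replaced by a hypothetical user_tag_count helper.

```python
# Approximate value-filter semantics; dotted paths stand in for JMESPath.
import re
from typing import Any, Dict

MISSING = object()  # sentinel for "key not present"

OPS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "in": lambda a, b: a in b,
    "not-in": lambda a, b: a not in b,
    "gte": lambda a, b: a >= b,
    "less-than": lambda a, b: a < b,
    "regex": lambda a, b: isinstance(a, str) and re.match(b, a) is not None,
}


def lookup(resource: Dict[str, Any], key: str) -> Any:
    """Resolve 'tag:Name' or a dotted path such as 'Attachments.0.Device'."""
    if key.startswith("tag:"):
        tags = {t["Key"]: t["Value"] for t in resource.get("Tags", [])}
        return tags.get(key[len("tag:"):], MISSING)
    value: Any = resource
    for part in key.split("."):
        if isinstance(value, list) and part.isdigit() and int(part) < len(value):
            value = value[int(part)]
        elif isinstance(value, dict) and part in value:
            value = value[part]
        else:
            return MISSING
    return value


def value_match(actual: Any, expected: Any, op: str = "eq") -> bool:
    if expected == "absent":
        return actual is MISSING
    if expected == "not-null":
        # Assumption of this sketch: missing, None, and empty all count as null.
        return actual is not MISSING and bool(actual)
    return actual is not MISSING and OPS[op](actual, expected)


def evaluate(spec: Dict[str, Any], resource: Dict[str, Any]) -> bool:
    """Evaluate one filter entry, recursing through 'or'/'and' blocks."""
    if "or" in spec:
        return any(evaluate(s, resource) for s in spec["or"])
    if "and" in spec:
        return all(evaluate(s, resource) for s in spec["and"])
    if spec.get("type") == "value":
        return value_match(lookup(resource, spec["key"]), spec["value"], spec.get("op", "eq"))
    # Shorthand form: a single `key: value` pair is an equality match.
    key, expected = next(iter(spec.items()))
    return value_match(lookup(resource, key), expected)


def user_tag_count(resource: Dict[str, Any]) -> int:
    """Hypothetical stand-in for the JMESPath tag-count expression."""
    return sum(1 for t in resource.get("Tags", []) if not t["Key"].startswith("aws:"))


volume = {
    "Encrypted": False,
    "Tags": [{"Key": "App", "Value": "web"},
             {"Key": "aws:cloudformation:stack-name", "Value": "web-stack"}],
}
print(evaluate({"or": [{"tag:App": "absent"}, {"tag:Env": "absent"}]}, volume))  # True
print(evaluate({"and": [{"Encrypted": False}]}, volume))  # True
print(evaluate({"type": "value", "key": "Attachments.0.Device", "value": "not-null"}, volume))  # False
```

The recursion in evaluate is what makes arbitrary or/and nesting cheap: each block is just another filter entry whose children are evaluated the same way.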
11. Multi-Step Workflows
“Poorly tagged instances should be stopped in 1 day, and then terminated in 3”
- mark-for-op (action)
- marked-for-op (filter)
Chain together multiple policies.
- name: ec2-tag-compliance-mark
  resource: ec2
  description: |
    Find all instances with non-compliant tags
    and mark them to be stopped in 1 day.
  mode:
    type: periodic
    schedule: rate(1 day)
  filters:
    - "tag:maid_status": absent
    - or:
        - "tag:App": absent
        - "tag:Env": absent
        - "tag:Owner": absent
  actions:
    - type: mark-for-op
      op: stop
      days: 1
- name: ec2-tag-compliance-stop
  resource: ec2
  description: |
    Stop poorly tagged instances and
    schedule them for termination.
  mode:
    type: periodic
    schedule: rate(1 day)
  filters:
    - type: marked-for-op
      op: stop
    - or:
        - "tag:App": absent
        - "tag:Env": absent
        - "tag:Owner": absent
  actions:
    - stop
    - type: mark-for-op
      op: terminate
      days: 4
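The two policies hand off through the maid_status tag: one marks, the next filters on the mark and acts. The Python sketch below simulates a week of daily runs over a two-instance fleet to show the resulting timeline. It simplifies heavily: instances are dicts with a flat tag map, the op@YYYY-MM-DD tag value is a hypothetical format (Custodian writes its own message, which may differ), and terminate_policy is a hypothetical third step that the slide describes but does not show.

```python
# Simplified simulation of the mark-for-op / marked-for-op hand-off.
from datetime import date, timedelta
from typing import Any, Dict, List, Tuple

TAG = "maid_status"
REQUIRED = ("App", "Env", "Owner")
Instance = Dict[str, Any]
log: List[Tuple[str, str, str]] = []


def poorly_tagged(inst: Instance) -> bool:
    return any(k not in inst["tags"] for k in REQUIRED)


def mark_for_op(inst: Instance, op: str, days: int, today: date) -> None:
    # Hypothetical tag format "op@YYYY-MM-DD"; Custodian's own format may differ.
    inst["tags"][TAG] = f"{op}@{(today + timedelta(days=days)).isoformat()}"


def marked_for_op(inst: Instance, op: str, today: date) -> bool:
    """True once the marked date for this op has arrived."""
    marked_op, sep, when = inst["tags"].get(TAG, "").partition("@")
    return bool(sep) and marked_op == op and date.fromisoformat(when) <= today


def mark_policy(fleet: List[Instance], today: date) -> None:
    # ec2-tag-compliance-mark: unmarked and poorly tagged -> stop in 1 day
    for inst in fleet:
        if TAG not in inst["tags"] and poorly_tagged(inst):
            mark_for_op(inst, "stop", 1, today)


def stop_policy(fleet: List[Instance], today: date) -> None:
    # ec2-tag-compliance-stop: stop, then schedule termination in 4 days
    for inst in fleet:
        if marked_for_op(inst, "stop", today) and poorly_tagged(inst):
            inst["state"] = "stopped"
            log.append((today.isoformat(), inst["id"], "stopped"))
            mark_for_op(inst, "terminate", 4, today)


def terminate_policy(fleet: List[Instance], today: date) -> None:
    # Hypothetical third policy, described on the slide but not shown.
    for inst in fleet:
        if (inst["state"] != "terminated" and marked_for_op(inst, "terminate", today)
                and poorly_tagged(inst)):
            inst["state"] = "terminated"
            log.append((today.isoformat(), inst["id"], "terminated"))


fleet = [
    {"id": "i-good", "state": "running", "tags": {"App": "web", "Env": "dev", "Owner": "bob"}},
    {"id": "i-bad", "state": "running", "tags": {"App": "web"}},
]
start = date(2016, 4, 1)
for day in range(7):  # one run of each policy per simulated day
    today = start + timedelta(days=day)
    mark_policy(fleet, today)
    stop_policy(fleet, today)
    terminate_policy(fleet, today)

print(log)
# [('2016-04-02', 'i-bad', 'stopped'), ('2016-04-06', 'i-bad', 'terminated')]
```

Because each step re-checks the tag filters, an instance whose owner adds the missing tags before a deadline is skipped rather than stopped or terminated.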