2. Introduction
What is Ansible?
● A configuration management system
● Agentless design: the ‘controller’ (admin’s localhost) supervises everything
● No mandatory data server to work with.
● Uses ssh as the primary transport, but there are many other transports too.
5. Name of things
● Task + task + task => tasklist
● Tasks + vars + defaults => role
● Tasklist + hosts => play
● Play + play + … = playbook
● Playbooks + inventories = ansible repo (unofficial)
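A minimal sketch of how these names map onto a file. The group names (webservers, databases) and the role name (postgresql) are assumptions made up for illustration, not part of the original deck.

```yaml
# site.yaml: a playbook is a list of plays
- name: Configure web servers        # play = hosts + tasklist
  hosts: webservers                  # hypothetical inventory group
  become: true
  tasks:                             # tasklist = task + task + ...
    - name: Install nginx            # one task calls one module
      apt:
        name: nginx
        state: present
    - name: Make sure nginx is running
      service:
        name: nginx
        state: started

- name: Configure database servers   # second play in the same playbook
  hosts: databases                   # hypothetical inventory group
  roles:
    - postgresql                     # hypothetical role = tasks + vars + defaults
```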
6. modules
● Each module configures one specific thing on the host
● Examples:
○ template
○ apt
○ systemd
○ stat
○ postgresql_user
○ object_storage
○ cron
○ crm_resource
○ …
○ ~ 2200 modules in ansible 2.4
7. variables & templates
Ansible allows variables to be used to pass arguments to modules.
- Each variable is processed with the jinja2 template engine
- Tasks can register variables, and there is a set_fact module
- Each task, play and role may have its own locally scoped variables
- Nested definition is OK
- Recursion is prohibited
- Variables are expanded at the moment of use (in modules and conditions)
- Dedicated templates for configs are processed the same way as variables
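A small sketch of these rules in one play. All variable names (app_name, app_dir, app_conf, hostname_result, short_name) are invented for illustration.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    app_name: foo
    app_dir: "/opt/{{ app_name }}"        # nested definition is fine
    app_conf: "{{ app_dir }}/foo.conf"    # expanded only when it is used
  tasks:
    - name: Variable is expanded here, at the moment of use
      debug:
        msg: "{{ app_conf }}"

    - name: Register the result of a task
      command: hostname
      register: hostname_result
      changed_when: false

    - name: Derive a new variable with set_fact
      set_fact:
        short_name: "{{ hostname_result.stdout.split('.')[0] }}"
```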
8. handlers
● Are called if the notifying task reported ‘changed’
● Are called once per play
● Can be flushed (called) earlier with meta: flush_handlers
● Have play visibility
● Roles can notify each other’s handlers:
○ It’s complicated. Try to avoid this.
● Can listen to another handler’s notification
● Are called in order of declaration, not in order of notifications
● Error handling/retry policy: at most once
○ This is bad
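A minimal sketch of notify, listen and flush_handlers together. The group name, file names and the topic ‘restart web stack’ are assumptions for illustration.

```yaml
- hosts: webservers
  become: true
  tasks:
    - name: Deploy nginx config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart web stack        # fires only if this task reports 'changed'

    - name: Run pending handlers now instead of at the end of the play
      meta: flush_handlers

  handlers:
    # Handlers run in the order they are declared here,
    # regardless of the order in which they were notified.
    - name: Restart nginx
      service:
        name: nginx
        state: restarted
      listen: restart web stack
```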
9. handlers and includes
Handler defined in | include_role: inner action | include_role: outer action | include_role: inner + outer action | import_role: inner action | import_role: outer action | import_role: inner + outer action
INNER + OUTER handler | outer only | outer only | outer only | inner only | inner only | inner only
INNER handler only | inner | NOT FOUND | NOT FOUND | inner | inner | inner
OUTER handler only | outer | outer | outer | outer | outer | outer
https://github.com/amarao/ansible_import_include_and_handlers
10. Conditionals
● Evaluated at the moment of execution
● Evaluated on every iteration for loops
● Evaluated separately for each task in a ‘block’
● Have a special hack for ‘is defined’
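A short sketch of the first three points. The variables install_foo and features, and the package/service name foo, are hypothetical.

```yaml
# The 'when' on a block is applied to each task inside it separately.
- block:
    - name: Install package
      apt:
        name: foo
        state: present
    - name: Start service
      service:
        name: foo
        state: started
  when: install_foo | default(false) | bool

# In a loop the condition is re-evaluated for every item.
- name: Report enabled features
  debug:
    msg: "{{ item }} is enabled"
  with_items: "{{ features }}"
  when: item != "experimental"
```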
11. Loops
- All of them are slow and clumsy.
- Ansible 2.5: with_items → loop.
- Complicated branching is bad.
- Complexity is bad.
loop_control:
  loop_var: user
  label: '{{user.short_name}} at {{user_department}}'
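For context, a sketch of a complete task around such a loop_control fragment, using the loop keyword from Ansible 2.5. The user_list variable and its short_name and department keys are assumed data, not something the deck defines.

```yaml
- name: Create accounts
  user:
    name: "{{ user.short_name }}"
    comment: "{{ user.department }}"
  loop: "{{ user_list }}"            # assumed: a list of dicts
  loop_control:
    loop_var: user                   # use 'user' instead of the default 'item'
    label: "{{ user.short_name }} at {{ user.department }}"   # shorter log output
```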
12. idempotency
● Each task ends in one of four states: failed, changed, ok (no change), or skipped
● Each task should report ‘changed’ only if changes were actually made.
● A second run of the same task should yield ‘no change’
Important for:
- Testing
- Stability and audit
- Handler calls
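A sketch of the difference, assuming a hypothetical config file /etc/foo.conf and setting.

```yaml
# Not idempotent as written: appends the line and reports 'changed' on every run.
- name: Add setting (shell)
  shell: echo "max_connections = 100" >> /etc/foo.conf

# Idempotent: reports 'changed' only when the line was missing,
# so a second run yields 'ok' and no handler is triggered.
- name: Add setting (lineinfile)
  lineinfile:
    path: /etc/foo.conf
    line: "max_connections = 100"
```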
14. What does ‘big’ mean for an Ansible project?
Kubespray
● 911 files
● 49132 lines
Openstack-ansible
● 1196 files
● 52504 lines
Openshift-ansible
● 1668 files
● 175745 lines
● Estimated YAML multiplier for line count: ~x3
15. Not-a-code consequences
● Global variables everywhere
● foo: ‘{{foo + 1}}’ is officially broken. Forever.
● A practical call stack depth: 3-5
● It’s hard to change values in dictionaries and lists
● Data queries are crazy and complicated (json_query filter in Jinja2)
16. Sources of pain
● Dependencies
● Slow execution over ssh
● Memory hogging on includes (partially fixed in 2.4.3 and 2.5)
● Data query
● Rudimentary modularity
● Name conflicts
● Non-typed interfaces between roles
● Horrible error reporting for jinja2 templates/filters
● Unpredictable visibility for global variables
● Variable precedence is complicated and is broken in include_role.
17. Ansible is a muscle, not a skeleton
● Everything is permitted
● Most errors are detected at runtime
○ Or they silently succeed with incorrect behavior
● No universally accepted style guide (* try ansible-lint)
● No well-known design patterns
● Best practices are at the level of elementary school
Why do we still use Ansible?
Because it’s the best we have so far.
18. Some bones to build a skeleton
1. Execution flow: tasks and roles are assigned to hosts
2. Hosts are first-class objects to work with
3. Groups and group inheritance to keep relations between hosts
4. Group variables
5. A simple iteration over lists
6. Transparent access to hosts ‘by ansible magic’
... I wish this list were longer...
20. No overengineering
It’s not Java or Python. Every act of overengineering bites you badly.
● Play is better than role
● Role is better than play, repeated twice in two different playbooks
● Tasklist in a role is better than a second role
● If you can join two roles through a play, use the play
○ If you can’t - use a wrapper role
● Play for host is better than delegate_to in task
● Delegate_to is better than poking into hostvars of other host
● Every time you iterate over hosts in a group, God kills a cat
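To illustrate ‘play for host is better than delegate_to’, here is a sketch with two plays. The group names (database, app), the role name foo and the database name are assumptions for illustration.

```yaml
# Instead of one play for 'app' that creates the database
# through delegate_to, give the database hosts their own play.
- name: Prepare database for foo
  hosts: database
  become: true
  become_user: postgres
  tasks:
    - name: Create database
      postgresql_db:
        name: foo

- name: Configure application foo
  hosts: app
  become: true
  roles:
    - foo
```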
22. Project layout
Included in site.yaml
● Users and basic software
● Software installation and configuration
● Database creation
● Monitoring
Used separately:
● Bootstrap
● Update procedure
● Recovery procedure
● Helper scripts for staging
○ Copy data from production
○ Tests for recovered system
○ Creation/teardown for staging
● Inventory update/generation
23. Scope reduction
Each piece of code should work within its own domain:
If we configure application foo we shouldn’t touch random bits outside of foo:
❌ NO
● add nginx configuration for foo
● use this magic query to find database IP
● transform list of users from global userlist to foo format
✅ YES
● Use wrapper role to configure nginx (include_role, import_role)
● Use role to search database IP
● Pass userlist explicitly from playbook or another wrapper role
24. There is no sane way to describe dependencies.
- Old style (with dependencies in meta) does not work and is being deprecated.
- New style include_role/import_role ignores meta-dependencies.
The only way to create a dependency is to do it manually.
- import_role when role_foo_called is not defined
- set_fact: role_foo_called inside a role
Or, just call it twice if it’s fast.
Explicit dependencies
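A sketch of the manual guard described above. The role name foo and the flag role_foo_called follow the slide; where exactly the two tasks live (caller versus roles/foo/tasks/main.yaml) is an assumption.

```yaml
# In the calling role or playbook:
- name: Call role foo only if it has not run on this host yet
  import_role:
    name: foo
  when: role_foo_called is not defined

# Last task of roles/foo/tasks/main.yaml:
- name: Remember that role foo has already been applied
  set_fact:
    role_foo_called: true
```

Because the condition on import_role is applied to every task of the role, a second import skips all of them once the fact is set.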
25. Name it! Name it right!
Examples:
● Everything should have a hyperonym (a common name for several things)
○ E.g. ‘configuration playbooks’ VS ‘script playbooks’
○ Configuration playbooks should be linted to perfection
○ Script playbooks may have unconditional ‘command/shell’ with ‘changed always’ status
● Different types of groups
○ F.e. ‘Execution groups’ VS ‘groups for variables’
○ Groups for variables should never have assigned tasks (f.e. hosts: database_settings)
● Name your components!
○ F.e. ‘bgp-push’ VS ‘bgp-pull’, ‘agents’, ‘central’, ‘external_access’, etc.
“Naming things” is the second hard problem in computing
27. Simple tricks
● ansible -i staging --list-hosts all
● ansible-playbook -i staging site.yaml --list-tags
○ Tags should have meaning!
● ansible-playbook -i staging site.yaml --check --diff
28. Ansible-lint !!!!!!!!!!111 one one one
● Points to subtle errors in the playbook code
● Best practices (handlers vs “when: foo|changed” filter)
● Clarity. If lint understands it, people understand it.
● Forces more semantics onto shell/command
How much time does it take?
● ~ 30 lint warnings per hour.
● I cleared my project within 4 hours. There were 3 real-life bugs and 10 minor improvements, all found by ansible-lint
29. Shell and command modules
● Main source of chaos if used carelessly
● Rules:
○ If they gather information: changed_when: False
○ If they are idempotent: find a way to report changes.
○ If they are not idempotent: use only after query:
■ when: ‘foo’ in previous_query.stdout
■ when: previous_query.rc == 2
● You can refactor if those modules are idempotent
● You cannot refactor if those modules are not idempotent
30. shell drama
And if I can’t detect changes or failure?
You are doing it wrong.
Find a way.
31. shell example
ip link set up command always returns 0, and never gives output.
❌ NO
- name: Link up
  shell: |
    ip link set up dev {{dev}}
✅ YES
- name: Check link status
  command: ip link show {{dev}}
  register: link_status
  changed_when: False
- name: Link up
  command: ip link set up dev {{dev}}
  when: "'UP' not in link_status.stdout"
32. shell example #2
foobar does not report failures at all.
We want to execute foobar add, and we can run foobar list to check the result.
❌ NO
- name: Add to foobar
  shell: |
    foobar add {{obj}}
✅ YES
- name: Check foobar status
  command: foobar list
  register: old_foobar_output
  changed_when: False
- name: Add to foobar
  shell: |
    foobar add {{obj}} && foobar list
  register: new_foobar
  when: obj not in old_foobar_output.stdout
  failed_when: obj not in new_foobar.stdout
33. Apt: update_cache
Theoretical question: does refreshing the cache count as a change or not?
For practical purposes the answer is: no change
Option 1: integrate into install
- name: Install foo
  become: yes
  apt:
    name: foo
    state: '{{foo_install_state}}'
    update_cache: '{{apt_update_cache}}'
    cache_valid_time: '{{apt_cache_valid_time}}'
Option 2: use without changes
- name: Update apt cache
  become: yes
  apt:
    update_cache: yes
    cache_valid_time: '{{cache_time}}'
  changed_when: False
35. Staging
MUST HAVE STAGING AT ANY COST
Staging:
● Finds your bugs before production
● Helps to refactor
● Forces you to think of modularity
36. Development environment
Primary staging:
● Virtual machines or real servers. Imitate production as closely as possible
Development environment(s):
● Almost like staging, but faster and with omissions
● LXC (or Docker) on localhost speeds up runs by ~30-50%
● Deploy containers with Ansible, drop them with Ansible
● Automate rebuild
37. CI/CD
● Delegate all Ansible tasks to CI/CD server (Jenkins?)
● One job for production, one for staging
● Software updates and other workflow tasks - separate jobs
● Production should be updated only through CI/CD server
○ Keep logs
○ Keep last deployed commit* in those logs
● *Do you use git for your playbooks? You should.
● Run production ‘full ansible run’ often.
○ Make it safe. Second full run = zero changes. Mandatory to have.
● Run staging ‘full ansible run’ before production for all changes.
○ It guards production and saves your face.
38. New and reinstalled servers
Bootstrap.yaml:
● Forget old ssh keys
● Remember new ones
● Install Python and ssh keys, create users
● Install all upgrades, restart server
39. Per role tests
+ Ansible way to test roles
+ Easier to debug
- Time consuming
- No inter-role integration
- Often meaningless without a context
41. Places to hide a variable
● Inventory (host, group_name:vars)
● inventory/host_vars
● inventory/group_vars
● host_vars
● group_vars [all.yaml, group_name.yaml]
● roles/*/defaults
● roles/*/vars
● ‘vars:’ in any task or role
● register in any task
● include_vars
● defaults/vars of imported role
Ansible variables without supervision
42. Rules to keep sanity
● host_vars are banned anywhere except an inventory
● Roles/vars should be avoided
● Roles should avoid exposing variables to other roles in the same play(book)
○ Reduce global state, OK?
○ If they do - this is called an ‘interface’. Document it.
■ Example: search-for-database-ip can set a variable db_ip.
● Environment-specific variables are kept in the inventory
● Project-specific variables are kept in group_vars
● Roles should use defaults for rarely changed variables
● Use local ‘vars:’ statement for task-local calculations
43. Variables and environments
Environments:
● production/
● staging/
● lab1/
Variables:
● user_list -> group_vars/all.yaml
● domain_prefix -> inventory/group_vars/all.yaml
● foo_listen_port -> group_vars/foo.yaml
● db_password -> inventory/group_vars/dbaccess.yaml
● retry_timeout -> roles/foo/defaults/main.yaml
Rule of thumb
You must be able to add another environment by creating a new inventory (file/directory) with no changes outside that inventory.
44. How long to think before adding a variable
Where the variable lives | Time to think | Docs
roles/foo/tasks/*.yaml (vars section for task) | 5 seconds | no docs
roles/foo/defaults/main.yaml | 30 seconds | role docs
roles/foo/tasks/*.yaml (register) | 1 minute | no docs
roles/foo/tasks/*.yaml (set_fact, role-internal) | 1 minute | no docs
group_vars | 10 minutes | role or project docs
Inventory | 30 minutes | role or project docs
roles/foo/tasks/*.yaml (set_fact, external use outside of the role) | 60+ minutes | role and project docs. Mandatory!
For use in a command line (ansible-playbook -e) | 60+ minutes | role and project docs. Mandatory!
45. Assertions and validations
fail module with ‘when’ (from ceph-ansible):
- name: validating variables
  fail:
    msg: "please choose scenario"
  when:
    - osd_group_name is defined
    - osd_group_name in group_names
    - not containerized_deployment
    - osd_scenario == 'dummy'
assert module:
- name: Check ansible version
  run_once: True
  assert:
    that: "ansible_version.full|version_compare('2.4','>=')"
    msg: "You must update Ansible to at least 2.4"
  delegate_to: localhost
  tags:
    - always
53. Concise tags
Including tags:
● One tag - one scenario
● --tags your_tag should either:
○ Finish successfully for a new installation
○ Finish successfully for an existing installation
● If the same tag is spread over several plays in a playbook, it may be better to split them into a separate playbook and use import_playbook.
Excluding tags:
● Should be used with --skip-tags
● For long or complicated operations only.
● Each ‘always’ tag should have an additional tag for skipping:
- debug: var=foo
  tags:
    - always
    - debug_foo
54. tag examples
- apt (all operations with apt, in all roles)
- registrations (all operations with registration in a project API, in all roles)
- foo_upgrade (all apt operations to install components of foo project)
- git (all operations related to git pull/clone)
- ip (all operations related to adding/removing IP addresses on server)
- discovery (all ‘search-for-*-ip’ roles)
- services (tasks to configure shinken services, ~80 of them, shinken only)
- drop (specific for copy-database.yaml, tasks to drop database)
56. To limit or not to limit?
Line in a template:
allow_ip = {% for h in groups.all %} {{(hostvars[h]).ansible_default_ipv4.address}} {% endfor %}
ansible-playbook -i inventory test.yaml ✅
ansible-playbook -i inventory test.yaml --limit host1 ❌
fatal: [host2]: FAILED! => {"changed": false, "msg": "dict has no element ansible_default_ipv4"}
57. Solutions
We need information about all hosts, but we have used --limit
1. Forbid the use of limits in the project 😟
2. Write a partial content 😓
3. Lineinfile on per-host basis 😦
4. Gather facts for all hosts forcefully 😥
5. Use fact cache 😕
6. Use external database 😖
7. Skip task if not a full run 🤔
58. Partial content
{% for h in groups.all %}
{% if (hostvars[h]).ansible_default_ipv4 is defined %}
{{(hostvars[h]).ansible_default_ipv4.address}}
{% endif %}
{% endfor %}
Good: none
Bad:
- incomplete config
- ‘changed’ every time a different --limit is used ❌
59. Lineinfile
- name: Add host to config
  lineinfile:
    path: /etc/foo.conf
    line: "host {{(hostvars[item]).ansible_default_ipv4.address}}"
  when: (hostvars[item]).ansible_default_ipv4 is defined
  with_items: "{{groups.all}}"
Good: survives --limit with no changes or broken config
Bad: old values are not removed
Note: Can be used only if the config uses one IP per line
60. Forceful fact gathering
- setup: gather_subset=network
  delegate_to: "{{item}}"
  delegate_facts: yes
  with_items: "{{groups.all}}"
  when: (hostvars[item]).ansible_default_ipv4 is not defined
  tags:
    - always
    - gather_facts
Good:
- no random ‘changed’
- Always full config
- remove old values
- fast (see ‘when’ part)
Bad:
- fails if any host is down or is not provisioned yet
61. Fact cache
● Do as in forceful fact gathering
● Set fact caching in ansible.cfg
● Hope it will be there
Good:
- Works most of the time
Bad:
- ‘always’ minus ‘most of the time’ = occasional bugs
62. External database
● Register each host in etcd/consul
● Query data on each run
Good:
Works with --limit
Bad:
External service dependency (down/provision)
Removal of the old entities is a problem
63. Skip if not full run
- name: Configure foo
  template: src=foo.conf.j2 dest=/etc/foo.conf
  when: full_run
  vars:
    full_run: '{{play_hosts == groups.all}}'
Good:
- Works perfectly with --limit
- Won’t fail if some host is down and --limit was used
- Fast
- Updates and removes old data as needed on each full run
Bad:
- Does not update the config when --limit is used
✅
65. Template & task relationship
● Keep templates as simple as possible
● Use ‘vars:’ section for explicit variable declaration
● Never use global variables in a template. Exceptions:
○ Iterations over all hosts
○ Ansible built-in variables
○ A special global variable documented in a project and in a role
○ Very complicated queries. Use comments in the task to list used variables inside the template.
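A sketch of explicit variable declaration for a template. The template name and the foo_conf_* names are hypothetical; db_ip stands for a documented interface variable set by another role.

```yaml
- name: Render foo config
  template:
    src: foo.conf.j2
    dest: /etc/foo.conf
  vars:
    # Every name that foo.conf.j2 reads is declared here,
    # so the template itself never touches global variables.
    foo_conf_listen_ip: "{{ ansible_default_ipv4.address }}"
    foo_conf_db_ip: "{{ db_ip }}"
```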
66. Simplify
If a template is small, use ‘copy’ with the ‘content’ argument to inline it
- copy:
    dest: /etc/foobar.conf
    content: |
      source_ip = {{ansible_default_ipv4.address}}
68. Debugging templates: Jinja2
Explicit templating in a separate playbook (e.g. temp.yaml)
- template:
    src: roles/somerole/templates/foo.conf.j2
    dest: /tmp/foo.conf
  delegate_to: localhost
  connection: local
  vars:
    some_var: ...
    another_var: ...
69. Templates everywhere
You don’t need to use ‘template’ to use jinja2. Every variable is a {{template}}.
- copy
- lineinfile
- blockinfile
- all file names for all copy/stat/file modules
- arguments to shell and command modules
- all other modules (apt, postgresql_user, etc)
72. Roles: structure
1. Use defaults for rarely changed values. Do not use hard-coded constants.
2. Split role in parts
3. Allow role parts to be called independently
4. Allow parts of the role to be reused
5. Use call caching
Nginx: install + configure site
roles/nginx/tasks/main.yaml:
- import_tasks: install.yaml
- import_tasks: configure_site.yaml
Calling only one part of the role:
- import_role:
    name: nginx
    tasks_from: configure_site.yaml
  vars:
    nginx_site: ...
Call caching inside the install part:
- name: install nginx
  apt: name=nginx state=present
  when: nginx_installed is not defined
  register: nginx_installed
73. Files in roles: vendor in role
Good:
- Easy to do: copy: src=myfile dest=/var/lib/foo/myfile
- Single authority
- Versions
Bad:
- Golden artifacts are kept in the Ansible repo
74. Files in roles: external source
Good:
- A tidy git.
Bad:
- Need external storage.
- Version control.
Examples
private apt repo || private git repo || swift container (bad!)
75. Wrapper role
We have application server foo which should reside behind nginx.
● Foo wants a database IP, plus a port and address to listen on
● Nginx needs a port to proxy_pass to, a domain, and SSL settings
Role foo configures foo only.
Role nginx configures any nginx site and needs a bunch of additional variables.
Wrapper role glues them together, but does not change anything in foo or nginx.
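A sketch of what such a wrapper role could look like. The wrapper name foo_behind_nginx, the variable names (foo_db_ip, foo_listen_port, foo_port, nginx_site and its keys) and the domain are all assumptions for illustration; only the roles foo, nginx and search-for-database-ip come from the slides.

```yaml
# roles/foo_behind_nginx/tasks/main.yaml (hypothetical wrapper role)
- name: Find the database IP
  import_role:
    name: search-for-database-ip     # documented interface: sets db_ip

- name: Configure application foo
  import_role:
    name: foo
  vars:
    foo_db_ip: "{{ db_ip }}"
    foo_listen_port: "{{ foo_port }}"   # foo_port: assumed wrapper default

- name: Configure the nginx site in front of foo
  import_role:
    name: nginx
    tasks_from: configure_site.yaml
  vars:
    nginx_site:
      domain: foo.example.com
      proxy_pass_port: "{{ foo_port }}"
```

Neither foo nor nginx knows about the other; the only place that connects them is the wrapper.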
77. Include_role VS import_role
import_role:
- Behaves as if the role’s tasks were written in place of the import.
- Can override handlers
- Defaults are respected
(the imported role uses its own defaults, but does not change the parent’s defaults)
- Does not support loops
- Supports conditions:
- A condition is applied to each task in the imported role.
78. Include_role VS import_role
include_role:
- Supports loops
- Absolute mess
- Broken in each new ansible release in a new way (hello, 2.5):
- Delegation
- Handlers
- Defaults vs set_fact
- Parent’s variable access
- include_tasks is much more reasonable, but requires more files and lines.
79. A proper looping with an include in a role
- name: Loop over something
  include_tasks: per_something.yaml
  with_items: '{{something}}'
- name: in per_something.yaml
  import_role: name=foo
  vars:
    var1: '{{item}}'
- name: A task in role 'foo'
  foo: arg={{var1}}
  delegate_to:
Works in ansible 2.5!
84. Plugin types
module ≠ plugin
- lookup_plugins/
- Load data from external sources
- Perform calculations and queries
- Iterate
- action_plugins/
- Do stuff on hosts
- vars_plugins
- inventory_plugins
All plugins are written in Python, and can be stored in ‘*_plugins/’ directory near a
playbook, or within a role.
85. Lookup plugins
1. Try to do it with ansible.
2. Try to do it with in-line jinja2 template
3. Try to do it with in-line json_query
4. Try to do it with external jinja2_template
5. If not, write a plugin
Rule of thumb: if the jinja2 template is more than ⅓ the size of the plugin (and its tests), write a plugin. If less, use jinja2.
Python in ansible complicates reading! A lot.
A plugin without tests is worse than jinja2 of any complexity.
86. Lookup plugins: an example
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

from ansible.plugins.lookup import LookupBase
import copy


class LookupModule(LookupBase):
    def run(self, terms, **kwargs):
        # Input may arrive as positional terms or as keyword arguments
        data = terms or kwargs
        assigned_something = data['assigned_something']
        assigned_others = data['assigned_others']
        somethings = data['somethings']
        foo_source_ips = []
        # Collect the source IP of every entry that belongs to our 'others'
        for something in somethings:
            for entry in something.get('datas', []):
                if entry['other'] in assigned_others:
                    foo_source_ips.append(entry['foo_source_ip'])
        return foo_source_ips
87. Lookup plugins: an example
- name: Register IP
  uri:
    method: PUT
    url: '{{url}}'
    body_format: json
    body: '{"something": "{{item["something"]}}", "other": "{{item["other"]["data"]}}"}'
    status_code:
      - 200
      - 201
      - 304
  register: reg_status
  changed_when: reg_status.status in [200, 201]
  with_my_custom_filter: '{{something}}'
88. Lookup plugins: json_query equivalent
- name: looping over
  include_tasks: process_other.yaml
  with_items: '{{selected_datas}}'
  loop_control:
    loop_var: data
    label: '{{other}} @ {{data.foo_source_ip|default("no ip")}}'
  when: data.foo_source_ip is defined and data.other in assigned_others
  vars:
    somethings: '{{global_config["somethings"]}}'
    query: "[?name=='{{assigned_something}}'].datas"
    selected_datas: '{{global_config.somethings|json_query(query)}}'
    foo_source_ip: '{{data.foo_source_ip}}'
    something: '{{assigned_something}}'
    other: '{{data.other}}'
89. Other plugins
I have no experience with them, sorry.
Key ideas for action plugins, when to write them:
- Too many overly complicated command/shell tasks in a playbook/role
- Reusability is needed
- Better test coverage
- Complicated data types in use
92. Refactoring when adding features
● Use small steps
● Write a plan for refactoring before changing anything
● Paper drawing is advised.
● Use the ‘not changed’ status to confirm that refactoring does not change anything
● Use ansible-playbook --check --diff
● Refactor in two steps:
○ Change internals without changes in the result
○ Then make small, simple changes that change the result
● Do not forget to add cleanup code if needed
○ Drop it later
● Each step should have separate commit with a multi-line description
○ You can do this, I believe in you!
93. Refactoring when cleaning up mess
- Find scenarios for execution
- Eliminate false ‘changed’
- Reduce spread between files (no hostvars!)
- Split plays into playbooks
- Split tasklist into roles
- Replace hardcoded values with variables
- In templates too!
- Do you remember about staging?
- Reduce complexity of queries and iterations
- Replace ‘shell/command’ with modules
- Ansible-lint
94. Refactoring example: Scraps from my table
● Write down all ideas, even discarded ones.
● Write down all variables and file names you’ve introduced or changed
● Draw arrows between objects
95. THE END
Final advice:
● Every role and every playbook cuts corners.
● Cut as few corners as possible.
● Each ‘cut corner’ has consequences.
● The amount of time dedicated to a role or to a playbook is a function of its importance.
Be safe, be reasonable, and let ansible-lint be with you.
Editor's notes
- about ansible, pre 2.0, bad 2.3, 2.4, small revolution at 2.5
- about my experience
- expectation on audience. Someone knew some things better than me
- some I stole from others, some are my own inventions
Not in this presentation: vault, tower, network
Why it’s simple
Why it’s complicated
A play or a playbook can not be in a role!
Few examples here, they cover almost everything.
Origin of Jinja
Explain ‘moment of usage’
Will explain ‘at least once’ VS ‘at most once’
- delegate_to/include/loop will be explained later
2.5 - just cosmetics
It’s bad. Too many places, too many ways of thinking
Why so much on tags? Because tags are useful, but ansible gives no hint on how to use them and when to stop. I wanted to give counterexamples, but they are hard to show because it’s hard to show inconsistency on a short slide
It should be in the refactoring part too. Pay attention to this.
It doesn’t matter what this photo is about. The key is the spirit - what to do.
There are many objects and their relationships are complicated. Draw it.