When designing a new infrastructure, building configuration management into it is the natural choice nowadays.
However, many systems in the wild are still managed manually, if they are managed at all: mission-critical servers that can’t be shut down, systems running proprietary software that depends on out-of-date databases, … They may even have been configured using forgotten conventions (which can differ from one generation of systems to the next).
Using configuration automation tools on these systems can seem like an impossible task, but it is not, and the benefits are well worth the effort.
This talk presents feedback from a couple of projects I’ve worked on. It describes how to manage these “existing, manual and critical” systems automatically, focusing on the reverse engineering of existing systems (compiling all documents, inventorying systems, devising the rules, auditing deviations) and on the steps towards managing them automatically.
2. Normation – CC-BY-SA
normation.com
Issue
Most systems are still not automatically managed
● Configuration Management has recently become mainstream
● It's not yet a habit
● A lot of running systems predate configuration management
● Lack of upgrade paths (dependency on dead applications)
● Systems cannot be modified (lost knowledge)
● Systems with stale errors no-one can fix
Why couldn't we benefit from cfgmgmt on these systems?
Why Rudder?
Rudder is very well suited for this use-case
● Supports many different OSes and heterogeneous systems
● Audit mode
● Web Interface
● API to add and extract data
Identifying systems
First, identify the systems and their role(s)
● It can be harder than expected
● Some systems may be known only to part of the team
● Roles may be unknown to most team members
● Select those in scope for cfgmgmt
● Having an up-to-date CMDB, Wiki, spreadsheet… helps a lot
Make a list of these systems
● In a spreadsheet
Inventory systems
Make an inventory of all these systems
● During maintenance windows, install the Rudder agent
● The inventory will be sent to the Rudder server
● Extract them with the API into the spreadsheet
● Set these nodes in Audit mode in Rudder
● Validate the roles
● Based on installed software and running processes
● Based on naming convention, networks
● Based on previous knowledge (expectation may not match reality)
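The "extract them with the API into the spreadsheet" step can be sketched as follows. This is a minimal sketch, not a reference client: it assumes the node list has already been fetched from the Rudder API and parsed into a Python dict, and that each node carries `id`, `hostname` and an `os` object with `name` and `version`. Those field names, the sample data and the helper names are assumptions made for illustration; check them against the API documentation for your Rudder version.

```python
import csv
import io

# Hypothetical sample of a parsed node-list response; the field names
# are assumptions and may differ between Rudder versions.
SAMPLE_RESPONSE = {
    "result": "success",
    "data": {
        "nodes": [
            {"id": "n-001", "hostname": "db01.example.org",
             "os": {"name": "Debian", "version": "7.11"}},
            {"id": "n-002", "hostname": "app01.example.org",
             "os": {"name": "CentOS", "version": "5.11"}},
        ]
    },
}

COLUMNS = ["id", "hostname", "os_name", "os_version", "role"]


def nodes_to_rows(response):
    """Flatten each node into one spreadsheet row (a plain dict)."""
    rows = []
    for node in response.get("data", {}).get("nodes", []):
        os_info = node.get("os") or {}
        rows.append({
            "id": node.get("id", ""),
            "hostname": node.get("hostname", ""),
            "os_name": os_info.get("name", ""),
            "os_version": os_info.get("version", ""),
            "role": "",  # left empty: filled in by hand during role validation
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that any spreadsheet tool can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(nodes_to_rows(SAMPLE_RESPONSE)))
```

The empty `role` column is deliberate: the inventory tells you what is installed, while the role still has to be validated by a person.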
Group the systems
Multidimensional approach for grouping systems
● Per roles
● Nodes with the same role ought to have 'identical' config
● Per security level
● Hardening, access rules, authorizations
● Per generation of system installation
● Installation procedures, best practices and know-how evolved over time
● Per OS
● Per system type (physical server, embedded device, ...)
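The multidimensional grouping above can be illustrated with a short sketch. It assumes the inventory rows have been read back from the spreadsheet as dicts; the column names (`role`, `security`, `generation`, `os`), the sample nodes and the helper names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory rows, e.g. read back from the spreadsheet.
NODES = [
    {"hostname": "db01", "role": "database", "security": "high",
     "generation": "2012", "os": "Debian 7"},
    {"hostname": "db02", "role": "database", "security": "high",
     "generation": "2015", "os": "Debian 8"},
    {"hostname": "web01", "role": "web", "security": "dmz",
     "generation": "2015", "os": "Debian 8"},
]

DIMENSIONS = ("role", "security", "generation", "os")


def group_by(nodes, dimension):
    """Group hostnames by the value they have for one dimension."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node.get(dimension, "unknown")].append(node["hostname"])
    return {key: sorted(names) for key, names in groups.items()}


def group_all(nodes, dimensions=DIMENSIONS):
    """Build one grouping per dimension; a node appears in every grouping."""
    return {dimension: group_by(nodes, dimension) for dimension in dimensions}


def select(nodes, **criteria):
    """Hostnames matching every criterion, i.e. an intersection of groups."""
    return sorted(
        node["hostname"]
        for node in nodes
        if all(node.get(key) == value for key, value in criteria.items())
    )


if __name__ == "__main__":
    print(group_all(NODES))
    print(select(NODES, role="database", generation="2015"))
```

Keeping each dimension as an independent grouping, rather than one deep hierarchy, lets a rule target "all databases" or "all 2015-generation systems" without duplicating node lists.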
Extract common rules
● Based on documented procedures, available know-how, expectations
● List them in the spreadsheet, with
● Detailed Description
● Groups they should apply to
● Status in Rudder: implemented and compliant
Audit the rules
Configure the Rules and Directives in Rudder
● Use the same names in the spreadsheet and in Rudder
● Rules and Directives in Audit mode
● Get compliance result
● Extract data using the API
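Once compliance data has been extracted, crossing it with the spreadsheet is mostly a matter of matching names, which is why using the same names on both sides matters. The sketch below assumes the compliance results have already been flattened into one record per rule and node; that record shape, the rule names and the helper names are hypothetical, not the format the API returns.

```python
# Hypothetical rule names as listed in the spreadsheet.
SPREADSHEET_RULES = ["ssh-hardening", "ntp-client", "sysctl-baseline"]

# Hypothetical flattened compliance records (one per rule and node),
# built from whatever the compliance API returns in your Rudder version.
COMPLIANCE = [
    {"rule": "ssh-hardening", "node": "db01", "compliance": 100.0},
    {"rule": "ssh-hardening", "node": "app01", "compliance": 62.5},
    {"rule": "ntp-client", "node": "db01", "compliance": 0.0},
    {"rule": "ntp-client", "node": "app01", "compliance": 100.0},
]


def non_compliant(records, threshold=100.0):
    """Return (rule, node, compliance) tuples below the threshold, worst first."""
    findings = [
        (record["rule"], record["node"], record["compliance"])
        for record in records
        if record["compliance"] < threshold
    ]
    return sorted(findings, key=lambda item: item[2])


def rules_without_results(spreadsheet_rules, records):
    """Rules listed in the spreadsheet that have no compliance data yet."""
    seen = {record["rule"] for record in records}
    return [name for name in spreadsheet_rules if name not in seen]


if __name__ == "__main__":
    for rule, node, value in non_compliant(COMPLIANCE):
        print(f"{rule} on {node}: {value:.1f}%")
    print("Not implemented yet:", rules_without_results(SPREADSHEET_RULES, COMPLIANCE))
```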
Non-compliance
For every non-compliance listed
● Is it expected?
● Should it be remediated?
● Yes, and it's straightforward: switch from Audit mode to Enforce
● May need to split into two Rules, one in Audit mode and one in Enforce, and move nodes from one Rule to the other during each maintenance window
● Yes, but it needs to be done manually: correct it by hand on the node during a maintenance window
● Yes, but risky: assess the expected risks and benefits
● Some exceptions may have to be implemented
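The questions above amount to a small decision table that can be recorded next to each finding in the spreadsheet. The sketch below is one possible reading of it, not a Rudder feature: the `Fix` categories, the `triage` function and the wording of the returned actions are all hypothetical.

```python
from enum import Enum


class Fix(Enum):
    """How hard the remediation is expected to be (hypothetical categories)."""
    STRAIGHTFORWARD = "straightforward"
    MANUAL = "manual"
    RISKY = "risky"


def triage(should_remediate, fix=None):
    """Map the answers for one non-compliance to the next action."""
    if not should_remediate:
        return "document as an accepted exception"
    if fix is Fix.STRAIGHTFORWARD:
        return "switch from Audit to Enforce at the next maintenance window"
    if fix is Fix.MANUAL:
        return "correct by hand on the node during a maintenance window"
    if fix is Fix.RISKY:
        return "assess risks and benefits with stakeholders before deciding"
    raise ValueError("a remediation needs a Fix category")


if __name__ == "__main__":
    print(triage(True, Fix.STRAIGHTFORWARD))
    print(triage(False))
```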
Validation
Validate your rules
● Spawn new systems (at least one per group)
● Check they become fully functional
● Detect rogue “live” parameters (like sysctl modified by hand)
● Ensure repeatability
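One way to spot rogue "live" parameters is to compare the kernel parameters of a legacy node with those of a freshly spawned node from the same group. The sketch below assumes both listings were captured as text in the `key = value` form printed by `sysctl -a`; the sample values and the helper names are hypothetical.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a 'key = value' line
        # Collapse tabs and repeated spaces so multi-value settings compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def diff_live(reference_text, legacy_text):
    """Return {key: (reference_value, legacy_value)} for every difference.

    A value of None means the key is absent on that side.
    """
    reference = parse_sysctl(reference_text)
    legacy = parse_sysctl(legacy_text)
    drift = {}
    for key in sorted(set(reference) | set(legacy)):
        reference_value, legacy_value = reference.get(key), legacy.get(key)
        if reference_value != legacy_value:
            drift[key] = (reference_value, legacy_value)
    return drift


# Hypothetical captures from a freshly spawned node and a legacy node.
FRESH = (
    "kernel.shmmax = 33554432\n"
    "net.ipv4.ip_forward = 0\n"
    "net.ipv4.tcp_rmem = 4096\t87380\t6291456\n"
)
LEGACY = (
    "kernel.shmmax = 68719476736\n"
    "net.ipv4.ip_forward = 0\n"
    "net.ipv4.tcp_rmem = 4096 87380 6291456\n"
    "vm.swappiness = 10\n"
)

if __name__ == "__main__":
    for key, (fresh_value, legacy_value) in diff_live(FRESH, LEGACY).items():
        print(f"{key}: fresh={fresh_value} legacy={legacy_value}")
```

Every difference is only a candidate: some values legitimately depend on hardware or kernel version, so each one still needs a human decision before it becomes a rule.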
Time estimate
Rough time estimates
● Identify systems: several hours per team member
● You may need to interview all team members.
● Hidden benefit: explain the goal to all of them, and boost acceptance of configuration management
● Agents install: 10 minutes to 1 hour per batch
● Deploy repository for each site, remote install, get inventories
● Role validation: minutes to days per role
● Review procedures, check what is on systems
● Logical system grouping:
● Depends on number of roles, exceptions, generations.
● Create spreadsheet: 4h to several days
● Depends on your skill, and amount of data to store there
● Rule creation:
● Couple of minutes to hours depending on complexity
● Measure compliance: 5 minutes to hours per rule
● Check what is not compliant, and document it
● Remediation plan:
● Ranges from very fast to “rewrite a procedure from scratch”
● Expect surprises
● Discover forgotten systems
● Discover major compliance issues
There will be delays
● Deal with maintenance windows
● Deal with freeze periods (August in France, December)
● Decisions on non-compliance remediation are not always easy
● Need to involve stakeholders
What are the benefits?
Standard configuration management benefits
● Awareness of your IT
● Improved reliability
● Improved productivity
More specific to this case
● Fewer outages due to stale errors
● Fewer outages thanks to uniformity
● Improved RTO
● Reduced vulnerability surface
● A base to evolve your IT