Technical overview of how SUSE OpenStack Cloud uses Chef to implement highly available OpenStack infrastructure services.
Target audience: curious developers in the upstream openstack-chef community
These slides were extracted from internal HA training for SUSE OpenStack Cloud developers, and slightly modified for the benefit of the openstack-chef community.
Chef cookbooks for OpenStack HA
Slide 1:
Adam Spiers
Senior Software Engineer
aspiers@suse.com
SUSE® OpenStack Cloud
Chef cookbooks for HA
Technical overview
for curious upstream #openstack-chef developers
Slide 2: Agenda
• barclamp-pacemaker
• Synchronization
• Maintenance mode
• HA-enabled barclamps
Tip: some handy
hyperlinks in this deck!
Slide 3: barclamp-pacemaker
• SUSE OpenStack Cloud uses the Crowbar deployment
framework, which is extensible via plugins called
“barclamps”
• The core of the HA functionality is provided via the
Pacemaker barclamp, which:
‒ exposes cluster membership/configuration options via Crowbar UI
‒ sets up the bare cluster and related components
‒ provides Chef cookbooks so other barclamps (Keystone,
Glance etc.) can make their own services HA
• This barclamp is mature, heavily tested, and deployed in
many production OpenStack clouds around the world.
Slide 5: corosync cookbook
• Completely independent of Crowbar
‒ TODO: desperately needs to be upstreamed
• Under chef/cookbooks/corosync/
• Configures /etc/corosync/
‒ including authkey generation / propagation
‒ Founder node generates it
‒ Other nodes get a copy
• Contains fail-safe cluster startup logic (e.g. to prevent
STONITH loops)
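The founder-generates, others-copy pattern above can be sketched in plain Ruby. This is a hypothetical sketch (function name, `:founder` attribute, and the shared hash are all illustrative); the real cookbook propagates the authkey via Chef node attributes, not a local data structure:

```ruby
require 'securerandom'
require 'base64'

# Sketch of the authkey pattern: the cluster founder generates the key
# exactly once, and every other node receives a copy of the same key.
def corosync_authkey_for(node, cluster_state)
  if node[:founder]
    # Founder generates a random key on first call, then reuses it.
    cluster_state[:authkey] ||= Base64.strict_encode64(SecureRandom.random_bytes(128))
  else
    # Non-founder nodes must wait for the founder's key and copy it verbatim.
    cluster_state[:authkey] or raise "founder has not generated authkey yet"
  end
end

cluster = {}
founder = { founder: true }
member  = { founder: false }

key = corosync_authkey_for(founder, cluster)
# The member gets an identical copy rather than generating its own:
corosync_authkey_for(member, cluster) == key  # => true
```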
Slide 6: pacemaker cookbook
• The heart of the barclamp!
• Under chef/cookbooks/pacemaker/
• Completely independent of Crowbar
‒ TODO: upstreaming desperately needs to be finished!
‒ already used git subtree to export subdirectory to
https://github.com/stackforge/cookbook-pacemaker
‒ need to document properly
‒ need to set up Travis CI
‒ automate propagation of changes between repos via ci.opensuse.org Jenkins
instance?
• Depends on corosync cookbook
• Important code, so let's look inside ...
Slide 7: pacemaker cookbook internals
Two parallel sets of code:
1. Pacemaker::CIBObject class hierarchy
● Takes care of communicating with Pacemaker via crm(8)
2. LWRPs for cluster resources
● Makes it really easy to write recipes which create / manage
cluster resources
● Back-end provider uses Pacemaker::CIBObject class
hierarchy
Both sets of code have comprehensive unit test suites!
Slide 8: Pacemaker::CIBObject hierarchy
• Class hierarchy under libraries/pacemaker*
• Independent of Chef
‒ TODO: should be spun out into a separate gem!
• Pacemaker::CIBObject
‒ Pacemaker::Resource
‒ Pacemaker::Resource::Primitive
‒ Pacemaker::Resource::Clone etc.
‒ Pacemaker::Constraint
‒ Pacemaker::Constraint::Location
‒ Pacemaker::Constraint::Order etc.
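A minimal plain-Ruby sketch of the shape of this hierarchy. The class names come from the slide, but the `#definition` method and its output format are assumptions for illustration, not the cookbook's real API:

```ruby
module Pacemaker
  # Base class: every CIB object has a name.
  class CIBObject
    attr_reader :name
    def initialize(name)
      @name = name
    end
  end

  class Resource < CIBObject; end

  # A primitive wraps a single resource agent (LSB, OCF, ...).
  class Resource::Primitive < Resource
    attr_accessor :agent
    def definition
      "primitive #{name} #{agent}"
    end
  end

  # A clone runs another resource on every cluster node.
  class Resource::Clone < Resource
    attr_accessor :rsc
    def definition
      "clone #{name} #{rsc}"
    end
  end
end

prim = Pacemaker::Resource::Primitive.new("keystone")
prim.agent = "lsb:openstack-keystone"
prim.definition  # => "primitive keystone lsb:openstack-keystone"
```

Because the classes are independent of Chef, each subclass can be unit-tested in isolation, which is why the slide suggests spinning them out into a gem.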
Slide 9: LWRPs for cluster resources
• Under resources/ and providers/
• pacemaker_primitive, pacemaker_clone etc.
• Has to re-use code via mixins, because LWRPs don't
support inheritance :-/
• With hindsight, should have used
https://github.com/poise/poise, or at least written these as
HWRPs :-/
Slide 10: Example usage of LWRPs
service_name = "keystone"

pacemaker_primitive service_name do
  agent node[:keystone][:ha][:agent] # "lsb:openstack-keystone"

  # If we used the OCF RA instead of the LSB init script:
  # params ({
  #   "os_auth_url"    => node[:keystone][:api][:admin_auth_URL],
  #   "os_tenant_name" => monitor_creds[:tenant],
  #   "os_username"    => monitor_creds[:username],
  #   "os_password"    => monitor_creds[:password],
  #   "user"           => node[:keystone][:user]
  # })

  op node[:keystone][:ha][:op] # { :monitor => { :interval => "10s" } }
  action :create
end

pacemaker_clone "cl-#{service_name}" do
  rsc service_name
  action [:create, :start]
end
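For reference, the two LWRP calls above correspond roughly to this crm(8) configuration (a sketch from memory, not generated output):

```
primitive keystone lsb:openstack-keystone \
  op monitor interval=10s
clone cl-keystone keystone
```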
Slide 12: crowbar-pacemaker cookbook
• Crowbar-specific code
• Under chef/cookbooks/crowbar-pacemaker/
• LWRPs (under resources/ and providers/)
‒ service (covered next)
‒ sync_mark (more detail later)
‒ drbd and drbd_create_internal
• Recipes:
‒ maintenance-mode (more detail later)
‒ apache, drbd, haproxy, stonith
• Libraries
‒ Various helpers (more detail later)
Slide 13: Chef::Provider::CrowbarPacemakerService
• Alternative provider for HA-enabled service
resources
• Ensures that all service management operations
(start, stop, restart, reload) are handled safely
with respect to Pacemaker
• Was really hard to get this right!!
‒ 119 lines of comments for 92 lines of code
• Despite complexity, goal was ease of use
Slide 14: Using C::P::CrowbarPacemakerService
service "keystone" do
  service_name node[:keystone][:service_name]
  supports :status => true, :start => true, :restart => true
  action [:enable, :start]
  ...
  if ha_enabled
    provider Chef::Provider::CrowbarPacemakerService
  end
end
Slide 15: C::P::CrowbarPacemakerService implementation
• start / stop
‒ always ignored (handled by the pacemaker_* LWRPs instead)
• enable / disable
‒ both always translate to disable (Pacemaker, not the init system, decides where services start)
• reload
‒ proxied to the original service resource iff the service is running
• restart
‒ puts the node in maintenance mode, then restarts
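Condensed into plain Ruby, the mapping above looks roughly like this. This is a sketch only: the class and method names are illustrative, not the provider's real internals (which are ~92 lines plus extensive comments):

```ruby
# Hypothetical condensed sketch of the action mapping described above.
class CrowbarPacemakerServiceSketch
  attr_reader :log

  def initialize(running:)
    @running = running  # whether the underlying service is currently running
    @log = []
  end

  def run_action(action)
    case action
    when :start, :stop
      @log << :ignored            # Pacemaker owns start/stop via the LWRPs
    when :enable, :disable
      @log << :disable            # boot-time startup is always left to Pacemaker
    when :reload
      @log << (@running ? :reload : :ignored)  # proxied iff running
    when :restart
      @log << :maintenance_mode   # enter maintenance mode first ...
      @log << :restart            # ... then restart the underlying service
    end
  end
end

svc = CrowbarPacemakerServiceSketch.new(running: true)
[:start, :enable, :reload, :restart].each { |a| svc.run_action(a) }
svc.log  # => [:ignored, :disable, :reload, :maintenance_mode, :restart]
```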
Slide 16: Maintenance mode
• Goal: make it safe to restart a service on a single node
without confusing the whole cluster
• Pacemaker provides per-node maintenance mode for exactly
this
‒ (not to be confused with per-resource maintenance mode, which is
completely different)
• Degrades the cluster: Pacemaker stops monitoring/managing that node's resources
‒ need to minimise time spent in maintenance mode
• Multiple resources within one chef-client run might need
maintenance mode
‒ but don't want mode to flip-flop a lot
Slide 17: How does maintenance mode work?
• JIT approach:
‒ Switch to maintenance mode first time it's needed within the chef-client run
‒ Switch out at end of run
• Need to handle case where node was already placed in maintenance
mode prior to beginning of run (e.g. manually by cloud operator)
• Handlers in /etc/chef/client.rb
‒ pacemaker_start_handler
‒ pacemaker_report_handler
‒ pacemaker_exception_handler
‒ /var/chef/handlers/pacemaker_maintenance_handlers.rb
• libraries/maintenance_mode_helpers.rb
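The just-in-time logic described above can be sketched like this. The class is hypothetical; the real helpers live in libraries/maintenance_mode_helpers.rb and shell out to Pacemaker rather than flipping booleans:

```ruby
# Sketch of the JIT maintenance-mode pattern: enter maintenance mode only
# the first time a resource needs it, remember whether the node was already
# in maintenance mode before the run started, and clear it at the end of
# the run only if this run was the one that set it.
class MaintenanceModeSketch
  def initialize(already_in_maintenance: false)
    @node_in_maintenance = already_in_maintenance
    @we_set_it = false
  end

  def ensure_maintenance_mode
    return if @node_in_maintenance  # already set, by us or by the operator
    @node_in_maintenance = true     # real code would shell out to crm(8)
    @we_set_it = true
  end

  def end_of_run_handler
    # An operator who put the node in maintenance mode manually before the
    # run keeps it that way; we only undo what we did ourselves.
    @node_in_maintenance = false if @we_set_it
  end

  def in_maintenance?
    @node_in_maintenance
  end
end

run = MaintenanceModeSketch.new
run.ensure_maintenance_mode  # first resource that needs it flips the switch
run.ensure_maintenance_mode  # subsequent calls are no-ops (no flip-flopping)
run.end_of_run_handler       # report/exception handlers clear it at end of run
run.in_maintenance?  # => false
```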
Slide 20: Cluster-wide synchronization ‒ the problem
Why is synchronization needed?
Example 1:
• Keystone proposal is applied, with keystone-server role
assigned to cluster.
• All nodes start running chef-client more or less in parallel
• Necessary keystone rpms get installed
• Two or more nodes could reach keystone database resource
block at more or less the same time
• action :create only creates the database if it doesn't already exist
• Potential race where >= 2 nodes test for existence before any node creates it
• >= 2 nodes then attempt to create the database at the same time
Slide 22: Cluster-wide synchronization ‒ the problem
Example 2:
• Continuation of scenario from example 1
• keystone::server recipe configures keystone.conf etc.
• then invokes crm configure to add keystone service to
cluster.
• Pacemaker starts keystone service ...
• ... but it could start on any node!
• ... even a node which hasn't yet finished installing / configuring
keystone!
Slide 24: Cluster-wide synchronization ‒ the problem
Turns out we need two types of synchronization:
1. “Founder goes first”
Ensure one node in cluster (the founder)
enters and completes a critical section of a recipe
(e.g. "create database") before any other nodes can enter it.
2. “Wait for all nodes”
Ensure all nodes reach the same point
("keystone installed, configured, and ready to start anywhere")
before any can proceed further.
Slide 25: Cluster-wide synchronization ‒ how to use
Type 1: “founder goes first”
crowbar_pacemaker_sync_mark "wait-keystone_database"
...
# Create the Keystone database (critical section)
...
crowbar_pacemaker_sync_mark "create-keystone_database"
N.B. the cluster founder gets to perform the critical section
before any other node, but every node still performs the
critical section, which needs to be idempotent.
What if we only want one node to perform the critical
section?
Slide 26: Cluster-wide synchronization ‒ how to use
execute "keystone-manage db_sync" do
  command "keystone-manage db_sync"
  user node[:keystone][:user]
  group node[:keystone][:group]
  action :run
  # We only do the sync the 1st time, and only if
  # we're not doing HA or if we are the founder of
  # the HA cluster (so that it's really only done once).
  only_if {
    !node[:keystone][:db_synced] &&
      (!ha_enabled ||
       CrowbarPacemakerHelper.is_cluster_founder?(node))
  }
end
Slide 27: Cluster-wide synchronization ‒ how to use
Type 2: “wait for all nodes”
# Wait for all nodes to reach this point so we know
# that all nodes will have all the required packages
# installed before we create the pacemaker resources.
crowbar_pacemaker_sync_mark "sync-keystone_before_ha"
Slide 29: Cluster-wide synchronization ‒ internals
How does it work?
• Hopefully you don't need to know
‒ It should Just Work™
• Chef node attributes used as synchronization “marks”
• See libraries/synchronization.rb for details
• Value defaults to crowbar-revision from proposal
‒ Assumes cookbook name == barclamp name
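A plain-Ruby sketch of the mark mechanism (assumed shape: the real implementation in libraries/synchronization.rb stores marks in Chef node attributes and re-reads them from the Chef server while polling):

```ruby
# A "mark" is a named value in shared state; the founder writes it after
# finishing its critical section, and the other nodes poll until the mark
# carries the revision they expect.
def set_mark(store, name, revision)
  store[name] = revision
end

def wait_for_mark(store, name, revision, attempts: 10)
  attempts.times do
    return true if store[name] == revision
    sleep 0  # the real code sleeps, then re-reads node attributes
  end
  false
end

marks = {}
# Non-founder: times out because the founder hasn't set the mark yet.
wait_for_mark(marks, "create-keystone_database", 42, attempts: 3)  # => false
# Founder finishes the critical section and publishes the mark ...
set_mark(marks, "create-keystone_database", 42)
# ... after which waiting nodes proceed immediately.
wait_for_mark(marks, "create-keystone_database", 42)  # => true
```

Keying the mark's value on the proposal revision (42 here is arbitrary) means a re-applied proposal naturally invalidates marks from previous runs.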
Slide 31: Patterns for HA-enabled barclamps
HA code in recipes often interleaved with non-HA code:
• Ugly if ha_enabled conditionals
• Synchronization points
• Incompatible with using upstream cookbooks
• but we don't have anything better yet :-/
• Possible solution: split cookbooks into chunks at
synchronization points
‒ but would still require intrusive upstream changes
Slide 32: Patterns for HA-enabled barclamps
Interim solution: minimise ugliness!
• Split HA code into separate recipes where possible

  if ha_enabled
    include_recipe "keystone::ha"
  end

• Use helpers

  my_admin_host = CrowbarHelper.get_host_for_admin_url(node, ha_enabled)
  my_public_host = CrowbarHelper.get_host_for_public_url(
    node, node[:keystone][:api][:protocol] == "https", ha_enabled)

• Use the custom provider for service resources

  if ha_enabled
    provider Chef::Provider::CrowbarPacemakerService
  end
Slide 33: Questions?
• I lurk on the Freenode #openstack-chef IRC
channel, nick aspiers
• I also lurk on the Chef OpenStack Google group, but am
not currently doing a good job of monitoring traffic
• Feel free to mail me at <aspiers@suse.com>