Oreilly solinea-managing-openstack
1. Accelerating the adoption of Cloud Computing
Beyond Installation:
Managing Your OpenStack Cloud
May 6th, 2014
2. Speakers
• Ken Pepple is the co-founder and Chief Technology Officer of Solinea
• Prior to founding Solinea, he led the introduction of Internap's OpenStack-based public cloud services while serving as their Director of Cloud Development
• Code contributor since the Bexar release of OpenStack
• Author of O'Reilly Media's "Deploying OpenStack" and several other books
3. Introduction
• Installing OpenStack gets all the attention …
• … but distributions like Red Hat OSP and Cloudscaling are attacking this problem
• They will beat it. Then what?
• The reality is that OpenStack management is what we should be focusing on.
• Installation is 2 to 3 weeks … management is forever
4. OpenStack Architecture
[Figure: OpenStack logical architecture diagram (http://www.solinea.com). Components shown, by service:]
– OpenStack Dashboard: Horizon
– OpenStack Identity Service: keystone (service & admin APIs) with identity, token, catalog and policy backends
– OpenStack Compute: nova-api (OS, EC2, Metadata, Admin), nova-scheduler, nova-conductor, nova-compute, nova-consoleauth, nova-console, nova-*proxy (VNC/Spice), nova-cert/objectstore, queue, nova database, hypervisor (libvirt, XenAPI, etc.)
– OpenStack Image Service: glance-api, glance database
– OpenStack Object Store: swift-proxy; account, container and object services with account DB, container DB and object store
– OpenStack Block Storage: cinder-api, cinder-scheduler, cinder-volume, cinder-backup, queue, cinder database, volume provider
– OpenStack Network Service: neutron-server, neutron plugin(s), neutron agent(s), queue, neutron database, network provider
– OpenStack Orchestration: heat-api, cloudwatch-api, heat-engine, queue, heat database
– OpenStack Database: trove-api, trove-taskmgr, trove-conductor, agent, queue, trove database
– Clients, reaching the service APIs over HTTP(S) from the Internet / enterprise network: OpenStack command line tools (nova-client, swift-client, etc.), cloud management tools (Rightscale, Enstratius, etc.), GUI tools (Cyberduck, iPhone client, etc.) and the Amazon Web Services EC2 API
– Also shown: memcached
* Ceilometer omitted for clarity
6. OpenStack Management Basics
• Development and test cluster
  – Smaller, but representative
  – Same set and version of services
  – Reproduce problems, test fixes and practice upgrades
• Configuration management system
  – Chef, Puppet, Ansible, SaltStack, etc.
  – Your OpenStack distribution already uses one
  – Pick one and stick with it; everything falls under it
• Skilled and trained staff
  – Experienced Linux admins with virtualization skills
  – Network architects that understand cloud
  – Trained for OpenStack
8. Troubleshooting Tools
• Tools used to investigate or fix problems within your stack
• Mostly Linux tools but some are specific to OpenStack
• These need to span virtualization, networking and normal system administration
9. Troubleshooting Hypervisor
• Tools vary by hypervisor; each one has its own tooling
• Map a VM to its hypervisor with the OpenStack CLI: nova show
• Investigate the hypervisor with the virsh tools
• The VM's backing store can also be accessed through the hypervisor mount point or the Cinder volume
10. VM Troubleshooting
# nova list
+---------------------+-------+---------+------------+-------------+-------------------------------------+
| ID                  | Name  | Status  | Task State | Power State | Networks                            |
+---------------------+-------+---------+------------+-------------+-------------------------------------+
| f94b097d-b030-473b- | ken   | ACTIVE  | -          | Running     | rdonet=192.168.90.11                |
+---------------------+-------+---------+------------+-------------+-------------------------------------+
# nova show f94b097d-b030-473b-86a3-d501091c650b
+--------------------------------------+------------------------------------------------------------+
| Property                             | Value                                                      |
+--------------------------------------+------------------------------------------------------------+
| OS-EXT-AZ:availability_zone          | nova                                                       |
| OS-EXT-SRV-ATTR:host                 | localhost.localdomain                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | localhost.localdomain                                      |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000e                                          |
| OS-EXT-STS:power_state               | 1                                                          |
| OS-EXT-STS:task_state                | -                                                          |
| OS-EXT-STS:vm_state                  | active                                                     |
| OS-SRV-USG:launched_at               | 2014-05-06T06:13:01.000000                                 |
| created                              | 2014-05-06T06:11:55Z                                       |
| flavor                               | m1.small (2)                                               |
| hostId                               | 7e31bda83a3586907464e8e68f83a035bf9fa500d9579b2b807fa9f0   |
| id                                   | f94b097d-b030-473b-86a3-d501091c650b                       |
| image                                | cirros-0.3.2-x86_64 (f66d54e8-f8bd-4220-930f-86b6b44dfe4d) |
| rdonet network                       | 192.168.90.11                                              |
| security_groups                      | default                                                    |
| status                               | ACTIVE                                                     |
+--------------------------------------+------------------------------------------------------------+
# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-0000000e              running
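The lookup on this slide (nova show to find the compute host and the libvirt instance name, then virsh on that host) lends itself to scripting. The following is a minimal Python sketch, assuming the two-column table layout printed above; `parse_nova_show` and `locate_instance` are illustrative names, not part of any OpenStack client.

```python
def parse_nova_show(text):
    """Parse the two-column table printed by `nova show` into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip borders (+----+) and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue
        props[cells[0]] = cells[1]
    return props


def locate_instance(props):
    """Return (compute host, libvirt domain name) for a parsed instance."""
    return (props["OS-EXT-SRV-ATTR:host"],
            props["OS-EXT-SRV-ATTR:instance_name"])


# Shortened copy of the output shown on the slide.
SAMPLE = """
+--------------------------------------+-----------------------+
| Property                             | Value                 |
+--------------------------------------+-----------------------+
| OS-EXT-SRV-ATTR:host                 | localhost.localdomain |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000e     |
| status                               | ACTIVE                |
+--------------------------------------+-----------------------+
"""

if __name__ == "__main__":
    host, domain = locate_instance(parse_nova_show(SAMPLE))
    print(f"on {host}, run: virsh dominfo {domain}")
```

The parser deliberately ignores everything except two-cell rows, so it would need adjusting if a value itself contained a "|" character.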
11. Troubleshooting Backing Store (Ephemeral)
# cd /var/lib/nova/
# ls
buckets  CA  images  instances  keys  networks  tmp
# cd instances/
# ll
total 16
drwxr-xr-x. 2 nova nova 4096 May  6 09:03 13e86b72-7e14-43f5-ab2f-e7abf117213f
drwxr-xr-x. 2 nova nova 4096 May  2 11:25 _base
-rw-r--r--. 1 nova nova   45 May  5 23:18 compute_nodes
drwxr-xr-x. 2 nova nova 4096 Apr 30 19:28 locks
# cd 13e86b72-7e14-43f5-ab2f-e7abf117213f/
# ll
total 208
-rw-rw----. 1 qemu qemu      0 May  6 09:04 console.log
-rw-r--r--. 1 qemu qemu 262656 May  6 09:03 disk
-rw-r--r--. 1 nova nova     79 May  6 09:03 disk.info
-rw-r--r--. 1 nova nova   1529 May  6 09:04 libvirt.xml
# file disk
disk: Qemu Image, Format: Qcow , Version: 2

Notes:
– The ‘disk’ file is our qcow image.
– The XML file (libvirt.xml) is the KVM template.
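What `file disk` reports can also be checked programmatically. The sketch below reads the first eight bytes of an image: QCOW files start with the magic bytes `QFI\xfb` followed by a big-endian 32-bit version number. The function name `qcow_version` is made up for this example, and on a real system `qemu-img info` gives far more detail.

```python
import struct

QCOW_MAGIC = b"QFI\xfb"


def qcow_version(path):
    """Return the QCOW format version of the file at `path`, or None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return None
    # Bytes 4-7 hold the version as a big-endian unsigned 32-bit integer.
    (version,) = struct.unpack(">I", header[4:8])
    return version


if __name__ == "__main__":
    import sys
    # Usage: python qcow_check.py /var/lib/nova/instances/<uuid>/disk
    for p in sys.argv[1:]:
        print(p, qcow_version(p))
```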
12. Troubleshooting Network
• Combination of Linux, Open vSwitch and OpenStack tools
• OpenStack tools will show the logical configuration of Neutron’s ports, routers and subnets
  – neutron port-list, net-list and router-list
• Open vSwitch tools will map internal and external bridges
  – ovs-vsctl and ovs-dpctl
• Linux tools let you look inside VLANs and Linux network namespaces
  – ip netns, iptables and tcpdump
14. Troubleshooting OVS Bridges
# ovs-vsctl show
06667946-811b-4c7b-97a5-eafc8386e9ff
    Bridge br-int
        Port "qvo246622d1-02"
            tag: 2
            Interface "qvo246622d1-02"
        Port "tap3dfc8b70-ee"
            tag: 1
            Interface "tap3dfc8b70-ee"
        Port "tapecef7610-4f"
            tag: 2
            Interface "tapecef7610-4f"
        Port "tapc4e2b047-4a"
            tag: 2
            Interface "tapc4e2b047-4a"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "tapbbe18331-0c"
            Interface "tapbbe18331-0c"
    ovs_version: "1.11.0"

Notes:
– br-int: Neutron’s integration bridge connecting VMs
– br-ex: Neutron’s external bridge
– eth1: physical NIC for internet access
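When there are many ports, it can help to turn the `ovs-vsctl show` listing into a structure that answers "which ports sit on which bridge, and with which VLAN tag?". This is a small Python sketch that assumes the text layout shown on this slide; `parse_ovs_show` is an illustrative helper, not an Open vSwitch API.

```python
import re


def parse_ovs_show(text):
    """Map each bridge to {port name: VLAN tag or None} from `ovs-vsctl show`."""
    bridges = {}
    bridge = port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r'Bridge "?([^"]+)"?$', line)
        if m:
            bridge = m.group(1)
            bridges[bridge] = {}
            port = None
            continue
        m = re.match(r'Port "?([^"]+)"?$', line)
        if m and bridge is not None:
            port = m.group(1)
            bridges[bridge][port] = None  # untagged until a tag line follows
            continue
        m = re.match(r"tag: (\d+)$", line)
        if m and bridge is not None and port is not None:
            bridges[bridge][port] = int(m.group(1))
    return bridges


# Shortened copy of the output shown on the slide.
SAMPLE = '''
    Bridge br-int
        Port "qvo246622d1-02"
            tag: 2
            Interface "qvo246622d1-02"
        Port "tap3dfc8b70-ee"
            tag: 1
            Interface "tap3dfc8b70-ee"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "1.11.0"
'''

if __name__ == "__main__":
    for name, ports in parse_ovs_show(SAMPLE).items():
        print(name, ports)
```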
15. Monitoring
• Metering is not monitoring
  – Ceilometer isn’t a monitoring solution
• Horizon doesn’t save history
• Monitor for FCAPS: fault, configuration, accounting, performance and security
• Needs to be instrumented at multiple levels
  – Hardware/Operating System, OpenStack, VM
  – Although VM monitoring may be left to the user
• Needs to be used across all elements and processes
16. Operating System Monitoring
• Requires the same set of information as any other set of systems
  – CPU, memory, availability, etc.
• Process-level information
  – RabbitMQ, database, OpenStack processes, etc.
• Should rely on hosts sending information to the monitoring server (not a ping model)
• Ideally has APIs and strong discovery to aid automation
17. Nagios
• Installed as part of many distributions
• Open source
• Easy installation and usage
• API is an add-on module
18. Nagios OpenStack Plugin
• Add service checks for some OpenStack services
  – Glance
  – Keystone
  – Nova
  – Swift API and dispersion
• Available in most Linux distributions
  – # sudo apt-get install nagios-plugins-openstack
• More information and checks available at http://openstack.prov12n.com/monitoring-openstack-nagios-3/
19. Zabbix Console
• Open source monitoring tool used at several large service provider clouds
• Strong API and discovery modes
• Templates can be applied to host groups for monitoring
20. Zabbix Templates
• Templates created for each type of server
  – Compute nodes, controllers, Swift object servers, etc.
• Each template checks that processes are running and that configuration management is running
  – Checks should issue commands against the processes rather than rely on the process table, so that hung processes are caught
• All nodes also get the default OS template
• Alerting set for PagerDuty
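The advice to issue commands against a process, rather than trust the process table, can be illustrated with a check that runs a command under a timeout and treats a hang as a failure. This is a hedged Python sketch: `active_check` is a made-up helper, the 0/2 status values follow the common Nagios plugin convention purely as a choice, and the `rabbitmqctl status` call in the usage section is only an example of a command that exercises a service.

```python
import subprocess
import sys

OK, CRITICAL = 0, 2  # status values chosen to match the Nagios plugin convention


def active_check(cmd, timeout=10):
    """Run `cmd`; return (status, message). A hang counts as a failure."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process may still be in the process table, but it is not answering.
        return CRITICAL, f"no answer within {timeout}s"
    except OSError as exc:
        return CRITICAL, f"could not run check: {exc}"
    if result.returncode != 0:
        return CRITICAL, f"exit code {result.returncode}"
    return OK, "responded"


if __name__ == "__main__":
    # Example command only; use whatever really exercises the service.
    status, message = active_check(["rabbitmqctl", "status"])
    print(message)
    sys.exit(status)
```

How the result is reported (exit code, agent item, trapper value) depends on the monitoring system in use, so the return value is kept as a plain tuple here.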
21. Log Management
• More than just for error viewing
• Primary source of OpenStack data
• Useful for
  – Finding OpenStack bugs
  – Understanding event timings (spin new VM)
  – Visualizing cluster level statistics (VMs running)
  – Creating dashboards
• Can be challenging to store, query and interpret data
  – Clusters can generate GBs per day
  – Use dedicated tools and data stores
  – May be required for legal / audit reasons
22. Splunk
• Commercial log management solution
• Visualization, ad hoc queries, post processing and add-ons
• Easy to set up dashboards
• Supported with relatively easy installer
23. Logstash, Kibana and ElasticSearch
• Open source alternative to Splunk
• Requires more complicated setup to parse logs correctly
• Provides ad hoc queries as well as dashboards
• Active community
24. Interesting Uses for Log Data
• VMs
  – CPUs/Instances by hypervisor (scheduler efficiency)
  – Total vCPUs/CPUs in cluster available versus used
  – Spawn success and failures
  – Spawn time
  – Top users of VMs/vCPUs
• Authentication
  – Tokens generated versus invalidated
  – Failed authentications
• Errors
  – All error messages / stack traces create an alert
• Logs that have stopped (zombie processes)
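The last item, spotting logs that have stopped, is usually handled inside the log management tool, but the idea itself is simple: flag any log whose last write is older than a threshold. A minimal Python sketch follows; `stale_logs` is an illustrative helper, and the path and the ten-minute threshold in the usage section are assumptions that vary by site.

```python
import os
import time


def stale_logs(paths, max_age_seconds, now=None):
    """Return the paths whose last modification is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            # A missing log is reported too: the service may never have started.
            stale.append(path)
            continue
        if age > max_age_seconds:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Assumed location; adjust to where your services actually log.
    for path in stale_logs(["/var/log/nova/nova-compute.log"], 600):
        print("log has stopped:", path)
```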
25. “Canary” Scripts
• Highest level check for cloud infrastructure: “Can we spin a new VM?”
  – Custom written script that starts a VM, attaches block storage, assigns an IP address, pings the outside world, then terminates
  – Logs all actions with timings to the log management solution
• Run every 5 to 15 minutes
• Also can be run interactively
• This should be written for your own site
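Since a canary has to be written for each site, only its skeleton can be sketched generically: run a list of named steps in order, time each one, emit a structured log record per step, and always attempt cleanup. In the Python sketch below, `run_canary` and the placeholder steps are hypothetical; the real steps would call whatever client or API your site uses to boot, attach, ping and terminate.

```python
import json
import time


def run_canary(steps, cleanup=None, log=print, clock=time.monotonic):
    """Run (name, callable) steps in order, logging one JSON record per step.

    Stops at the first failing step, always calls `cleanup` if given,
    and returns True only if every step succeeded.
    """
    healthy = True
    try:
        for name, action in steps:
            start = clock()
            try:
                action()
                ok, error = True, None
            except Exception as exc:  # any failure means the canary failed
                ok, error = False, str(exc)
            log(json.dumps({"step": name, "ok": ok,
                            "seconds": round(clock() - start, 3),
                            "error": error}))
            if not ok:
                healthy = False
                break
    finally:
        if cleanup is not None:
            cleanup()
    return healthy


if __name__ == "__main__":
    # Placeholder steps: replace each lambda with a site-specific call.
    steps = [
        ("boot_vm", lambda: None),
        ("attach_volume", lambda: None),
        ("assign_ip", lambda: None),
        ("ping_outside", lambda: None),
    ]
    ok = run_canary(steps, cleanup=lambda: None)
    raise SystemExit(0 if ok else 1)
```

Emitting one JSON line per step keeps the timings easy to ingest by a log management tool, and the non-zero exit status lets a scheduler or monitoring check pick up failures.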
26. Specialized Tools
• Many sites will want to be able to create their own custom images
  – CI/CD “golden images”
• Several commercial and open source alternatives
  – CohesiveFT Server3
  – Elasticbox (https://www.elasticbox.com/)
  – Packer (http://www.packer.io/)
• All provide the ability to create images with specified software pre-installed via the command line
28. Rolling (“live”) Upgrades
• Ability to upgrade a running cluster to a new release
• Upgrade controller(s) first, then individual compute nodes
• Requires several pre-conditions
  – Neutron upgraded first
  – nova-conductor in use, so that compute nodes are isolated from DB schema changes
  – Set icehouse compatibility mode

/etc/nova/nova.conf
[upgrade_levels]
# Set a version cap for messages sent to compute services. If
# you plan to do a live upgrade from havana to icehouse, you
# should set this option to "icehouse-compat" before beginning
# the live upgrade procedure. (string value)
compute=icehouse-compat
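Before starting a rolling upgrade, it is worth confirming that the version cap is actually present on every node. A minimal Python sketch using the standard library `configparser` is shown below; `compute_version_cap` is an illustrative helper, and the sketch assumes the option sits in the `[upgrade_levels]` section. Real nova.conf files can contain constructs that `configparser` handles differently from OpenStack's own parser, so treat this as a pre-flight sanity check only.

```python
import configparser


def compute_version_cap(conf_text):
    """Return the [upgrade_levels] compute cap from nova.conf text, or None."""
    # strict=False tolerates repeated options; interpolation=None leaves
    # "%" characters in values untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.get("upgrade_levels", "compute", fallback=None)


if __name__ == "__main__":
    import sys
    # Usage: python check_cap.py /etc/nova/nova.conf
    with open(sys.argv[1]) as f:
        cap = compute_version_cap(f.read())
    print("compute version cap:", cap)
    sys.exit(0 if cap == "icehouse-compat" else 1)
```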
29. Swift Discoverability
• API calls to /info will return information about the cluster
• Users are now able to take advantage of the unique features available in each cluster
• Turned on by default but can be disabled

# swift capabilities
Core: swift
 Options:
  account_listing_limit: 10000
  container_listing_limit: 10000
  max_account_name_length: 256
  max_container_name_length: 256
  max_file_size: 5368709122
  max_meta_count: 90
  max_meta_name_length: 128
  max_meta_value_length: 256
  max_object_name_length: 1024
  strict_cors_mode: True
  version: 1.13.1
Additional middleware: keystoneauth
Additional middleware: staticweb
Additional middleware: tempurl
 Options:
  methods: ['GET', 'HEAD', 'PUT']
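A client can use the /info response to adapt to a particular cluster, for example deciding whether an upload has to be split into segments or whether a temporary URL can use a given HTTP method. The Python sketch below works on an already-fetched response and assumes the JSON layout of one top-level key per middleware with the core limits under `swift`; the sample is trimmed to values from the output above, and both helper names are made up for this example.

```python
import json

# Trimmed, illustrative /info response using values from the slide.
SAMPLE_INFO = """
{
  "swift": {"max_file_size": 5368709122, "version": "1.13.1"},
  "staticweb": {},
  "tempurl": {"methods": ["GET", "HEAD", "PUT"]}
}
"""


def needs_segmenting(info, size_bytes):
    """True if an object of size_bytes exceeds the cluster's max_file_size."""
    limit = info.get("swift", {}).get("max_file_size")
    return limit is not None and size_bytes > limit


def supports_tempurl_method(info, method):
    """True if the tempurl middleware is enabled and allows `method`."""
    return method.upper() in info.get("tempurl", {}).get("methods", [])


if __name__ == "__main__":
    info = json.loads(SAMPLE_INFO)
    print("segment a 6 GiB upload:", needs_segmenting(info, 6 * 1024 ** 3))
    print("tempurl DELETE allowed:", supports_tempurl_method(info, "DELETE"))
```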