This document discusses methods for understanding the evolution of open source cloud systems like OpenStack. It presents the authors' solution of using tracing techniques to analyze OpenStack's data and message flows for logical operations such as creating and deleting VMs. Key findings from tracing OpenStack releases include significant behavioral changes between releases, hundreds of database queries and AMQP messages required for operations, and the involvement of components like Keystone, Glance, Nova, and Neutron. The authors propose using their techniques to inject faults and build a knowledge base to aid future problem diagnosis.
Dissecting Open Source Cloud Evolution: An OpenStack Case Study
1. Dissecting Open Source Cloud Evolution: An OpenStack Case Study
Salman Baset, Chunqiang Tang, Byung Chul Tak, Long Wang
IBM T. J. Watson Research Center
June 26th, 2013
2. Open source cloud projects
IaaS
PaaS
SaaS
Broadly two types:
(1) Native (listed here)
(2) Adapters (e.g., Netflix on EC2)
S. Baset, CQ Tang, B. Tak, L. Wang 2
3. Timeline for cloud open source
[Figure: timeline of cloud open source projects. Axis marks: 2001, 2005, 2006 to 2012. Labeled entries recoverable from the slide: Amazon EC2, Google App Engine.]
4. Two characteristics of open source cloud systems
• Distributed multi-component architecture
– Example: OpenStack and Cloud Foundry have more than 10 components for
their IaaS controllers
• Rapid development by a community of developers
5. Rapid development
• Open source cloud projects are being developed and released at a rapid
pace
– OpenStack: releases every six months
– Eucalyptus: every four months
– OpenShift Enterprise: every four months
• Compare it to
– Linux kernel: 2-3 months (3.x – 3.(x+1) )
– Ubuntu distro releases: every six months
• Major cloud providers are consuming OpenStack directly from
development trunk
– Two weeks behind the trunk
6. Why understand evolution?
• Evolution:
– A git commit or a major release
• Research perspective
– How do logical operations (e.g., create a VM) change across major versions?
• Developer perspective
– What is the impact of my committed changes?
• Provider perspective
– Continuous deployment and delivery
• How does a provider gain confidence in deploying a new release in production?
• What is the impact of new changes and configuration options on logical operations?
– Message flow, performance evaluation, fault injection etc
7. Methods for understanding evolution
• Static
– Source code
– Documentation
• Dynamic
– Log analysis
• Lab and/or production
– Tracing message flow
• With or without code instrumentation
• Automatic correlation of message flow with logs
• Lab and/or production
– Fault injection
– Performance study
• Lab
8. Our solution
• Without source code modification
– Tracing
– Tracing with log correlation
– Fault injection
• Other solutions
– Google Dapper (built RPC framework leveraging callbacks)
– Twitter Zipkin (attach identifiers to requests)
9. Summary of our solution: Tracing
• This simplified diagram shows one example path for one user request.
• A path is the series of system events, such as RECEIVE and SEND, across servers, captured using the LD_PRELOAD technique.
• Prior art: vPath constructs such causal paths of system activities initiated by user requests.
[Figure: a request passes through three servers (e.g., Apache web server, application server, database server). On each server, a thread's RECEIVE and SEND events cross between application and kernel and are caught by a Monitoring Agent.]
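The path construction described on this slide can be sketched as follows. This is a minimal illustration, not the authors' tool or vPath itself: the `Event` fields, the connection ids (`c0`, `c1`, ...), and the rule that a thread serves one request at a time are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float     # capture timestamp
    server: str   # e.g. "apache", "appserver", "db"
    thread: int   # thread id on that server
    kind: str     # "RECEIVE" or "SEND"
    conn: str     # hypothetical connection id seen by both endpoints


def build_path(events, first):
    """Follow one request from `first` through matching SEND/RECEIVE events."""
    ordered = sorted(events, key=lambda e: e.ts)
    path, current = [first], first
    while True:
        nxt = None
        for e in ordered:
            if e.ts <= current.ts:
                continue
            if current.kind == "SEND":
                # A SEND is followed by the peer's RECEIVE on the same connection.
                ok = (e.kind == "RECEIVE" and e.conn == current.conn
                      and e.server != current.server)
            else:
                # After a RECEIVE, the same thread's next event continues the request.
                ok = e.server == current.server and e.thread == current.thread
            if ok:
                nxt = e
                break
        if nxt is None:
            return path
        path.append(nxt)
        current = nxt


events = [
    Event(1.0, "apache", 1, "RECEIVE", "c0"),
    Event(1.5, "apache", 2, "RECEIVE", "c9"),  # unrelated request, other thread
    Event(2.0, "apache", 1, "SEND", "c1"),
    Event(3.0, "appserver", 7, "RECEIVE", "c1"),
    Event(4.0, "appserver", 7, "SEND", "c2"),
    Event(5.0, "db", 3, "RECEIVE", "c2"),
    Event(6.0, "db", 3, "SEND", "c2"),
    Event(7.0, "appserver", 7, "RECEIVE", "c2"),
    Event(8.0, "appserver", 7, "SEND", "c1"),
    Event(9.0, "apache", 1, "RECEIVE", "c1"),
    Event(10.0, "apache", 1, "SEND", "c0"),
]
path = build_path(events, events[0])
for e in path:
    print(e.server, e.kind, e.conn)
```

The sketch links events only by thread identity and connection id, which is why it cannot follow a request across a message queue.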
10. Summary of our solution: Tracing with queues
• The path breaks if there are queues in the middle.
– Apache web server inserts a message in the queue and returns
– Application server retrieves the message from the queue and performs work
– How do we correlate these messages?
• Augment path information with unique message information
– e.g., transaction ids
• If no unique message information is available, run only one logical operation in the system at a time
[Figure: the same three-server path (e.g., Apache web server, application server, database server), with a Queue between servers. Each server's RECEIVE and SEND events are caught by a Monitoring Agent.]
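Correlating the two halves of a path that is broken by a queue can be sketched as a join on the unique message information. The event tuples and the transaction ids (`txn-1`, `txn-2`) below are hypothetical; the slide only says that something like a transaction id is used.

```python
from collections import defaultdict


def correlate_by_txn(events):
    """Group (server, action, txn_id) events by transaction id, in capture order."""
    flows = defaultdict(list)
    for server, action, txn_id in events:
        flows[txn_id].append((server, action))
    return dict(flows)


# Two requests interleaved in the queue: the consumer dequeues them out of order.
events = [
    ("apache", "enqueue", "txn-1"),
    ("apache", "enqueue", "txn-2"),
    ("appserver", "dequeue", "txn-2"),
    ("appserver", "dequeue", "txn-1"),
]
flows = correlate_by_txn(events)
for txn_id, steps in flows.items():
    print(txn_id, steps)
```

Without such an id, the producer and consumer events of concurrent requests cannot be told apart, which is the reason for running one logical operation at a time in that case.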
11. Summary of our solution: Log Analysis
• Key idea
– Combine the log information and causality (path) discovery technique
• Trace low-level system calls to infer causality and understand how an application executes
• Monitor log files and link log file entries to observed low-level system calls
• Link together: improved semantics for problem diagnosis
12. Diagram: Detecting Log Writes
• During normal run,
– Maintain a mapping between fd and file name string
– Maintain a list of known/discovered log files
• On ‘write’ system calls,
– Check parameters and see if it is a ‘write’ on one of the log files.
– If it is, and the data to be written contains alerting keywords such as ‘ERROR’, then this is
a log write due to some errors.
– This ‘write’ event will be annotated appropriately.
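[Figure: fragment of a path, Recv → Read → write → Send (Request). The write event carries the parameters fd=5, offset=2048, data=“ERROR: …”. The slide also shows the table of known log files below; rows are paired in the order extracted from the slide.]
fd  application  log file name
9   Websphere    /var/log/was.log
14  DB2          /var/log/db2/access.log
5   DB2          /usr/local/db2/fie22xlv.log
8   DB2          /usr/local/db2/fie23xlv.log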
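The bookkeeping described on this slide can be sketched as a small tracker that is fed intercepted `open`, `close`, and `write` calls. The class and method names are hypothetical, and the pairing of fd 5 with a particular log file is an assumption made for the example.

```python
class LogWriteDetector:
    """Tracks fd -> file name and flags writes to known log files."""

    KEYWORDS = ("ERROR",)  # alerting keywords; 'ERROR' is the slide's example

    def __init__(self, known_logs):
        self.known_logs = set(known_logs)  # known/discovered log files
        self.fd_to_path = {}               # mapping maintained during normal run

    def on_open(self, fd, path):
        self.fd_to_path[fd] = path

    def on_close(self, fd):
        self.fd_to_path.pop(fd, None)

    def on_write(self, fd, data):
        """Return an annotation for a log write, or None for any other write."""
        path = self.fd_to_path.get(fd)
        if path not in self.known_logs:
            return None
        is_error = any(keyword in data for keyword in self.KEYWORDS)
        return {"log_file": path, "error": is_error}


detector = LogWriteDetector(["/usr/local/db2/fie22xlv.log"])
detector.on_open(5, "/usr/local/db2/fie22xlv.log")  # assumed fd pairing
detector.on_open(7, "/tmp/scratch.dat")             # hypothetical non-log file

annotation = detector.on_write(5, "ERROR: ...")
ignored = detector.on_write(7, "ERROR: ...")
print(annotation, ignored)
```

The annotation returned by `on_write` is what would be attached to the corresponding `write` event in the path.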
13. Fault Injection for Building up Knowledge Base for Future Problem Diagnosis
• Inject errors, observe the application's behavior, and build a knowledge base for future problem diagnosis
– Alter the return value of a system call, e.g., to mimic a network communication error
– Observe the logging reaction
– Repeat this for each system call and for each request
– Accumulate the observed logging reactions as a knowledge base
• When an error message is logged in a production system, use the knowledge base to infer the probability of different root causes
– Construct Bayesian Belief Network for inference
• In the example figure, fault injection changes the return value of the ‘Read’ event to -1. This triggers an error to be logged at a later part of the path.
[Figure: path Recv → Read → write → Send → Recv → write (Request). Fault injection alters the return value of Read from 1024 to -1. A newly appeared write event with parameter data=“ERROR: Record missing.” is the reaction to the error injection.]
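The knowledge-base idea can be sketched with plain counting and Bayes' rule. This is a simplification under stated assumptions: the slide proposes a Bayesian Belief Network, whereas the sketch below applies Bayes' rule with a uniform prior over injected faults. The fault names and the second log message are hypothetical.

```python
from collections import Counter, defaultdict


class KnowledgeBase:
    def __init__(self):
        # injected fault -> Counter of log messages observed as a reaction
        self.counts = defaultdict(Counter)

    def record(self, fault, log_message):
        self.counts[fault][log_message] += 1

    def posterior(self, log_message):
        """P(fault | log_message), assuming a uniform prior over recorded faults."""
        likelihood = {}
        for fault, messages in self.counts.items():
            total = sum(messages.values())
            likelihood[fault] = messages[log_message] / total
        norm = sum(likelihood.values())
        if norm == 0:
            return {}
        return {f: p / norm for f, p in likelihood.items() if p > 0}


kb = KnowledgeBase()
for _ in range(3):
    kb.record("read returns -1", "ERROR: Record missing.")
kb.record("read returns -1", "ERROR: timeout")
for _ in range(4):
    kb.record("recv returns -1", "ERROR: timeout")

print(kb.posterior("ERROR: Record missing."))
print(kb.posterior("ERROR: timeout"))
```

A belief network would additionally model dependencies between events on the path; the counting step that fills the knowledge base stays the same.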
14. Brewing complexity: Evolution of OpenStack loc *
Release  Released  Nova     Cinder  Glance  Keystone  Quantum  Swift   Total
Austin   Oct 2010  17,288   -       -       -         -        12,979  30,627
Bexar    Feb 2011  27,734   -       3,629   -         -        16,014  47,377
Cactus   Apr 2011  43,947   -       4,927   -         -        16,665  65,539
Diablo   Sep 2011  66,395   -       9,961   12,451    -        15,591  91,947
Essex    Apr 2012  87,750   -       15,698  11,555    -        17,646  149,596
Folsom   Sep 2012  103,637  31,241  20,271  13,939    42,118   19,114  230,320
Grizzly  Apr 2013  120,968  49,797  21,261  20,071    60,485   23,035  321,081
* Lines are counted by line breaks (CRLF), not as Python logical lines of code
Methodology
wc -l `find . | grep -E '*.py' | grep -v test | grep -v 'doc'`
wc -l `find . | grep -E '*.sh' | grep test | grep -v 'doc'`
17. OpenStack tracing
• Understand OpenStack data and message flow for logical operations, e.g.,
– Create a VM
– Delete a VM
– List VMs
– Create a volume
– Add or remove volume to a VM
– Create a floating IP address
– Add or remove floating IP address from a VM
– Create or destroy a virtual network
• Understand
– REST calls
– Data flow
– AMQP flow
– Timing information
• Build data consistency tool
• Gather data for generating performance load
• Build a performance model
18. Key observations from tracing OpenStack (1/2)
• OpenStack is evolving very rapidly. Significant behavior changes from one release to
another.
• Total tables
– Grizzly: 105 tables (160 with nova shadow tables), 53 in Diablo
• Creating a VM (grizzly)
– 139 SELECT queries, 37 INSERT queries, 74 UPDATE queries
– 12 tables are touched for INSERT and UPDATE
• In Diablo (Sep 2011), there were 450 SELECT, 4 INSERT, and 9 UPDATE queries
– 717K read, 458K write
– 655 send() calls to AMQP, 414 recv() calls
• Deleting a VM
– Only a single record is deleted from the database (the rest are archived)
• Request-id
– The instance and request-id are stored in the database before a request is sent to the scheduler, but only after the quota has been updated.
• Quota management
– Entries are inserted in the database to indicate resource allocation for a VM. Negative or NULL entries are inserted for deallocation. Each quota entry (e.g., core, fixedIP) has an expiration time (one day).
• VM state and task state
– networking, block_device_mapping, spawning
• Keystone
– Token verification is optimized in Grizzly using caches (for flavor=keystone) and PKI
19. Key observations from tracing OpenStack (2/2)
• Development of a data consistency checking tool
– Orphan iptables rules (not associated with a VM transaction) => security holes
– Orphan data in tables due to errors in VM creation etc. => audit and clean up
– Orphan virsh data => audit and clean up
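The core check of such a consistency tool can be sketched as a set difference between the instances the database knows about and the instances referenced elsewhere. The function name and the input lists are hypothetical; how the ids are collected (from the database, from iptables, from virsh) is left out of the sketch.

```python
def find_orphans(db_instance_ids, iptables_rule_owners, virsh_domains):
    """Report ids referenced outside the database that the database does not know."""
    known = set(db_instance_ids)
    return {
        "iptables": sorted(set(iptables_rule_owners) - known),
        "virsh": sorted(set(virsh_domains) - known),
    }


# Hypothetical inventory: vm-3 failed during creation and left residue behind.
orphans = find_orphans(
    db_instance_ids=["vm-1", "vm-2"],
    iptables_rule_owners=["vm-1", "vm-2", "vm-3"],
    virsh_domains=["vm-1", "vm-3"],
)
print(orphans)
```

Orphan iptables rules are the entries to review first, since the slide flags them as potential security holes.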
20. Methodology
• Run OpenStack on one machine (with and without timers disabled)
• Diablo, Essex, Folsom, Grizzly
• Ubuntu, RabbitMQ, MySQL
• Use curl to send API requests to OpenStack
– flavor=keystone
– Image has three parts
• AMI, ram disk, kernel image
– For keystone, PKI-based token verification was also used in Grizzly
– Each service's tokens were created before issuing a create or delete VM call
• Use our technique to capture message interaction, generate flow, run message analytics, and insert faults (ongoing)
• curl_createserver.sh
AUTHTOKEN=$1
curl -i http://9.47.240.166:8774/v2/3283d689d02c41248fc82c202e82055a/servers -X POST \
  -H "X-Auth-Project-Id: admin" \
  -H "User-Agent: python-novaclient" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-Auth-Token: ${AUTHTOKEN}" \
  -d '{"server": {"name": "test1", "imageRef": "de8882fb-94b3-4105-a212-c0a7fd8ab1e9", "flavorRef": "1", "max_count": 1, "min_count": 1, "networks": [{"uuid": "48de54f9-2a60-4f28-9740-d6317086c32a"}] }}'
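For readers who prefer to script the request, the same create-server call can be sketched with Python's standard library. The function name `build_create_server_request` is hypothetical; the URL, headers, and JSON body are copied from the curl script above. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

URL = "http://9.47.240.166:8774/v2/3283d689d02c41248fc82c202e82055a/servers"


def build_create_server_request(auth_token):
    """Build (but do not send) the POST request issued by curl_createserver.sh."""
    body = {
        "server": {
            "name": "test1",
            "imageRef": "de8882fb-94b3-4105-a212-c0a7fd8ab1e9",
            "flavorRef": "1",
            "max_count": 1,
            "min_count": 1,
            "networks": [{"uuid": "48de54f9-2a60-4f28-9740-d6317086c32a"}],
        }
    }
    headers = {
        "X-Auth-Project-Id": "admin",
        "User-Agent": "python-novaclient",
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Auth-Token": auth_token,
    }
    return urllib.request.Request(
        URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


req = build_create_server_request("secret-token")  # placeholder token
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, which needs a reachable endpoint and a valid token.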
22. Keystone REST flow for creating a server (grizzly)
[Sequence diagram. Participants: User, Keystone, nova-api, glance-api, glance-registry. Message labels, in the order extracted from the slide:]
– Credentials
– Token (role)
– Get services and endpoints + token
– Services + endpoints
– Token + CreateInstance
– Verify + token
– Token + GetImage
– Verify + token
– image
– CreateInstance Success
– Accepted
– glance-registry: Token + GetImage
– glance-registry: Verify + token
– glance-registry: image
23. Create a VM: overview (1/4)
• Which OpenStack component is issuing SELECT queries?
Role                      Component        Diablo  Essex  Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
Auth.                     keystone         422     54     358                  484             82                    243
API server                nova-api         4       11     11                   9               10                    10
Agent on compute node     nova-compute     4       5      13                   14              0                     0
Controller agent          nova-conductor   n/a     n/a    n/a                  n/a             15                    16
Network agent on compute  nova-network     13      19     17                   n/a             20                    n/a
Scheduler                 nova-scheduler   1       2      1                    1               4                     4
Image registry server     glance-registry  6       4      8                    8               8                     8
Network API server        quantum-server   n/a     n/a    n/a                  44              n/a                   62
24. Create a VM: overview (2/4)
• How many HTTP requests with respect to SELECT calls? The REST rows (red on the slide) give the REST calls received by each component.
Component        Metric  Diablo         Essex   Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
keystone         SELECT  422            54      358                  484             82                    243
                 REST    30 GET         9 GET   17 GET               23 GET          3 GET                 6 GET, 2 POST
nova-api         SELECT  4              11      11                   9               10                    10
                 REST    1 POST         1 POST  1 POST               1 POST          1 POST                1 POST
nova-compute     SELECT  4              5       13                   14              0                     0
nova-conductor   SELECT  n/a            n/a     n/a                  n/a             15                    16
nova-network     SELECT  13             19      17                   n/a             20                    n/a
nova-scheduler   SELECT  1              2       1                    1               4                     4
glance-api       SELECT  0              0       0                    0               0                     0
                 REST    2 GET, 5 HEAD  4 HEAD  8 HEAD               8 HEAD          8 HEAD                8 HEAD
glance-registry  SELECT  6              4       8                    8               8                     8
                 REST    7 GET          4 GET   8 GET                8 GET           8 GET                 8 GET
quantum-server   SELECT  n/a            n/a     n/a                  44              n/a                   62
                 REST    -              -       -                    5 GET, 1 POST   -                     9 GET, 1 POST
25. Why so many SELECT queries in keystone?
• In Diablo, for every keystone GET, 14 SELECT queries are issued, except for the first (16)
• In Essex, for every keystone GET, 6 SELECT queries are issued
• In Folsom-nova-net/quantum, for every keystone GET, 21 SELECT queries are issued, except for the first (22)
• In Grizzly-nova-net, 27 SELECT queries for each request except for first (1).
– Keystone tokens are also cached, so subsequent queries do not result in full keystone token authentication
• If PKI token verification is used, the number of SELECT queries sent by keystone drops from 82 to 7.
Component  Metric  Diablo  Essex  Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
keystone   SELECT  422     54     358                  484             82                    243
           REST    30 GET  9 GET  17 GET               23 GET          3 GET                 6 GET, 2 POST
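The per-GET figures for Diablo, Essex, and Folsom can be checked against the keystone SELECT totals with a few lines of arithmetic. The helper name `total_selects` is made up for this check; all numbers are taken from the slide.

```python
def total_selects(gets, per_get, first=None):
    """SELECT total when the first GET costs `first` and every other GET `per_get`."""
    first = per_get if first is None else first
    return first + (gets - 1) * per_get


# release: (keystone GETs, SELECTs per GET, SELECTs for first GET, reported total)
cases = {
    "Diablo": (30, 14, 16, 422),
    "Essex": (9, 6, None, 54),
    "Folsom-nova-network": (17, 21, 22, 358),
    "Folsom-quantum": (23, 21, 22, 484),
}

results = {}
for release, (gets, per_get, first, reported) in cases.items():
    computed = total_selects(gets, per_get, first)
    results[release] = (computed, reported)
    print(release, computed, reported)
```

For these four configurations the computed and reported totals agree, so the keystone SELECT count grows linearly with the number of token-related GET requests.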
26. Create a VM: overview (3/4)
• What if there is no keystone?
Keystone enabled
        Diablo  Essex  Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
SELECT  450     95     409                  560             139                   343
INSERT  4       4      23                   24              37                    40
UPDATE  6       10     60                   58              74                    70
Keystone disabled
        Diablo  Essex  Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
SELECT  28      41     51                   76              57                    100
INSERT  4       4      23                   24              37                    38
UPDATE  6       10     60                   58              74                    70
28. Tables touched for create VM in grizzly-nova-net
SELECT (Grizzly nova-net)
2   block_device_mapping
6   compute_node_stats
6   fixed_ips
1   floating_ips
8   images
4   instance_actions
2   instance_actions_events
1   instance_info_caches
4   networks
2   provider_fw_rules
5   quotas
4   quota_usages
2   reservations
7   role
1   security_group_rules
3   security_groups
4   virtual_interfaces
INSERT (Grizzly nova-net)
12  compute_node_stats
1   instance_actions
2   instance_actions_events
1   instance_id_mappings
1   instance_info_caches
1   instances
13  instance_system_metadata
4   reservations
1   security_group_instance_association
1   virtual_interfaces
UPDATE (Grizzly nova-net)
6   compute_nodes
44  compute_node_stats
3   fixed_ips
2   instance_actions_events
1   instance_info_caches
8   instances
8   quota_usages
2   reservations
29. Data flow for creating a server (grizzly) (1/2)
[Diagram. Components: nova-api, nova-scheduler, nova-conductor, nova-compute. Steps shown for "Create server", with the slide's annotations attached to the statements they describe:]
• Check quota
• Create reservations. No request id. Default: expires after a day if not updated.
– INSERT INTO reservations (instances, expires, usageid1)
– INSERT INTO reservations (ram, expires, usageid2)
– INSERT INTO reservations (core, expires, usageid3)
• Update quotas.
– UPDATE quota_usages (usageid1)
– UPDATE quota_usages (usageid2)
– UPDATE quota_usages (usageid3)
• What if nova-api dies here? Then quota updates can potentially be permanent until expired or cleaned up.
• Check if images exist
• Create instance in the database.
– INSERT INTO instances (‘instance_uuid’)
– INSERT INTO security_group_instance_association (‘instance_uuid’)
– INSERT INTO instance_system_metadata (‘image_kernel_id, instance_uuid’)
– INSERT INTO instance_system_metadata (‘instance_type_memory_mb’)
– INSERT INTO instance_system_metadata (‘instance_type_swap’)
– INSERT INTO instance_system_metadata (‘instance_type_vcpu_weight’)
– INSERT INTO instance_system_metadata (‘instance_type_root_gb’)
– INSERT INTO instance_system_metadata (‘instance_type_id’)
– INSERT INTO instance_system_metadata (‘image_ramdisk_id’)
– INSERT INTO instance_system_metadata (‘instance_type_name’)
– INSERT INTO instance_system_metadata (‘instance_type_ephemeral_gb’)
– INSERT INTO instance_system_metadata (‘instance_type_rxtx_factor’)
– INSERT INTO instance_system_metadata (‘instance_type_flavorid’)
– INSERT INTO instance_system_metadata (‘instance_type_flavorid’)
– INSERT INTO instance_system_metadata (‘image_base_image_ref’)
– INSERT INTO instance_info_caches (‘instance_uuid’)
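The failure window raised on this slide can be illustrated with a toy model: reservations are written first, the process dies before anything ties them to a request, and the rows stay until they expire. The schema below is a deliberately simplified stand-in, not nova's actual schema, and the helper names are hypothetical.

```python
import sqlite3

DAY = 24 * 60 * 60  # default reservation lifetime mentioned on the slide


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservations (id INTEGER PRIMARY KEY, resource TEXT, expires REAL)"
    )
    return conn


def reserve(conn, resources, now):
    """Insert one reservation per resource; no request id is recorded."""
    conn.executemany(
        "INSERT INTO reservations (resource, expires) VALUES (?, ?)",
        [(resource, now + DAY) for resource in resources],
    )


def live_reservations(conn, now):
    row = conn.execute(
        "SELECT COUNT(*) FROM reservations WHERE expires > ?", (now,)
    ).fetchone()
    return row[0]


def purge_expired(conn, now):
    """Cleanup pass: delete expired reservations and return how many were removed."""
    return conn.execute("DELETE FROM reservations WHERE expires <= ?", (now,)).rowcount


conn = make_db()
reserve(conn, ["instances", "ram", "core"], now=0)
# Simulated crash here: no instance row and no request id are ever written.
held_after_hour = live_reservations(conn, now=3600)
purged = purge_expired(conn, now=DAY)
held_after_purge = live_reservations(conn, now=DAY)
print(held_after_hour, purged, held_after_purge)
```

In this model the three reservations hold quota for a full day even though no VM exists, and nothing in the rows says which request created them.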
30. Data flow for creating a server (grizzly) (2/2)
[Diagram. Components: nova-api, nova-scheduler, nova-compute, nova-conductor, nova-network. Steps in the order extracted from the slide:]
– INSERT INTO instance_id_mappings (‘instance_uuid’)
– Update time in quota_usages table
– INSERT INTO instance_actions (instance_uuid, request_id)
• This request is key. It associates the instance id with a request id, but it occurs after quota and reservations have been updated. BAD!!!
– Send to scheduler (request_id)
– INSERT INTO instance_actions_events (scheduling)
– INSERT INTO instance_actions_events (compute_run)
– Libvirt – create instance
– UPDATE instances (task_state = NULL)
– GET images from glance
– UPDATE instances (host, node)
– UPDATE compute_node_stats *
– INSERT INTO compute_node_stats
– UPDATE instances (task_state=networking)
31. How many SQL queries for create VM before a request is sent to:
scheduler
        Diablo  Essex  Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
SELECT  202     10     27                   289             98                    138
INSERT  0       0      3                    10              21                    21
UPDATE  0       0      3                    9               7                     7
compute
        Diablo  Essex  Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
SELECT  371     52     292                  290             100                   140
INSERT  3       3      10                   10              22                    22
UPDATE  1       2      10                   10              8                     8
[Third table on the slide, unlabeled; its values equal the keystone-enabled totals for the whole create VM operation]
        Diablo  Essex  Folsom-nova-network  Folsom-quantum  Grizzly-nova-network  Grizzly-quantum
SELECT  450     95     409                  560             139                   343
INSERT  4       4      23                   24              37                    40
UPDATE  6       10     60                   58              74                    70
32. Create VM total message bytes – read() or recv()
Component        Diablo           Essex            Folsom nova-network  Folsom quantum   Grizzly nova-network
keystone         154841           23090            198493               269920           41888
nova-api         65596            81836            75507                21435            22766
nova-compute     155233 (113701)  157660 (105460)  202163 (163107)      206003 (167383)  106396 (110721)
nova-conductor   n/a              n/a              n/a                  n/a              371614
nova-network     98101            77184            62509                n/a              103100
nova-scheduler   3380             38477            16465                19688            29674
glance-registry  36764            16632            45798                46104            30494
glance-api       17440            6326             32386                32716            11248
quantum-server   n/a              n/a              n/a                  46533            n/a
quantum-dhcp     n/a              n/a              n/a                  3722             n/a
Total            531355           401205           582185               650615           717180
Excludes any image transfer
33. Create VM total message bytes – write() or send()
Component        Diablo  Essex   Folsom nova-network  Folsom quantum   Grizzly nova-network
keystone         115606  15129   128957               174884           25364
nova-api         50704   70995   25449                20265            22693
nova-compute     99899   109136  127436               126143 (122363)  74864 (68352)
nova-conductor   n/a     n/a     n/a                  n/a              222228
nova-network     74106   63446   46123                n/a              57321
nova-scheduler   2964    30182   17662                21993            26997
glance-registry  23095   11006   18210                18196            20329
glance-api       8841    5038    10226                10220            8705
quantum-server   n/a     n/a     n/a                  25986            n/a
quantum-dhcp     n/a     n/a     n/a                  84               n/a
Total            375447  305156  374499               403507           458501
34. Create a VM: Message exchange with RabbitMQ – send()
Component       Diablo     Essex      Folsom nova-network  Folsom-quantum  Grizzly nova-network
nova-api        23 (3392)  35 (4769)  23 (8600)            11 (5254)       11 (4062)
nova-compute    18 (1316)  18 (1430)  18 (3782)            1 (21)          306 (67874)
nova-network    31 (1816)  45 (1018)  32 (2159)            n/a             14 (1786)
nova-scheduler  23 (2392)  12 (2976)  12 (7388)            12 (9737)       7 (11567)
nova-conductor  n/a        n/a        n/a                  n/a             317 (82717)
quantum-server  n/a        n/a        n/a                  36 (4498)       n/a
quantum-dhcp    n/a        n/a        n/a                  4 (84)          n/a
35. Create a VM: Message exchange with RabbitMQ – recv()
Component       Diablo     Essex      Folsom nova-network  Folsom-quantum  Grizzly nova-network
nova-api        16 (833)   25 (1609)  16 (833)             7 (328)         7 (328)
nova-compute    14 (3442)  14 (2369)  14 (8752)            1 (9479)        230 (94463)
nova-network    18 (1808)  26 (3045)  19 (7298)            n/a             8 (2699)
nova-scheduler  8 (2479)   8 (2918)   8 (5307)             8 (5345)        4 (3861)
nova-conductor  n/a        n/a        n/a                  n/a             172 (58721)
quantum-server  n/a        n/a        n/a                  24 (396)        n/a
quantum-dhcp    n/a        n/a        n/a                  4 (3726)        n/a
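36. Create a VM: send() and recv() grizzly-nova net
[Figure: number of send() and recv() calls per component. The two series are assigned to send() and recv() by matching the nova-conductor and nova-scheduler counts against the RabbitMQ tables.]
Component  send()  recv()
comp       308     2176
cond       317     172
gapi       17      1667
greg       9       139
keys       3       3
napi       19      5429
netw       19      12
sche       7       4
Annotation on the figure: single byte recv in webob library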
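Why a single-byte recv inflates the call count can be shown with a local socket pair. This is a generic illustration of the effect and not webob's code; the payload size and the helper name `count_recv_calls` are arbitrary choices for the example.

```python
import socket


def count_recv_calls(payload, bufsize):
    """Send `payload` over a local socket pair and count recv() calls needed to read it."""
    writer, reader = socket.socketpair()
    try:
        writer.sendall(payload)
        received, calls = 0, 0
        while received < len(payload):
            chunk = reader.recv(bufsize)
            if not chunk:
                break
            received += len(chunk)
            calls += 1
        return calls
    finally:
        writer.close()
        reader.close()


payload = b"x" * 200
single_byte_calls = count_recv_calls(payload, 1)    # one call per byte
buffered_calls = count_recv_calls(payload, 4096)    # far fewer calls
print(single_byte_calls, buffered_calls)
```

Each recv() is a system call, so a tracer that counts events sees the single-byte reader as many more events for the same number of bytes.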
37. Conclusions
• Complexity is brewing under OpenStack. Beware!
• Build distributed applications with tracing in mind
• Flow diff
– Through an interactive page
• Ongoing and future work
– Fault injection and log correlation
– Leverage tool for other projects, e.g., CloudFoundry
Editor's notes
Talk about when started
How many open source cloud projects?
Hadoop is not listed here. Neither are Chef, Puppet, Zenoss, or Ganglia.