Dissecting Open Source Cloud Evolution: An OpenStack Case Study
Salman Baset, Chunqiang Tang, Byung Chul Tak, Long Wang
IBM T. J. Watson Research Center
June 26th, 2013
Open source cloud projects
IaaS
PaaS
SaaS
Broadly two types:
(1) Native (listed here)
(2) Adapters (e.g., Netflix on EC2)
S. Baset, CQ Tang, B. Tak, L. Wang 2
Timeline for cloud open source
(Figure: timeline spanning 2001 to 2012; Amazon EC2 and Google App Engine are marked.)
Two characteristics of open source cloud systems
• Distributed multi-component architecture
– Example: the OpenStack (IaaS) and Cloud Foundry (PaaS) controllers each consist of more than 10 components
• Rapid development by a community of developers
Rapid development
• Open source cloud projects are being developed and released at a rapid pace
– OpenStack: a release every six months
– Eucalyptus: every four months
– OpenShift Enterprise: every four months
• Compare with
– Linux kernel: every 2-3 months (3.x to 3.(x+1))
– Ubuntu distribution releases: every six months
• Major cloud providers are consuming OpenStack directly from the development trunk
– Running two weeks behind trunk
Why understand evolution?
• Evolution:
– A git commit or a major release
• Research perspective
– How do logical operations (e.g., create a VM) change across major versions?
• Developer perspective
– What is the impact of my committed changes?
• Provider perspective
– Continuous deployment and delivery
• How does a provider gain confidence in deploying a new release in production?
• What is the impact of new changes and configuration options on logical operations?
– Message flow, performance evaluation, fault injection, etc.
Methods for understanding evolution
• Static
– Source code
– Documentation
• Dynamic
– Log analysis
• Lab and/or production
– Tracing message flow
• With or without code instrumentation
• Automatic correlation of message flow with logs
• Lab and/or production
– Fault injection
– Performance study
• Lab
Our solution
• Without source code modification
– Tracing
– Tracing with log correlation
– Fault injection
• Other solutions
– Google Dapper (instruments Google's common RPC and threading libraries)
– Twitter Zipkin (attaches trace identifiers to requests)
Summary of our solution: Tracing
• This simplified diagram shows one example path for one user request.
• A path is the series of system events, such as RECEIVE and SEND, across servers. The events are captured using the LD_PRELOAD technique.
• Prior art: vPath constructs such causal paths of system activities initiated by user requests.
(Figure: a request travels through an Apache web server, an application server, and a database server. On each server, a monitoring agent sits between the application and the kernel and catches the thread's RECEIVE and SEND events.)
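The path construction can be sketched as follows. This is a minimal, simplified illustration, not the authors' tool. It assumes the vPath-style rule that a thread serves one request at a time, and that a SEND on a connection is consumed by the peer's RECEIVE on the same connection. The `Event` fields (`seq`, `host`, `thread`, `kind`, `conn`) are hypothetical names for what the LD_PRELOAD agents would record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int      # global capture order across agents (assumes comparable timestamps)
    host: str
    thread: str
    kind: str     # "RECEIVE" or "SEND"
    conn: str     # hypothetical connection id, e.g. a TCP 4-tuple as a string


def follow(events, recv):
    """Collect the events causally linked to `recv`, a RECEIVE that starts a request on one tier."""
    own = sorted((e for e in events
                  if (e.host, e.thread) == (recv.host, recv.thread) and e.seq >= recv.seq),
                 key=lambda e: e.seq)
    path = []
    for e in own:
        path.append(e)
        if e.kind == "SEND" and e.conn == recv.conn:
            break  # reply sent on the incoming connection: this tier is done
        if e.kind == "SEND":
            # Request forwarded: the next tier's RECEIVE on the same connection continues the path.
            peers = [p for p in events if p.kind == "RECEIVE" and p.conn == e.conn
                     and p.host != e.host and p.seq > e.seq]
            if peers:
                path.extend(follow(events, min(peers, key=lambda p: p.seq)))
    return sorted(path, key=lambda e: e.seq)


events = [
    Event(1, "web", "t1", "RECEIVE", "client-web"),
    Event(2, "web", "t1", "SEND", "web-app"),
    Event(3, "app", "t7", "RECEIVE", "web-app"),
    Event(4, "app", "t7", "SEND", "web-app"),
    Event(5, "web", "t1", "RECEIVE", "web-app"),
    Event(6, "web", "t1", "SEND", "client-web"),
    Event(7, "web", "t2", "RECEIVE", "other-client"),  # an unrelated request
]
path = follow(events, events[0])
print([e.seq for e in path])  # [1, 2, 3, 4, 5, 6]
```

The unrelated event on thread `t2` is excluded because it is neither on the request's thread nor reachable through a matching connection.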
Summary of our solution: Tracing with queues
• The path breaks if there are queues in the middle.
– Apache web server inserts a message in the queue and returns
– Application server retrieves the message from the queue and performs work
– How do we correlate these messages?
• Augment path information with unique message information
– e.g., transaction ids
• If no unique message information is available, run only one logical operation in the system at a time
(Figure: a request path through an Apache web server, an application server, and a database server, each with a monitoring agent catching RECEIVE and SEND events. A queue sits between the web server and the application server.)
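The two correlation strategies above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. Events are plain dicts with an assumed `ts` timestamp and, when available, an assumed `msg_id` key (for example, a transaction id). The FIFO fallback further assumes the queue delivers in order.

```python
def stitch_through_queue(enqueues, dequeues, id_key="msg_id"):
    """Pair each enqueue event with the dequeue event that consumed the same message.

    With a unique message id on every event, the pairing is exact. Without one,
    fall back to FIFO order, which is only safe when a single logical operation
    runs in the system at a time.
    """
    if all(id_key in e for e in enqueues + dequeues):
        by_id = {d[id_key]: d for d in dequeues}
        return [(e, by_id[e[id_key]]) for e in enqueues if e[id_key] in by_id]
    ordered_in = sorted(enqueues, key=lambda e: e["ts"])
    ordered_out = sorted(dequeues, key=lambda d: d["ts"])
    return list(zip(ordered_in, ordered_out))


# Web server enqueues two messages; the application server dequeues them in a different order.
enq = [{"ts": 1, "host": "web", "msg_id": "a"}, {"ts": 2, "host": "web", "msg_id": "b"}]
deq = [{"ts": 5, "host": "app", "msg_id": "b"}, {"ts": 6, "host": "app", "msg_id": "a"}]
pairs = stitch_through_queue(enq, deq)
print([(p["msg_id"], c["ts"]) for p, c in pairs])  # [('a', 6), ('b', 5)]
```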
Summary of our solution: Log Analysis
• Key idea
– Combine log information with the causality (path) discovery technique
– Trace low-level system calls to infer causality and understand how an application executes
– Monitor log files and link log-file entries to the observed low-level system calls
– Linking the two together gives improved semantics for problem diagnosis
Diagram: Detecting Log Writes
• During a normal run,
– Maintain a mapping between each fd and its file name string
– Maintain a list of known/discovered log files
• On 'write' system calls,
– Check the parameters to see whether it is a 'write' to one of the log files.
– If it is, and the data to be written contains alerting keywords such as 'ERROR', then it is a log write caused by an error.
– This 'write' event is annotated accordingly.
Fragment of a path for a request: Recv -> Read -> write -> Send
Parameters of the write: fd=5, offset=2048, data="ERROR: …"
fd | application | log file name
9 | WebSphere | /var/log/was.log
14 | DB2 | /var/log/db2/access.log
5 | DB2 | /usr/local/db2/fie22xlv.log
8 | DB2 | /usr/local/db2/fie23xlv.log
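The two rules above can be sketched as a small event handler. This is a minimal sketch under stated assumptions, not the authors' agent. The `on_open`/`on_close`/`on_write` hooks stand in for what an LD_PRELOAD wrapper would call. The `.log` suffix rule for discovering new log files is hypothetical, and so is the keyword list beyond the slide's 'ERROR'.

```python
ALERT_KEYWORDS = ("ERROR",)  # the slide's example keyword; a real list would be longer


class LogWriteDetector:
    """Classify intercepted write() calls as log writes and flag error writes."""

    def __init__(self, known_logs=()):
        self.fd_to_path = {}              # fd -> file name, maintained from open()/close()
        self.log_files = set(known_logs)  # known/discovered log files

    def on_open(self, fd, path):
        self.fd_to_path[fd] = path
        if path.endswith(".log"):         # hypothetical discovery rule
            self.log_files.add(path)

    def on_close(self, fd):
        self.fd_to_path.pop(fd, None)

    def on_write(self, fd, data):
        """Return an annotation for the write event, or None if it is not a log write."""
        path = self.fd_to_path.get(fd)
        if path not in self.log_files:
            return None
        if any(k in data for k in ALERT_KEYWORDS):
            return {"event": "write", "log": path, "annotation": "error-log-write"}
        return {"event": "write", "log": path, "annotation": "log-write"}


det = LogWriteDetector(known_logs={"/var/log/was.log"})
det.on_open(5, "/usr/local/db2/fie22xlv.log")
det.on_open(6, "/tmp/data.bin")
print(det.on_write(5, "ERROR: Record missing."))
```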
Fault Injection for Building up a Knowledge Base for Future Problem Diagnosis
• Inject errors, observe the application's behavior, and build a knowledge base for future problem diagnosis
– Alter the return value of a system call, e.g., to mimic a network communication error
– Observe the logging reaction
– Repeat this for each system call and for each request
– Accumulate the observed logging reactions into a knowledge base
• When an error message is logged in a production system, use the knowledge base to infer the probability of different root causes
– Construct a Bayesian belief network for inference
• In the example figure, fault injection changes the return value of the 'Read' event from 1024 to -1. This causes an error to be logged later in the path.
(Figure: path Recv -> Read -> write -> Send (request) -> Recv -> write. The Read return value is altered from 1024 to -1. In reaction to the injected error, a newly appeared write event logs data="ERROR: Record missing.")
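The knowledge base and the inference step can be sketched in simplified form. The slide builds a Bayesian belief network. The code below swaps in plain Bayes' rule over a single observed message, with injection-run frequencies as likelihoods and a uniform prior by default. The fault labels and the "Connection reset" message are hypothetical.

```python
from collections import Counter, defaultdict


class FaultKnowledgeBase:
    """Records which log messages each injected fault produced."""

    def __init__(self):
        self.reactions = defaultdict(Counter)  # injected fault -> Counter(logged message)

    def record(self, fault, logged_message):
        self.reactions[fault][logged_message] += 1

    def posterior(self, message, prior=None):
        """P(fault | message) by Bayes' rule, using observed frequencies as likelihoods."""
        faults = list(self.reactions)
        if not faults:
            return {}
        if prior is None:
            prior = {f: 1 / len(faults) for f in faults}  # uniform prior
        scores = {}
        for f in faults:
            runs = sum(self.reactions[f].values())
            scores[f] = prior[f] * self.reactions[f][message] / runs
        z = sum(scores.values())
        return {f: s / z for f, s in scores.items()} if z else {}


kb = FaultKnowledgeBase()
# Hypothetical injection runs: (altered system call, message that appeared in the log)
kb.record("read() returns -1", "ERROR: Record missing.")
kb.record("read() returns -1", "ERROR: Record missing.")
kb.record("send() returns -1", "ERROR: Record missing.")
kb.record("send() returns -1", "ERROR: Connection reset")
print(kb.posterior("ERROR: Record missing."))  # read() about 2/3, send() about 1/3
```

A belief network would also model dependencies between several observed messages and system calls. This sketch only ranks causes for one message.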
Brewing complexity: Evolution of OpenStack LOC *
Release | Released | Nova | Cinder | Glance | Keystone | Quantum | Swift | Total
Austin | Oct 2010 | 17,288 | - | - | - | - | 12,979 | 30,627
Bexar | Feb 2011 | 27,734 | - | 3,629 | - | - | 16,014 | 47,377
Cactus | Apr 2011 | 43,947 | - | 4,927 | - | - | 16,665 | 65,539
Diablo | Sep 2011 | 66,395 | - | 9,961 | 12,451 | - | 15,591 | 91,947
Essex | Apr 2012 | 87,750 | - | 15,698 | 11,555 | - | 17,646 | 149,596
Folsom | Sep 2012 | 103,637 | 31,241 | 20,271 | 13,939 | 42,118 | 19,114 | 230,320
Grizzly | Apr 2013 | 120,968 | 49,797 | 21,261 | 20,071 | 60,485 | 23,035 | 321,081
* Physical line counts (line breaks), not Python logical lines of code
Methodology
find . -name '*.py' | grep -v test | grep -v doc | xargs cat | wc -l
find . -name '*.sh' | grep test | grep -v doc | xargs cat | wc -l
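The same count can be written in Python as a portable cross-check of the shell pipeline. This is a sketch: it counts newline characters, as `wc -l` does. It skips any path (relative to the root) containing "test" or "doc", mirroring the `grep -v` filters. The function name and defaults are illustrative.

```python
import os


def count_lines(root, suffix=".py", exclude=("test", "doc")):
    """Count newline characters (what `wc -l` reports) in files under `root`
    whose name ends with `suffix`, skipping any path relative to `root`
    that contains one of the `exclude` substrings."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if not name.endswith(suffix) or any(x in rel for x in exclude):
                continue
            with open(path, "rb") as f:
                total += f.read().count(b"\n")
    return total
```

For example, `count_lines("nova")` run from a source checkout would report the Nova figure under these filtering rules.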
OpenStack logical architecture (grizzly+net+cinder)
(Figure: the dashboard (horizon) reaches the services over REST. AUTHENTICATION: keystone and its database. IMAGE REPO: glance-api (glance API, REST) and glance-registry with the glance database. COMPUTE CONTROLLER: nova-api, nova-scheduler, nova-conductor, nova-cert, and nova-cells with the nova database, plus nova-network, communicating over AMQP with nova-compute on the compute nodes. BLOCK STORAGE: cinder-api and cinder-scheduler with the cinder database, communicating over AMQP with cinder-volume on the volume nodes.)
OpenStack logical architecture (grizzly+quantum+cinder)
(Figure: the dashboard (horizon) reaches the services over REST. AUTHENTICATION: keystone and its database. IMAGE REPO: glance-api and glance-registry with the glance database. COMPUTE CONTROLLER: nova-api, nova-scheduler, nova-conductor, nova-cert, and nova-cells with the nova database, communicating over AMQP with nova-compute on the compute nodes. BLOCK STORAGE: cinder-api and cinder-scheduler with the cinder database, communicating over AMQP with cinder-volume on the volume nodes. NETWORK CONTROLLER: quantum-server and quantum-plugin with the quantum database, communicating over AMQP with quantum-dhcp, the quantum-metadata agent, and quantum-l3 agents.)
OpenStack tracing
• Understand OpenStack data and message flow for logical operations, e.g.,
– Create a VM
– Delete a VM
– List VMs
– Create a volume
– Attach a volume to or detach a volume from a VM
– Create a floating IP address
– Add a floating IP address to or remove it from a VM
– Create or destroy a virtual network
• Understand
– REST calls
– Data flow
– AMQP flow
– Timing information
• Build a data consistency tool
• Gather data for generating performance load
• Build a performance model
Key observations from tracing OpenStack (1/2)
• OpenStack is evolving very rapidly. Significant behavior changes from one release to
another.
• Total tables
– Grizzly: 105 tables (160 including nova shadow tables); Diablo: 53
• Creating a VM (grizzly)
– 139 SELECT queries, 37 INSERT queries, 74 UPDATE queries
– 12 tables are touched by INSERT and UPDATE
– 717K bytes read and 458K bytes written
– 655 send() calls to AMQP, 414 recv() calls
– For comparison, Diablo (Sep 2011) issued 450 SELECT, 4 INSERT, and 9 UPDATE queries
• Deleting a VM
– Only a single record is deleted from the database (the rest are archived)
• Request-id
– The instance and its request-id are stored in the database before the request is sent to the scheduler, but only after the quota has been updated.
• Quota management
– Entries are inserted into the database to indicate resource allocation for a VM (e.g., cores, fixed IPs). Negative or NULL entries are inserted for deallocation. Each quota entry has an expiration time (one day).
• VM state and task state
– Task states seen during creation include networking, block_device_mapping, and spawning
• Keystone
– Token verification is optimized in Grizzly using caches (for flavor=keystone) and PKI tokens
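The quota behavior described above (reserve, then commit, with a one-day expiry as the fallback cleanup) can be sketched as a toy model. This is not nova's quota code. The class names, resource keys, and the 512 RAM value in the usage below are hypothetical. The sketch shows why reservations left behind by a failed request linger until they expire.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

RESERVATION_TTL = timedelta(days=1)  # the slide's default: expire after one day


@dataclass(eq=False)  # compare reservations by identity, not by field values
class Reservation:
    resource: str     # e.g. "instances", "ram", "cores"
    delta: int        # positive to allocate, negative to release
    expires: datetime


class QuotaUsages:
    """Toy model of reserve -> commit, with expiry as the only cleanup path."""

    def __init__(self):
        self.in_use = Counter()
        self.reserved = Counter()
        self.pending = []

    def reserve(self, now, deltas):
        new = [Reservation(r, d, now + RESERVATION_TTL) for r, d in deltas.items()]
        for res in new:
            self.reserved[res.resource] += res.delta
        self.pending.extend(new)
        return new

    def commit(self, reservations):
        for res in reservations:
            if res not in self.pending:
                continue  # already expired and rolled back
            self.pending.remove(res)
            self.reserved[res.resource] -= res.delta
            self.in_use[res.resource] += res.delta

    def expire(self, now):
        for res in [r for r in self.pending if r.expires <= now]:
            self.pending.remove(res)
            self.reserved[res.resource] -= res.delta
```

If the API process dies between `reserve` and `commit`, the reserved amounts stay counted against the tenant until `expire` runs past the one-day TTL.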
Key observations from tracing OpenStack (2/2)
• Development of a data consistency checking tool
– Orphan iptables rules (not associated with a VM transaction) => security holes
– Orphan data in tables due to errors in VM creation, etc. => audit and clean up
– Orphan virsh (libvirt) data => audit and clean up
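At its core, such a consistency check is a set comparison between different views of which VMs exist. The sketch below is a hypothetical illustration, not the authors' tool. It assumes the three id lists have already been collected: instance UUIDs from the nova database, VM ids that own iptables rules (for example, parsed from `iptables-save` output), and domain ids from `virsh list --all`. The parsing step is not shown.

```python
def find_orphans(db_instance_ids, iptables_owner_ids, hypervisor_domain_ids):
    """Cross-check three views of which VMs exist and report the mismatches."""
    db = set(db_instance_ids)
    rules = set(iptables_owner_ids)
    domains = set(hypervisor_domain_ids)
    return {
        # firewall rules whose VM is gone: potential security holes
        "orphan_iptables_rules": sorted(rules - db),
        # hypervisor domains with no database record
        "orphan_domains": sorted(domains - db),
        # database records with no backing domain, e.g. after a failed create
        "orphan_db_records": sorted(db - domains),
    }


report = find_orphans(["vm1", "vm2"], ["vm1", "vm3"], ["vm2", "vm4"])
print(report)
```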
Methodology
• Run OpenStack on a single machine (with and without timers disabled)
• Diablo, Essex, Folsom, Grizzly
• Ubuntu, RabbitMQ, MySQL
• Use curl to send API requests to OpenStack
– flavor=keystone
– The image has three parts
• AMI, ramdisk, kernel image
– In Grizzly, PKI-based token verification was also used for keystone
– Each service's token was created before issuing a create or delete VM call
• Use our technique to capture message interactions, generate flows, run message analytics, and inject faults (ongoing)
• curl_createserver.sh
#!/bin/sh
# Usage: curl_createserver.sh <auth-token>
AUTHTOKEN=$1
curl -i -X POST "http://9.47.240.166:8774/v2/3283d689d02c41248fc82c202e82055a/servers" \
  -H "X-Auth-Project-Id: admin" \
  -H "User-Agent: python-novaclient" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-Auth-Token: ${AUTHTOKEN}" \
  -d '{"server": {"name": "test1", "imageRef": "de8882fb-94b3-4105-a212-c0a7fd8ab1e9", "flavorRef": "1", "max_count": 1, "min_count": 1, "networks": [{"uuid": "48de54f9-2a60-4f28-9740-d6317086c32a"}]}}'
SQL queries in create, delete, list VMs and tables touched
How to read: tables touched (number of SQL queries); in UPDATE and DELETE (create) rows, [n] is the number of tables touched by INSERT or UPDATE.
Operation | Diablo (Sep 2011) | Essex (Apr 2012) | Folsom nova-network (Sep 2012) | Folsom quantum (Sep 2012) | Grizzly nova-network (Apr 2013) | Grizzly quantum (Apr 2013)
SELECT (create) | 16 (450) | 17 (95) | 21 (409) | 26 (560) | 20 (139) | 37 (343)
SELECT (delete) | 8 (37) | 10 (36) | 17 (63) | 23 (241) | 13 (36) | 31 (192)
SELECT (list) | 5 (31) | 4 (12) | 6 (24) | 7 (25) | 1 (1) | 1 (1)
INSERT (create) | 4 (4) | 4 (4) | 8 (23) | 9 (24) | 10 (37) | 13 (40)
INSERT (delete) | 0 (0) | 0 (0) | 1 (3) | 1 (3) | 3 (6) | 4 (6)
INSERT (list) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0)
UPDATE (create) | 2 (9) [5] | 3 (12) [5] | 7 (60) [11] | 7 (59) [13] | 8 (74) [13] | 8 (70) [16]
UPDATE (delete) | 4 (6) [4] | 6 (10) [6] | 8 (22) [9] | 8 (25) [9] | 10 (31) [11] | 10 (26) [12]
UPDATE (list) | 0 (0) [0] | 0 (0) [0] | 0 (0) [0] | 0 (0) [0] | 0 (0) [0] | 0 (0) [0]
DELETE (create) | 0 (0) [0] | 0 (0) [0] | 0 (0) [0] | 0 (0) [0] | 0 (0) [0] | 0 (0) [0]
DELETE (delete) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1)
Tables | 53 (4 glance, 9 keystone, 39 nova) | 63 (4 glance, 10 keystone, 49 nova) | 67 (5 glance, 10 keystone, 52 nova) | 83 (Folsom nova-network tables + 16 quantum) | 136 (6 glance, 19 keystone, 111 nova, incl. 55 nova shadow tables) | 160 (Grizzly nova-network tables + 24 quantum)
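Keystone REST flow for creating a server (grizzly)
Participants: User, Keystone, nova-api, glance-api, glance-registry
– User -> Keystone: credentials
– Keystone -> User: token (role)
– User -> Keystone: get services and endpoints + token
– Keystone -> User: services + endpoints
– User -> nova-api: token + CreateInstance
– nova-api -> Keystone: verify token
– nova-api -> glance-api: token + GetImage
– glance-api -> Keystone: verify token
– glance-api -> glance-registry: token + GetImage
– glance-registry -> Keystone: verify token
– glance-registry -> glance-api: image
– glance-api -> nova-api: image
– nova-api -> User: CreateInstance success (Accepted)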
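Every service in this flow verifies the token with Keystone, which is why caching pays off. The sketch below shows the general idea of a service-side validation cache. It is not the keystoneclient auth-token middleware. `validate_remote` is a hypothetical callable standing in for the Keystone round trip, and the 300-second TTL is an assumption. PKI tokens, which let a service verify a signed token locally, are not modeled.

```python
import time


class CachingTokenValidator:
    """Remember successful token validations for `ttl` seconds so that a
    service does not ask Keystone again for every request."""

    def __init__(self, validate_remote, ttl=300.0, clock=time.monotonic):
        self.validate_remote = validate_remote  # hypothetical: token -> roles, or None if invalid
        self.ttl = ttl
        self.clock = clock
        self._cache = {}                        # token -> (roles, expiry time)

    def validate(self, token):
        now = self.clock()
        hit = self._cache.get(token)
        if hit is not None and hit[1] > now:
            return hit[0]
        roles = self.validate_remote(token)
        if roles is not None:                   # cache only successful validations
            self._cache[token] = (roles, now + self.ttl)
        return roles


calls = []


def fake_keystone(token):
    calls.append(token)
    return ["admin"] if token == "good" else None


now = [0.0]
v = CachingTokenValidator(fake_keystone, ttl=60, clock=lambda: now[0])
v.validate("good")
v.validate("good")
print(len(calls))  # 1: the second request is served from the cache
```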
Create a VM: overview (1/4)
• Which OpenStack component is issuing SELECT queries?
Role | Component | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
Authentication | keystone | 422 | 54 | 358 | 484 | 82 | 243
API server | nova-api | 4 | 11 | 11 | 9 | 10 | 10
Agent on compute node | nova-compute | 4 | 5 | 13 | 14 | 0 | 0
Controller agent | nova-conductor | n/a | n/a | n/a | n/a | 15 | 16
Network agent on compute node | nova-network | 13 | 19 | 17 | n/a | 20 | n/a
Scheduler | nova-scheduler | 1 | 2 | 1 | 1 | 4 | 4
Image registry server | glance-registry | 6 | 4 | 8 | 8 | 8 | 8
Network API server | quantum-server | n/a | n/a | n/a | 44 | n/a | 62
Create a VM: overview (2/4)
• How many HTTP (REST) requests are there relative to SELECT queries? Each cell gives the SELECT queries, followed in parentheses by the REST calls the component received.
Component | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
keystone | 422 (30 GET) | 54 (9 GET) | 358 (17 GET) | 484 (23 GET) | 82 (3 GET) | 243 (6 GET, 2 POST)
nova-api | 4 (1 POST) | 11 (1 POST) | 11 (1 POST) | 9 (1 POST) | 10 (1 POST) | 10 (1 POST)
nova-compute | 4 | 5 | 13 | 14 | 0 | 0
nova-conductor | n/a | n/a | n/a | n/a | 15 | 16
nova-network | 13 | 19 | 17 | n/a | 20 | n/a
nova-scheduler | 1 | 2 | 1 | 1 | 4 | 4
glance-api | 0 (2 GET, 5 HEAD) | 0 (4 HEAD) | 0 (8 HEAD) | 0 (8 HEAD) | 0 (8 HEAD) | 0 (8 HEAD)
glance-registry | 6 (7 GET) | 4 (4 GET) | 8 (8 GET) | 8 (8 GET) | 8 (8 GET) | 8 (8 GET)
quantum-server | n/a | n/a | n/a | 44 (5 GET, 1 POST) | n/a | 62 (9 GET, 1 POST)
Why so many SELECT queries in keystone?
• In Diablo, every keystone GET issues 14 SELECT queries, except for the first (16)
• In Essex, every keystone GET issues 6 SELECT queries
• In Folsom-nova-network/quantum, every keystone GET issues 21 SELECT queries, except for the first (22)
• In Grizzly-nova-network, every keystone GET issues 27 SELECT queries, except for the first (28)
– Keystone tokens are also cached, so subsequent requests do not trigger a full keystone token authentication
• If PKI token verification is used, the number of SELECT queries sent by keystone drops from 82 to 7
Keystone (create VM), for Diablo, Essex, Folsom-nova-network, Folsom-quantum, Grizzly-nova-network, and Grizzly-quantum respectively: SELECT queries 422, 54, 358, 484, 82, 243; REST calls received 30 GET, 9 GET, 17 GET, 23 GET, 3 GET, 6 GET + 2 POST
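The per-GET costs reconcile exactly with the keystone SELECT totals, as the following arithmetic check shows. The first-GET cost for Grizzly-nova-network (28) is inferred from its total of 82 over 3 GETs. Grizzly-quantum is left out because its 2 POSTs have no stated per-request cost.

```python
# (GETs received, SELECTs per later GET, SELECTs for the first GET), per release
KEYSTONE_PATTERN = {
    "Diablo": (30, 14, 16),
    "Essex": (9, 6, 6),
    "Folsom-nova-network": (17, 21, 22),
    "Folsom-quantum": (23, 21, 22),
    "Grizzly-nova-network": (3, 27, 28),  # first-GET cost inferred from the 82 total
}


def keystone_selects(gets, per_get, first):
    """SELECTs issued when the first GET costs `first` queries and each later GET `per_get`."""
    if gets == 0:
        return 0
    return first + per_get * (gets - 1)


totals = {release: keystone_selects(*p) for release, p in KEYSTONE_PATTERN.items()}
print(totals)
# {'Diablo': 422, 'Essex': 54, 'Folsom-nova-network': 358, 'Folsom-quantum': 484, 'Grizzly-nova-network': 82}
```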
Create a VM: overview (3/4)
• What if there is no keystone?
Keystone disabled | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
SELECT | 28 | 41 | 51 | 76 | 57 | 100
INSERT | 4 | 4 | 23 | 24 | 37 | 38
UPDATE | 6 | 10 | 60 | 58 | 74 | 70
Keystone enabled | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
SELECT | 450 | 95 | 409 | 560 | 139 | 343
INSERT | 4 | 4 | 23 | 24 | 37 | 40
UPDATE | 6 | 10 | 60 | 58 | 74 | 70
Create a VM: overview (4/4)
• Which components are issuing INSERT and UPDATE queries? (keystone enabled for all)
INSERT | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
keystone | - | - | - | - | - | 2
nova-api | 3 (3) | 3 (3) | 6 (10) | 6 (10) | 7 (21) | 7 (21)
nova-compute | - | - | 1 (12) | 2 (12) | - | -
nova-conductor | n/a | n/a | n/a | n/a | 2 (13) | 2 (13)
nova-network | 1 | 1 | 1 | n/a | 2 | n/a
nova-scheduler | - | - | - | - | 1 | 1
quantum-server | n/a | n/a | n/a | 2 | n/a | 3
UPDATE Diablo Essex Folsom-nova-network Folsom-quantum Grizzly-nova-network Grizzly-quantum
nova-api 1 1 9 9 7 7
nova-compute 1 (5) 1 (6) 4 (47) 4 (47)
nova-conductor 5 (59) 5 (59)
nova-network 3 4 3 6 1
nova-scheduler 1 1 1 2 2
quantum-server 1
Tables touched for create VM in grizzly-nova-net (queries per table)
SELECT: block_device_mapping 2, compute_node_stats 6, fixed_ips 6, floating_ips 1, images 8, instance_actions 4, instance_actions_events 2, instance_info_caches 1, networks 4, provider_fw_rules 2, quotas 5, quota_usages 4, reservations 2, role 7, security_group_rules 1, security_groups 3, virtual_interfaces 4
INSERT: compute_node_stats 12, instance_actions 1, instance_actions_events 2, instance_id_mappings 1, instance_info_caches 1, instances 1, instance_system_metadata 13, reservations 4, security_group_instance_association 1, virtual_interfaces 1
UPDATE: compute_nodes 6, compute_node_stats 44, fixed_ips 3, instance_actions_events 2, instance_info_caches 1, instances 8, quota_usages 8, reservations 2
Data flow for creating a server (grizzly) (1/2)
Participants: nova-api, nova-scheduler, nova-conductor, nova-compute
nova-api: create server, check quota
Create reservations (no request id; by default a reservation expires after a day if not updated):
INSERT INTO reservations (instances, expires, usageid1)
INSERT INTO reservations (ram, expires, usageid2)
INSERT INTO reservations (core, expires, usageid3)
Update quotas:
UPDATE quota_usages (usageid1)
UPDATE quota_usages (usageid2)
UPDATE quota_usages (usageid3)
What if nova-api dies here? The quota updates can then persist until the reservations expire or are cleaned up.
Check if images exist
Create the instance in the database:
INSERT INTO instances ('instance_uuid')
INSERT INTO security_group_instance_association ('instance_uuid')
INSERT INTO instance_system_metadata ('image_kernel_id', 'instance_uuid')
INSERT INTO instance_system_metadata ('instance_type_memory_mb')
INSERT INTO instance_system_metadata ('instance_type_swap')
INSERT INTO instance_system_metadata ('instance_type_vcpu_weight')
INSERT INTO instance_system_metadata ('instance_type_root_gb')
INSERT INTO instance_system_metadata ('instance_type_id')
INSERT INTO instance_system_metadata ('image_ramdisk_id')
INSERT INTO instance_system_metadata ('instance_type_name')
INSERT INTO instance_system_metadata ('instance_type_ephemeral_gb')
INSERT INTO instance_system_metadata ('instance_type_rxtx_factor')
INSERT INTO instance_system_metadata ('instance_type_flavorid')
INSERT INTO instance_system_metadata ('instance_type_flavorid')
INSERT INTO instance_system_metadata ('image_base_image_ref')
INSERT INTO instance_info_caches ('instance_uuid')
Data flow for creating a server (grizzly) (2/2)
Participants: nova-api, nova-scheduler, nova-compute, nova-conductor, nova-network
INSERT INTO instance_id_mappings ('instance_uuid')
Update time in quota_usages table
INSERT INTO instance_actions (instance_uuid, request_id)
This insert is key: it associates the instance id with a request id. But it occurs only after the quota and reservations have been updated. BAD!
Send to scheduler (request_id)
INSERT INTO instance_actions_events (scheduling)
INSERT INTO instance_actions_events (compute_run)
Libvirt: create instance
UPDATE instances (task_state = NULL)
GET images from glance
UPDATE instances (host, node)
UPDATE compute_node_stats *
INSERT INTO compute_node_stats
UPDATE instances (task_state = networking)
How many SQL queries for create VM before a request is sent to:
nova-scheduler | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
SELECT | 202 | 10 | 27 | 289 | 98 | 138
INSERT | 0 | 0 | 3 | 10 | 21 | 21
UPDATE | 0 | 0 | 3 | 9 | 7 | 7
nova-compute | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
SELECT | 371 | 52 | 292 | 290 | 100 | 140
INSERT | 3 | 3 | 10 | 10 | 22 | 22
UPDATE | 1 | 2 | 10 | 10 | 8 | 8
Total for create VM | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network | Grizzly-quantum
SELECT | 450 | 95 | 409 | 560 | 139 | 343
INSERT | 4 | 4 | 23 | 24 | 37 | 40
UPDATE | 6 | 10 | 60 | 58 | 74 | 70
Create VM total message bytes: read() or recv()
Component | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network
keystone | 154841 | 23090 | 198493 | 269920 | 41888
nova-api | 65596 | 81836 | 75507 | 21435 | 22766
nova-compute | 155233 (113701) | 157660 (105460) | 202163 (163107) | 206003 (167383) | 106396 (110721)
nova-conductor | n/a | n/a | n/a | n/a | 371614
nova-network | 98101 | 77184 | 62509 | n/a | 103100
nova-scheduler | 3380 | 38477 | 16465 | 19688 | 29674
glance-registry | 36764 | 16632 | 45798 | 46104 | 30494
glance-api | 17440 | 6326 | 32386 | 32716 | 11248
quantum-server | n/a | n/a | n/a | 46533 | n/a
quantum-dhcp | n/a | n/a | n/a | 3722 | n/a
Total | 531355 | 401205 | 582185 | 650,615 | 717,180
Excludes any image transfer
Create VM total message bytes: write() or send()
Component | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network
keystone | 115606 | 15129 | 128957 | 174884 | 25364
nova-api | 50704 | 70995 | 25449 | 20265 | 22693
nova-compute | 99899 | 109136 | 127436 | 126143 (122363) | 74864 (68352)
nova-conductor | n/a | n/a | n/a | n/a | 222228
nova-network | 74106 | 63446 | 46123 | n/a | 57321
nova-scheduler | 2964 | 30182 | 17662 | 21993 | 26997
glance-registry | 23095 | 11006 | 18210 | 18196 | 20329
glance-api | 8841 | 5038 | 10226 | 10220 | 8705
quantum-server | n/a | n/a | n/a | 25986 | n/a
quantum-dhcp | n/a | n/a | n/a | 84 | n/a
Total | 375,447 | 305,156 | 374,499 | 403,507 | 458,501
Create a VM: Message exchange with RabbitMQ, send() (each cell: send() calls (bytes))
Component | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network
nova-api | 23 (3392) | 35 (4769) | 23 (8600) | 11 (5254) | 11 (4062)
nova-compute | 18 (1316) | 18 (1430) | 18 (3782) | 1 (21) | 306 (67874)
nova-network | 31 (1816) | 45 (1018) | 32 (2159) | n/a | 14 (1786)
nova-scheduler | 23 (2392) | 12 (2976) | 12 (7388) | 12 (9737) | 7 (11567)
nova-conductor | n/a | n/a | n/a | n/a | 317 (82717)
quantum-server | n/a | n/a | n/a | 36 (4498) | n/a
quantum-dhcp | n/a | n/a | n/a | 4 (84) | n/a
Create a VM: Message exchange with RabbitMQ, recv() (each cell: recv() calls (bytes))
Component | Diablo | Essex | Folsom-nova-network | Folsom-quantum | Grizzly-nova-network
nova-api | 16 (833) | 25 (1609) | 16 (833) | 7 (328) | 7 (328)
nova-compute | 14 (3442) | 14 (2369) | 14 (8752) | 1 (9479) | 230 (94463)
nova-network | 18 (1808) | 26 (3045) | 19 (7298) | n/a | 8 (2699)
nova-scheduler | 8 (2479) | 8 (2918) | 8 (5307) | 8 (5345) | 4 (3861)
nova-conductor | n/a | n/a | n/a | n/a | 172 (58721)
quantum-server | n/a | n/a | n/a | 24 (396) | n/a
quantum-dhcp | n/a | n/a | n/a | 4 (3726) | n/a
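Create a VM: send() and recv() calls per component (grizzly-nova-network)
(Figure: recv() calls: nova-compute 2176, nova-conductor 172, glance-api 1667, glance-registry 139, keystone 3, nova-api 5429, nova-network 12, nova-scheduler 4. send() calls: nova-compute 308, nova-conductor 317, glance-api 17, glance-registry 9, keystone 3, nova-api 19, nova-network 19, nova-scheduler 7. Annotation: single-byte recv() in the webob library.)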
Conclusions
• Complexity is brewing under OpenStack. Beware!
• Build distributed applications with tracing in mind
• Flow diff
– Through an interactive page
• Ongoing and future work
– Fault injection and log correlation
– Apply the tool to other projects, e.g., Cloud Foundry
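The flow-diff idea can be sketched with Python's standard `difflib`, rendering each traced flow as one "sender -> receiver: message" line per interaction. This is a minimal sketch, not the interactive tool. The two flows below are hypothetical, simplified illustrations echoing the Grizzly change in which nova-compute reaches the database through nova-conductor.

```python
import difflib


def flow_diff(old_flow, new_flow, old_name="old", new_name="new"):
    """Unified diff of two message flows, one 'sender -> receiver: message' per line."""
    return list(difflib.unified_diff(old_flow, new_flow,
                                     fromfile=old_name, tofile=new_name, lineterm=""))


folsom = [
    "user -> nova-api: POST /servers",
    "nova-api -> nova-scheduler: schedule",
    "nova-scheduler -> nova-compute: create",
    "nova-compute -> database: UPDATE instances",
]
grizzly = [
    "user -> nova-api: POST /servers",
    "nova-api -> nova-scheduler: schedule",
    "nova-scheduler -> nova-compute: create",
    "nova-compute -> nova-conductor: UPDATE instances",
    "nova-conductor -> database: UPDATE instances",
]
print("\n".join(flow_diff(folsom, grizzly, "folsom", "grizzly")))
```

A line-oriented diff keeps unchanged interactions as context, so a reviewer sees only the messages a release added or removed.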
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
OpenStack & OpenContrail in Production
OpenStack & OpenContrail in ProductionOpenStack & OpenContrail in Production
OpenStack & OpenContrail in Production
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic Stack
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Spark in the Maritime Domain
Spark in the Maritime DomainSpark in the Maritime Domain
Spark in the Maritime Domain
 

Dernier

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Dissecting Open Source Cloud Evolution: An OpenStack Case Study

  • 1. Dissecting Open Source Cloud Evolution: An OpenStack Case Study Salman Baset, Chunqiang Tang, Byung Chul Tak, Long Wang IBM T. J. Watson Research Center June 26th, 2013
  • 2. Open source cloud projects
    [Figure: open source cloud projects grouped by layer: IaaS, PaaS, SaaS.]
    • Broadly two types:
      (1) Native (listed here)
      (2) Adapters (e.g., Netflix on EC2)
  • 3. Timeline for cloud open source
    [Figure: timeline from 2001 to 2012 showing when open source cloud projects started, with Amazon EC2 and Google App Engine marked for reference.]
  • 4. Two characteristics of open source cloud systems
    • Distributed multi-component architecture
      – Example: OpenStack and Cloud Foundry each have more than 10 components in their controllers
    • Rapid development by a community of developers
  • 5. Rapid development
    • Open source cloud projects are developed and released at a rapid pace
      – OpenStack: a release every six months
      – Eucalyptus: every four months
      – OpenShift Enterprise: every four months
    • Compare with
      – Linux kernel: every 2-3 months (from 3.x to 3.(x+1))
      – Ubuntu distribution: a release every six months
    • Major cloud providers consume OpenStack directly from the development trunk
      – Two weeks behind the trunk
  • 6. Why understand evolution?
    • Evolution:
      – A git commit or a major release
    • Research perspective
      – How do logical operations (e.g., create a VM) change across major versions?
    • Developer perspective
      – What is the impact of my committed changes?
    • Provider perspective
      – Continuous deployment and delivery
        • How does a provider gain confidence in deploying a new release in production?
        • What is the impact of new changes and configuration options on logical operations?
      – Message flow, performance evaluation, fault injection, etc.
  • 7. Methods for understanding evolution
    • Static
      – Source code
      – Documentation
    • Dynamic
      – Log analysis
        • Lab and/or production
      – Tracing message flow
        • With or without code instrumentation
        • Automatic correlation of message flow with logs
        • Lab and/or production
      – Fault injection
      – Performance study
        • Lab
  • 8. Our solution
    • Without source code modification
      – Tracing
      – Tracing with log correlation
      – Fault injection
    • Other solutions
      – Google Dapper (instruments a common RPC framework, leveraging callbacks)
      – Twitter Zipkin (attaches identifiers to requests)
  • 9. Summary of our solution: Tracing
    • This simplified diagram shows one example path for one user request.
    • A path is the series of system events, such as RECEIVE and SEND, across servers, captured using the LD_PRELOAD technique.
    • Prior art: vPath constructs such causal paths of system activities initiated by user requests.
    [Figure: a request flows through an Apache web server, an application server, and a database server. On each server, a monitoring agent catches the RECEIVE and SEND events that application threads issue to the kernel.]
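The path construction described on slide 9 can be sketched as follows. This is a minimal illustration in the spirit of vPath, not the authors' tool. It assumes each captured event already carries a host, a thread ID, a connection ID that both endpoints can see, and a sequence number that orders events globally. In a real deployment, ordering events across hosts needs clock alignment or per-connection ordering. `Event` and `build_path` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    host: str    # server that observed the event
    thread: int  # thread ID inside that server
    kind: str    # "RECEIVE" or "SEND"
    conn: str    # connection ID seen by both endpoints (assumed available)
    seq: int     # global order of the event (assumed available)


def build_path(events):
    """Return causal edges (cause, effect) between captured events.

    Rule 1: on the same host and thread, an event follows from the
            previous event of that thread.
    Rule 2: a RECEIVE follows from the latest SEND on the same
            connection issued by a different host.
    """
    edges = []
    last_in_thread = {}  # (host, thread) -> last event seen there
    last_send = {}       # conn -> latest SEND on that connection
    for ev in sorted(events, key=lambda e: e.seq):
        key = (ev.host, ev.thread)
        if key in last_in_thread:
            edges.append((last_in_thread[key], ev))
        if ev.kind == "SEND":
            last_send[ev.conn] = ev
        elif ev.kind == "RECEIVE":
            src = last_send.get(ev.conn)
            if src is not None and src.host != ev.host:
                edges.append((src, ev))
        last_in_thread[key] = ev
    return edges
```

To use it, feed the events of one request through a web, application, and database tier. The returned edges form the request's path: thread edges inside each server, and SEND-to-RECEIVE edges between servers.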
  • 10. Summary of our solution: Tracing with queues
    • The path breaks if there are queues in the middle.
      – The Apache web server inserts a message in the queue and returns.
      – The application server retrieves the message from the queue and performs the work.
      – How do we correlate these messages?
    • Augment the path information with unique message information
      – e.g., transaction IDs
    • If there is no unique message information, run only one logical operation in the system at a time.
    [Figure: the same three-tier path as in slide 9, with a queue between the web server and the application server.]
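Slide 10's fix of augmenting the path with unique message information can be sketched as matching publishes and consumes on a shared identifier. This is a hypothetical illustration. It assumes the captured payloads are JSON and carry an ID under a key named `msg_id`, which is an assumed field name and not a documented AMQP or OpenStack one. Connection-level rules cannot link the two sides, because both the producer and the consumer talk to the broker rather than to each other.

```python
import json
from collections import defaultdict


def extract_msg_id(payload: bytes):
    """Return the (hypothetical) 'msg_id' field of a JSON payload, or None."""
    try:
        return json.loads(payload).get("msg_id")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: no usable identifier.
        return None


def correlate_via_queue(publishes, consumes):
    """Link consumed messages back to the host that published them.

    publishes, consumes: lists of (host, payload) captured on the
    connections to the message broker. Messages without an ID stay
    unlinked; the slide's fallback is to run one operation at a time.
    """
    publishers_by_id = defaultdict(list)
    for host, payload in publishes:
        mid = extract_msg_id(payload)
        if mid is not None:
            publishers_by_id[mid].append(host)

    links = []
    for host, payload in consumes:
        mid = extract_msg_id(payload)
        for src in publishers_by_id.get(mid, []):
            links.append((src, host, mid))
    return links
```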
  • 11. Summary of our solution: Log analysis
    • Key idea
      – Combine log information with the causality (path) discovery technique
    [Figure: (1) trace low-level system calls to infer causality and understand how an application executes; (2) monitor log files and link log-file entries to the observed low-level system calls; linking the two gives improved semantics for problem diagnosis.]
  • 12. Diagram: Detecting log writes
    • During a normal run,
      – Maintain a mapping between each fd and its file name string
      – Maintain a list of known/discovered log files
    • On 'write' system calls,
      – Check the parameters to see whether it is a 'write' to one of the log files.
      – If it is, and the data to be written contains alerting keywords such as 'ERROR', then it is a log write caused by an error.
      – Annotate this 'write' event accordingly.
    [Figure: a path fragment Request → Recv → Read → write → Send, where the write has parameters fd=5, offset=2048, data="ERROR: …". An fd table maps fd 9 to WebSphere /var/log/was.log, fd 14 to DB2 /var/log/db2/access.log, fd 5 to DB2 /usr/local/db2/fie22xlv.log, and fd 8 to DB2 /usr/local/db2/fie23xlv.log.]
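The bookkeeping on slide 12 can be sketched as a small state machine fed by intercepted `open`, `close`, and `write` calls. This is a hypothetical illustration, not the authors' agent. Treating any path ending in `.log` as a newly discovered log file is an assumed heuristic, and `ERROR` is the slide's example keyword.

```python
ALERT_KEYWORDS = (b"ERROR",)  # slide's example keyword; extend as needed


class LogWriteDetector:
    """Track fd -> file name and flag writes of alert keywords to log files."""

    def __init__(self, log_files):
        self.log_files = set(log_files)  # known/discovered log files
        self.fd_to_name = {}

    def on_open(self, fd, path):
        self.fd_to_name[fd] = path
        if path.endswith(".log"):  # assumed discovery heuristic
            self.log_files.add(path)

    def on_close(self, fd):
        self.fd_to_name.pop(fd, None)

    def on_write(self, fd, data: bytes):
        """Return an annotation if this is an error log write, else None."""
        name = self.fd_to_name.get(fd)
        if name not in self.log_files:
            return None
        if any(keyword in data for keyword in ALERT_KEYWORDS):
            return {"fd": fd, "file": name, "annotation": "error-log-write"}
        return None
```

A write to a log file without a keyword, or a write to a non-log file, returns `None` and leaves the path event unannotated.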
  • 13. Fault injection for building up a knowledge base for future problem diagnosis
    • Inject errors, observe the application's behavior, and build a knowledge base for future problem diagnosis
      – Alter the return value of a system call, e.g., to mimic a network communication error
      – Observe the logging reaction
      – Repeat this for each system call and for each request
      – Accumulate the observed logging reactions as a knowledge base
    • When an error message is logged in a production system, use the knowledge base to infer the probability of different root causes
      – Construct a Bayesian Belief Network for inference
    • In the example figure, fault injection changes the return value of the 'Read' event to -1. This triggers an error to be logged later in the path.
    [Figure: the path Request → Recv → Read → write → Send. After Read's return value is altered from 1024 to -1, a newly appeared write event with data="ERROR: Record missing." is the reaction to the injected error.]
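The knowledge-base idea on slide 13 can be sketched with a plain Bayes-rule estimate: count which log message each injected fault produced, then score candidate root causes for a message seen in production. This is deliberately simpler than the Bayesian Belief Network the slide mentions. It assumes a uniform prior over faults and add-alpha smoothing. The fault labels and the class name are hypothetical.

```python
from collections import Counter, defaultdict


class FaultKnowledgeBase:
    """Maps injected faults to the log messages they triggered."""

    def __init__(self):
        self.reactions = defaultdict(Counter)  # fault -> Counter(log message)

    def record(self, fault, logged_message):
        """Store one observed logging reaction to an injected fault."""
        self.reactions[fault][logged_message] += 1

    def posterior(self, logged_message, alpha=1.0):
        """P(fault | message) by Bayes' rule.

        Uses a uniform prior over faults and add-alpha smoothing over the
        messages seen so far, so unseen pairs keep a small probability.
        """
        vocab = {m for c in self.reactions.values() for m in c} | {logged_message}
        scores = {}
        for fault, counts in self.reactions.items():
            total = sum(counts.values())
            likelihood = (counts[logged_message] + alpha) / (total + alpha * len(vocab))
            scores[fault] = likelihood  # uniform prior cancels in normalization
        z = sum(scores.values())
        return {fault: s / z for fault, s in scores.items()}
```

For example, after three injections of a failing `read()` that each logged "ERROR: Record missing." and three of a failing `send()` that logged a different message, the first fault receives 0.8 of the probability mass for "ERROR: Record missing.".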
  • 14. Brewing complexity: Evolution of OpenStack LOC*
    Release  Released  Nova     Cinder  Glance  Keystone  Quantum  Swift   Total
    Austin   Oct 2010  17,288   -       -       -         -        12,979  30,627
    Bexar    Feb 2011  27,734   -       3,629   -         -        16,014  47,377
    Cactus   Apr 2011  43,947   -       4,927   -         -        16,665  65,539
    Diablo   Sep 2011  66,395   -       9,961   12,451    -        15,591  91,947
    Essex    Apr 2012  87,750   -       15,698  11,555    -        17,646  149,596
    Folsom   Sep 2012  103,637  31,241  20,271  13,939    42,118   19,114  230,320
    Grizzly  Apr 2013  120,968  49,797  21,261  20,071    60,485   23,035  321,081
    * Counted as newline-terminated lines (wc -l), not logical Python LOC.
    Methodology:
      wc -l $(find . -name '*.py' | grep -v test | grep -v doc)
      wc -l $(find . -name '*.sh' | grep test | grep -v doc)
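The first methodology command on slide 14 can also be sketched in Python, which avoids shell argument-length limits on large trees. This is an approximation of the `find | grep -v | wc -l` pipeline, not the script the authors used. Like `grep -v`, it excludes any file whose path relative to the root contains `test` or `doc`. It counts newline characters, as `wc -l` does. `count_lines` is a hypothetical name.

```python
import os


def count_lines(root, suffix=".py", exclude=("test", "doc")):
    """Count newline-terminated lines in files under `root` ending in `suffix`.

    Any file whose path relative to `root` contains one of the `exclude`
    substrings is skipped, mirroring `grep -v test | grep -v doc`.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if any(x in rel for x in exclude):
                continue
            with open(path, "rb") as f:
                total += f.read().count(b"\n")  # same rule as wc -l
    return total
```

Run it from a project checkout, e.g. `count_lines("nova")`, to get a single total instead of `wc -l`'s per-file listing.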
  • 15. OpenStack logical architecture (Grizzly + nova-network + Cinder)
    [Figure: logical architecture in five groups, connected by REST between services and by AMQP inside Nova and Cinder.
      – Dashboard: horizon
      – AUTHENTICATION: keystone with the keystone database
      – IMAGE REPO: glance-api and glance-registry with the glance database, exposed through the glance REST API
      – COMPUTE CONTROLLER: nova-api, nova-scheduler, nova-conductor, nova-network, nova-cert, and nova-cells with the nova database and AMQP; nova-compute on the compute nodes
      – BLOCK STORAGE: cinder-api, cinder-scheduler, and cinder-volume with the cinder database and AMQP; volume nodes]
  • 16. OpenStack logical architecture (Grizzly + Quantum + Cinder)
    [Figure: logical architecture in six groups, connected by REST between services and by AMQP inside Nova, Cinder, and Quantum.
      – Dashboard: horizon
      – AUTHENTICATION: keystone with the keystone database
      – IMAGE REPO: glance-api and glance-registry with the glance database, exposed through the glance REST API
      – COMPUTE CONTROLLER: nova-api, nova-scheduler, nova-conductor, nova-cert, and nova-cells with the nova database and AMQP; nova-compute on the compute nodes
      – BLOCK STORAGE: cinder-api, cinder-scheduler, and cinder-volume with the cinder database and AMQP; volume nodes
      – NETWORK CONTROLLER: quantum-server with the quantum database and AMQP, quantum-plugin, quantum-dhcp, quantum-metadata agent, and quantum-l3 agent]
  • 17. OpenStack tracing
    • Understand OpenStack data and message flow for logical operations, e.g.,
      – Create a VM
      – Delete a VM
      – List VMs
      – Create a volume
      – Attach a volume to or detach it from a VM
      – Create a floating IP address
      – Associate a floating IP address with or disassociate it from a VM
      – Create or destroy a virtual network
    • Understand
      – REST calls
      – Data flow
      – AMQP flow
      – Timing information
    • Build a data consistency tool
    • Gather data for generating performance load
    • Build a performance model
  • 18. Key observations from tracing OpenStack (1/2)
    • OpenStack is evolving very rapidly, with significant behavior changes from one release to the next.
    • Total tables
      – Grizzly: 105 tables (160 with nova shadow tables); Diablo: 53
    • Creating a VM (Grizzly)
      – 139 SELECT, 37 INSERT, and 74 UPDATE queries
      – 12 tables are touched by INSERT and UPDATE
      – 717K bytes read, 458K bytes written
      – 655 send() calls and 414 recv() calls to AMQP
    • Creating a VM in Diablo (Sep 2011): 450 SELECT, 4 INSERT, and 9 UPDATE queries
    • Deleting a VM
      – Only a single record is deleted from the database (the rest are archived)
    • Request ID
      – The instance and its request ID are stored in the database only after the quota is updated, but before the request is sent to the scheduler.
    • Quota management
      – Entries are inserted into the database to record the resources allocated to a VM (e.g., cores, fixed IPs); negative or NULL entries are inserted for deallocation. Each quota entry has an expiration time (one day).
    • VM state and task state
      – e.g., networking, block_device_mapping, spawning
    • Keystone
      – In Grizzly, token verification is optimized using caches (for flavor=keystone) and PKI tokens
  • 19. Key observations from tracing OpenStack (2/2)
    • Development of a data consistency checking tool
      – Orphan iptables rules (not associated with a VM transaction) => security holes
      – Orphan data in tables due to errors in VM creation, etc. => audit and clean up
      – Orphan virsh data => audit and clean up
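The orphan checks listed on slide 19 boil down to set differences between independent views of the same VMs. This is a minimal sketch under stated assumptions. The three inputs are assumed to have been collected beforehand: active instance UUIDs from the database, hypervisor domains (e.g. from `virsh list --all`) already mapped to instance UUIDs, and a hypothetical mapping from per-instance iptables chains to the instance they belong to. `find_orphans` is not part of the authors' tool.

```python
def find_orphans(db_instances, hypervisor_domains, iptables_chains):
    """Report data that exists in one view of the system but not the other.

    db_instances:       UUIDs the database considers active
    hypervisor_domains: UUIDs of domains the hypervisor knows about
    iptables_chains:    {chain_name: instance_uuid} for per-VM firewall rules
    """
    db = set(db_instances)
    hv = set(hypervisor_domains)
    return {
        # DB rows left behind, e.g. by a failed VM creation -> audit/clean up
        "db_without_domain": sorted(db - hv),
        # hypervisor state with no DB record -> audit/clean up
        "domain_without_db": sorted(hv - db),
        # firewall rules for VMs that no longer exist -> potential security hole
        "orphan_iptables": sorted(c for c, uuid in iptables_chains.items() if uuid not in db),
    }
```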
  • 20. Methodology
    • Run OpenStack on one machine (with and without timers disabled)
      – Diablo, Essex, Folsom, Grizzly
      – Ubuntu, RabbitMQ, MySQL
    • Use curl to send API requests to OpenStack
      – flavor=keystone
      – The image has three parts: AMI, RAM disk, kernel image
      – For Keystone, PKI-based token verification is also used in Grizzly
      – Each service's token was created before issuing a create or delete VM call
    • Use our technique to capture message interactions, generate flows, run message analytics, and insert faults (ongoing)
    • curl_createserver.sh:
        AUTHTOKEN=$1
        curl -i http://9.47.240.166:8774/v2/3283d689d02c41248fc82c202e82055a/servers -X POST \
          -H "X-Auth-Project-Id: admin" \
          -H "User-Agent: python-novaclient" \
          -H "Content-Type: application/json" \
          -H "Accept: application/json" \
          -H "X-Auth-Token: ${AUTHTOKEN}" \
          -d '{"server": {"name": "test1", "imageRef": "de8882fb-94b3-4105-a212-c0a7fd8ab1e9", "flavorRef": "1", "max_count": 1, "min_count": 1, "networks": [{"uuid": "48de54f9-2a60-4f28-9740-d6317086c32a"}]}}'
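The curl request on slide 20 can be built with Python's standard library. This is handy when the driver script must loop over releases or capture timings. It is a sketch: `build_create_server_request` is a hypothetical helper, and the endpoint, tenant ID, image and network UUIDs are the lab values from the script. The function only constructs the request and does not send it.

```python
import json
import urllib.request


def build_create_server_request(endpoint, tenant_id, token, name,
                                image_ref, flavor_ref, network_uuid):
    """Build (but do not send) the Nova v2 create-server POST from the slide."""
    body = {"server": {
        "name": name,
        "imageRef": image_ref,
        "flavorRef": flavor_ref,
        "max_count": 1,
        "min_count": 1,
        "networks": [{"uuid": network_uuid}],
    }}
    return urllib.request.Request(
        url=f"{endpoint}/v2/{tenant_id}/servers",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Auth-Project-Id": "admin",
            "User-Agent": "python-novaclient",
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Auth-Token": token,
        },
    )
```

Sending it is then `urllib.request.urlopen(req)` against a running Nova API. Keeping construction separate from sending makes the request easy to inspect before it hits the system being traced.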
  • 21. SQL queries in create, delete, and list VM operations, and tables touched
    How to read: tables touched (SQL queries) - [number of tables with INSERT or UPDATE]
    Query (operation)  Diablo        Essex         Folsom-net    Folsom-q      Grizzly-net   Grizzly-q
    SELECT (create)    16 (450)      17 (95)       21 (409)      26 (560)      20 (139)      37 (343)
    SELECT (delete)    8 (37)        10 (36)       17 (63)       23 (241)      13 (36)       31 (192)
    SELECT (list)      5 (31)        4 (12)        6 (24)        7 (25)        1 (1)         1 (1)
    INSERT (create)    4 (4)         4 (4)         8 (23)        9 (24)        10 (37)       13 (40)
    INSERT (delete)    0 (0)         0 (0)         1 (3)         1 (3)         3 (6)         4 (6)
    INSERT (list)      0 (0)         0 (0)         0 (0)         0 (0)         0 (0)         0 (0)
    UPDATE (create)    2 (9) - 5     3 (12) - 5    7 (60) - 11   7 (59) - 13   8 (74) - 13   8 (70) - 16
    UPDATE (delete)    4 (6) - 4     6 (10) - 6    8 (22) - 9    8 (25) - 9    10 (31) - 11  10 (26) - 12
    UPDATE (list)      0 (0) - 0     0 (0) - 0     0 (0) - 0     0 (0) - 0     0 (0) - 0     0 (0) - 0
    DELETE (create)    0 (0) - 0     0 (0) - 0     0 (0) - 0     0 (0) - 0     0 (0) - 0     0 (0) - 0
    DELETE (delete)    1 (1)         1 (1)         1 (1)         1 (1)         1 (1)         1 (1)
    Tables             53            63            67            83            136           160
    Table breakdown: Diablo: 4 glance, 9 keystone, 39 nova; Essex: 4 glance, 10 keystone, 49 nova; Folsom nova-network: 5 glance, 10 keystone, 52 nova; Folsom Quantum: Folsom nova-network + 16 quantum; Grizzly nova-network: 6 glance, 19 keystone, 111 nova (including 55 nova shadow tables); Grizzly Quantum: Grizzly nova-network + 24 quantum.
    Releases: Diablo (Sep 2011), Essex (Apr 2012), Folsom (Sep 2012), Grizzly (Apr 2013); net = nova-network, q = Quantum.
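  • 22. Keystone REST flow for creating a server (Grizzly)
    [Sequence diagram between User, Keystone, nova-api, glance-api, and glance-registry:
      1. User → Keystone: credentials; Keystone returns a token (role)
      2. User → Keystone: get services and endpoints + token; Keystone returns services + endpoints
      3. User → nova-api: token + CreateInstance; nova-api has Keystone verify the token
      4. nova-api → glance-api: token + GetImage; glance-api has Keystone verify the token
      5. glance-api → glance-registry: token + GetImage; glance-registry has Keystone verify the token, and the image is returned
      6. nova-api → User: CreateInstance success (Accepted)]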
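In the flow on slide 22, every service that receives a request sends the token back to Keystone for verification. Slide 18 notes that Grizzly optimizes this with caches. The following sketch shows the general idea as a time-bounded cache in front of a remote verification call. It is not Keystone's or its middleware's implementation: `TokenCache` and `verify_remote` are hypothetical, and the clock is injectable so expiry can be tested.

```python
import time


class TokenCache:
    """Cache successful token verifications for `ttl` seconds.

    Repeated requests carrying the same token skip the round trip to
    Keystone (and the SELECT queries behind it) until the entry expires.
    `verify_remote(token)` stands in for the real validation call and
    returns token data on success or None on failure.
    """

    def __init__(self, verify_remote, ttl=300.0, clock=time.monotonic):
        self._verify_remote = verify_remote
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # token -> (token data, expiry time)

    def verify(self, token):
        now = self._clock()
        hit = self._cache.get(token)
        if hit is not None and hit[1] > now:
            return hit[0]
        result = self._verify_remote(token)
        if result is not None:  # never cache failed verifications
            self._cache[token] = (result, now + self._ttl)
        return result
```

Only successful verifications are cached, so a revoked or mistyped token is always rechecked. The trade-off is that a revoked token can stay accepted until its cache entry expires.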
  • 23. Create a VM: overview (1/4)
    • Which OpenStack component issues the SELECT queries?
    Role                      Component        Diablo      Essex       Folsom-net  Folsom-q    Grizzly-net Grizzly-q
    Authentication            keystone         422         54          358         484         82          243
    API server                nova-api         4           11          11          9           10          10
    Agent on compute node     nova-compute     4           5           13          14          0           0
    Controller agent          nova-conductor   n/a         n/a         n/a         n/a         15          16
    Network agent on compute  nova-network     13          19          17          n/a         20          n/a
    Scheduler                 nova-scheduler   1           2           1           1           4           4
    Image registry server     glance-registry  6           4           8           8           8           8
    Network API server        quantum-server   n/a         n/a         n/a         44          n/a         62
    (net = nova-network, q = Quantum)
  • 24. Create a VM: overview (2/4)
    • How many HTTP requests are there relative to SELECT calls? REST rows list the REST calls each component received.
    Component        Metric  Diablo         Essex          Folsom-net     Folsom-q       Grizzly-net    Grizzly-q
    keystone         SELECT  422            54             358            484            82             243
                     REST    30 GET         9 GET          17 GET         23 GET         3 GET          6 GET, 2 POST
    nova-api         SELECT  4              11             11             9              10             10
                     REST    1 POST         1 POST         1 POST         1 POST         1 POST         1 POST
    nova-compute     SELECT  4              5              13             14             0              0
    nova-conductor   SELECT  n/a            n/a            n/a            n/a            15             16
    nova-network     SELECT  13             19             17             n/a            20             n/a
    nova-scheduler   SELECT  1              2              1              1              4              4
    glance-api       SELECT  0              0              0              0              0              0
                     REST    2 GET, 5 HEAD  4 HEAD         8 HEAD         8 HEAD         8 HEAD         8 HEAD
    glance-registry  SELECT  6              4              8              8              8              8
                     REST    7 GET          4 GET          8 GET          8 GET          8 GET          8 GET
    quantum-server   SELECT  n/a            n/a            n/a            44             n/a            62
                     REST    n/a            n/a            n/a            5 GET, 1 POST  n/a            9 GET, 1 POST
    (net = nova-network, q = Quantum)
  • 25. Why so many SELECT queries in Keystone?
    • In Diablo, each Keystone GET issues 14 SELECT queries, except the first, which issues 16.
    • In Essex, each Keystone GET issues 6 SELECT queries.
    • In Folsom (nova-network and Quantum), each Keystone GET issues 21 SELECT queries, except the first, which issues 22.
    • In Grizzly (nova-network), each request issues 27 SELECT queries, except the first (1).
      – Keystone tokens are also cached, so subsequent queries do not trigger full Keystone token authentication.
    • If PKI token verification is used, the number of SELECT queries sent by Keystone drops from 82 to 7.
    Keystone row (Diablo, Essex, Folsom-net, Folsom-q, Grizzly-net, Grizzly-q):
      SELECT: 422, 54, 358, 484, 82, 243
      REST:   30 GET, 9 GET, 17 GET, 23 GET, 3 GET, 6 GET + 2 POST
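The per-GET costs on slide 25 account exactly for Keystone's SELECT totals for Diablo, Essex, and both Folsom variants. The small model below makes the arithmetic explicit: the first GET costs `first` queries and every later GET costs `per_get`. The function name is illustrative.

```python
def keystone_selects(num_gets, per_get, first=None):
    """Total SELECTs when each Keystone GET costs `per_get` queries,
    except the first GET, which costs `first` (defaults to `per_get`)."""
    if num_gets == 0:
        return 0
    first = per_get if first is None else first
    return first + (num_gets - 1) * per_get


# (GETs, SELECTs per GET, SELECTs for the first GET), from slides 24-25
cases = {
    "Diablo": (30, 14, 16),                # 16 + 29 * 14
    "Essex": (9, 6, None),                 # 9 * 6
    "Folsom nova-network": (17, 21, 22),   # 22 + 16 * 21
    "Folsom quantum": (23, 21, 22),        # 22 + 22 * 21
}
for release, (gets, per_get, first) in cases.items():
    print(release, keystone_selects(gets, per_get, first))
```

The Grizzly figures are not modeled here because caching changes the per-request cost.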
  • 26. Create a VM: overview (3/4)
    • What if there is no Keystone?
    Keystone enabled:
                Diablo      Essex       Folsom-net  Folsom-q    Grizzly-net Grizzly-q
      SELECT    450         95          409         560         139         343
      INSERT    4           4           23          24          37          40
      UPDATE    6           10          60          58          74          70
    Keystone disabled:
                Diablo      Essex       Folsom-net  Folsom-q    Grizzly-net Grizzly-q
      SELECT    28          41          51          76          57          100
      INSERT    4           4           23          24          37          38
      UPDATE    6           10          60          58          74          70
    (net = nova-network, q = Quantum)
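Subtracting the Keystone-disabled SELECT counts on slide 26 from the Keystone-enabled counts gives, for every release, exactly the Keystone SELECT row on slides 23-24. The short check below makes that explicit and prints Keystone's share of the create-VM SELECT load. It uses only the numbers from the slides.

```python
RELEASES = ["Diablo", "Essex", "Folsom nova-network", "Folsom quantum",
            "Grizzly nova-network", "Grizzly quantum"]
SELECT_ENABLED = [450, 95, 409, 560, 139, 343]   # Keystone enabled
SELECT_DISABLED = [28, 41, 51, 76, 57, 100]      # Keystone disabled
KEYSTONE_SELECTS = [422, 54, 358, 484, 82, 243]  # Keystone row, slides 23-24


def keystone_overhead(enabled, disabled):
    """Extra SELECTs per release attributable to enabling Keystone."""
    return [e - d for e, d in zip(enabled, disabled)]


overhead = keystone_overhead(SELECT_ENABLED, SELECT_DISABLED)
for release, extra, total in zip(RELEASES, overhead, SELECT_ENABLED):
    print(f"{release:22s} +{extra:4d} SELECTs ({extra / total:.0%} of total)")
```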
  • 27. Create a VM: overview (4/4)
    • Which components issue INSERT and UPDATE queries? (Keystone enabled for all; release columns: Diablo, Essex, Folsom nova-network, Folsom Quantum, Grizzly nova-network, Grizzly Quantum; values listed in column order, blank cells omitted; "t (q)" = tables (queries))
    INSERT
      keystone: 2
      nova-api: 3 (3), 3 (3), 6 (10), 6 (10), 7 (21), 7 (21)
      nova-compute: 1 (12), 2 (12)
      nova-conductor: 2 (13), 2 (13)
      nova-network: 1, 1, 1, 2
      nova-scheduler: 1, 1
      quantum-server: 2, 3
    UPDATE
      nova-api: 1, 1, 9, 9, 7, 7
      nova-compute: 1 (5), 1 (6), 4 (47), 4 (47)
      nova-conductor: 5 (59), 5 (59)
      nova-network: 3, 4, 3, 6, 1
      nova-scheduler: 1, 1, 1, 2, 2
      quantum-server: 1
  • 28. Tables touched for create VM in Grizzly nova-network (queries per table)
    • SELECT: 2 block_device_mapping, 6 compute_node_stats, 6 fixed_ips, 1 floating_ips, 8 images, 4 instance_actions, 2 instance_actions_events, 1 instance_info_caches, 4 networks, 2 provider_fw_rules, 5 quotas, 4 quota_usages, 2 reservations, 7 role, 1 security_group_rules, 3 security_groups, 4 virtual_interfaces
    • INSERT: 12 compute_node_stats, 1 instance_actions, 2 instance_actions_events, 1 instance_id_mappings, 1 instance_info_caches, 1 instances, 13 instance_system_metadata, 4 reservations, 1 security_group_instance_association, 1 virtual_interfaces
    • UPDATE: 6 compute_nodes, 44 compute_node_stats, 3 fixed_ips, 2 instance_actions_events, 1 instance_info_caches, 8 instances, 8 quota_usages, 2 reservations
  • 29. Data flow for creating a server (Grizzly) (1/2)
    [Sequence diagram across nova-api, nova-scheduler, nova-conductor, and nova-compute. Steps:]
    – Create server; check quota
    – INSERT INTO reservations (instances, expires, usageid1)
      INSERT INTO reservations (ram, expires, usageid2)
      INSERT INTO reservations (core, expires, usageid3)
      • Create reservations. No request ID. By default, a reservation expires after a day if it is not updated.
    – UPDATE quota_usages (usageid1); UPDATE quota_usages (usageid2); UPDATE quota_usages (usageid3)
      • Update quotas. What if nova-api dies here? The quota updates could then persist until the reservations expire or are cleaned up.
    – Check whether the images exist
    – Create the instance in the database:
      INSERT INTO instances ('instance_uuid')
      INSERT INTO security_group_instance_association ('instance_uuid')
      INSERT INTO instance_system_metadata, 13 rows: 'image_kernel_id' (with 'instance_uuid'), 'instance_type_memory_mb', 'instance_type_swap', 'instance_type_vcpu_weight', 'instance_type_root_gb', 'instance_type_id', 'image_ramdisk_id', 'instance_type_name', 'instance_type_ephemeral_gb', 'instance_type_rxtx_factor', 'instance_type_flavorid' (listed twice), 'image_base_image_ref'
      INSERT INTO instance_info_caches ('instance_uuid')
  • 30. Data flow for creating a server (Grizzly) (2/2)
    [Sequence diagram across nova-api, nova-scheduler, nova-conductor, nova-compute, and nova-network. Steps:]
    – INSERT INTO instance_id_mappings ('instance_uuid')
    – Update the time in the quota_usages table
    – INSERT INTO instance_actions (instance_uuid, request_id)
      • This request is key: it associates the instance ID with a request ID. But it occurs only after the quota and reservations have been updated. BAD!
    – Send to the scheduler (request_id)
    – INSERT INTO instance_actions_events (scheduling)
    – UPDATE instances (host, node)
    – INSERT INTO instance_actions_events (compute_run)
    – UPDATE compute_node_stats*; INSERT INTO compute_node_stats
    – UPDATE instances (task_state = networking)
    – GET images from glance
    – Libvirt: create instance
    – UPDATE instances (task_state = NULL)
  • 31. How many SQL queries does create VM issue before the request is sent to the scheduler, and to compute?
    Before the request is sent to the scheduler:
                Diablo      Essex       Folsom-net  Folsom-q    Grizzly-net Grizzly-q
      SELECT    202         10          27          289         98          138
      INSERT    0           0           3           10          21          21
      UPDATE    0           0           3           9           7           7
    Before the request is sent to compute:
                Diablo      Essex       Folsom-net  Folsom-q    Grizzly-net Grizzly-q
      SELECT    371         52          292         290         100         140
      INSERT    3           3           10          10          22          22
      UPDATE    1           2           10          10          8           8
    Entire create-VM operation:
                Diablo      Essex       Folsom-net  Folsom-q    Grizzly-net Grizzly-q
      SELECT    450         95          409         560         139         343
      INSERT    4           4           23          24          37          40
      UPDATE    6           10          60          58          74          70
    (net = nova-network, q = Quantum)
  • 32. Create VM: total message bytes, read() or recv()
    Component        Diablo             Essex              Folsom-net         Folsom-q           Grizzly-net
    keystone         154,841            23,090             198,493            269,920            41,888
    nova-api         65,596             81,836             75,507             21,435             22,766
    nova-compute     155,233 (113,701)  157,660 (105,460)  202,163 (163,107)  206,003 (167,383)  106,396 (110,721)
    nova-conductor   n/a                n/a                n/a                n/a                371,614
    nova-network     98,101             77,184             62,509             n/a                103,100
    nova-scheduler   3,380              38,477             16,465             19,688             29,674
    glance-registry  36,764             16,632             45,798             46,104             30,494
    glance-api       17,440             6,326              32,386             32,716             11,248
    quantum-server   n/a                n/a                n/a                46,533             n/a
    quantum-dhcp     n/a                n/a                n/a                3,722              n/a
    Total            531,355            401,205            582,185            650,615            717,180
    Excludes any image transfer. (net = nova-network, q = Quantum)
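The "flow diff" mentioned in the conclusions can be sketched on the numbers from slide 32: compare two releases component by component and rank the biggest changes. This sketch uses the Diablo and Grizzly nova-network read() byte counts, taking the first figure where the slide lists two for nova-compute. The component sums reproduce the slide's totals of 531,355 and 717,180. `flow_diff` is a hypothetical name.

```python
DIABLO = {"keystone": 154841, "nova-api": 65596, "nova-compute": 155233,
          "nova-network": 98101, "nova-scheduler": 3380,
          "glance-registry": 36764, "glance-api": 17440}
GRIZZLY = {"keystone": 41888, "nova-api": 22766, "nova-compute": 106396,
           "nova-conductor": 371614, "nova-network": 103100,
           "nova-scheduler": 29674, "glance-registry": 30494,
           "glance-api": 11248}


def flow_diff(old, new):
    """Per-component change in bytes; a component missing on one side counts as 0."""
    return {c: new.get(c, 0) - old.get(c, 0) for c in sorted(old.keys() | new.keys())}


diff = flow_diff(DIABLO, GRIZZLY)
for component, delta in sorted(diff.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{component:16s} {delta:+,d}")
```

Ranking by absolute change puts the new nova-conductor traffic and the drop in Keystone traffic at the top, which matches the observations on slides 18 and 25.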
  • 33. Create VM: total message bytes, write() or send()
    Component        Diablo             Essex              Folsom-net         Folsom-q           Grizzly-net
    keystone         115,606            15,129             128,957            174,884            25,364
    nova-api         50,704             70,995             25,449             20,265             22,693
    nova-compute     99,899             109,136            127,436            126,143 (122,363)  74,864 (68,352)
    nova-conductor   n/a                n/a                n/a                n/a                222,228
    nova-network     74,106             63,446             46,123             n/a                57,321
    nova-scheduler   2,964              30,182             17,662             21,993             26,997
    glance-registry  23,095             11,006             18,210             18,196             20,329
    glance-api       8,841              5,038              10,226             10,220             8,705
    quantum-server   n/a                n/a                n/a                25,986             n/a
    quantum-dhcp     n/a                n/a                n/a                84                 n/a
    Total            375,447            305,156            374,499            403,507            458,501
    (net = nova-network, q = Quantum)
  • 34. Create a VM: message exchange with RabbitMQ, send() calls (bytes)
    Component        Diablo        Essex         Folsom-net    Folsom-q      Grizzly-net
    nova-api         23 (3,392)    35 (4,769)    23 (8,600)    11 (5,254)    11 (4,062)
    nova-compute     18 (1,316)    18 (1,430)    18 (3,782)    1 (21)        306 (67,874)
    nova-network     31 (1,816)    45 (1,018)    32 (2,159)    n/a           14 (1,786)
    nova-scheduler   23 (2,392)    12 (2,976)    12 (7,388)    12 (9,737)    7 (11,567)
    nova-conductor   n/a           n/a           n/a           n/a           317 (82,717)
    quantum-server   n/a           n/a           n/a           36 (4,498)    n/a
    quantum-dhcp     n/a           n/a           n/a           4 (84)        n/a
    (net = nova-network, q = Quantum)
  • 35. Create a VM: message exchange with RabbitMQ, recv() calls (bytes)
    Component        Diablo        Essex         Folsom-net    Folsom-q      Grizzly-net
    nova-api         16 (833)      25 (1,609)    16 (833)      7 (328)       7 (328)
    nova-compute     14 (3,442)    14 (2,369)    14 (8,752)    1 (9,479)     230 (94,463)
    nova-network     18 (1,808)    26 (3,045)    19 (7,298)    n/a           8 (2,699)
    nova-scheduler   8 (2,479)     8 (2,918)     8 (5,307)     8 (5,345)     4 (3,861)
    nova-conductor   n/a           n/a           n/a           n/a           172 (58,721)
    quantum-server   n/a           n/a           n/a           24 (396)      n/a
    quantum-dhcp     n/a           n/a           n/a           4 (3,726)     n/a
    (net = nova-network, q = Quantum)
  • 36. Create a VM: send() and recv() calls per component (Grizzly nova-network)
    Component        recv()   send()
    nova-compute     2,176    308
    nova-conductor   172      317
    glance-api       1,667    17
    glance-registry  139      9
    keystone         3        3
    nova-api         5,429    19
    nova-network     12       19
    nova-scheduler   4        7
    • Single-byte recv() calls in the webob library
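Slide 36 attributes nova-api's 5,429 recv() calls to single-byte reads in the webob library. The sketch below does not use webob. It only shows, on a local socket pair, how the buffer size passed to `recv()` drives the number of system calls needed to read the same payload. The function name is hypothetical.

```python
import socket


def count_recv_calls(payload: bytes, chunk: int) -> int:
    """Send `payload` over a local socket pair and count the recv() calls
    needed to read it back with buffer size `chunk` (including the final
    call that returns b"" at end of stream)."""
    a, b = socket.socketpair()
    try:
        a.sendall(payload)
        a.shutdown(socket.SHUT_WR)  # signal end of data to the reader
        calls, received = 0, b""
        while True:
            data = b.recv(chunk)
            calls += 1
            if not data:
                break
            received += data
        if received != payload:
            raise RuntimeError("payload corrupted in transit")
        return calls
    finally:
        a.close()
        b.close()
```

Reading a 1,000-byte body one byte at a time takes 1,001 `recv()` calls, while a 64 KiB buffer typically needs only a couple. That is why a single-byte read loop in a request parser shows up so prominently in a syscall trace.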
  • 37. Conclusions
    • Complexity is brewing under OpenStack. Beware!
    • Build distributed applications with tracing in mind
    • Flow diff
      – Through an interactive page
    • Ongoing and future work
      – Fault injection and log correlation
      – Leverage the tool for other projects, e.g., Cloud Foundry

Speaker notes

  1. Talk about when each project started. How many open source cloud projects are there? Hadoop is not listed here, nor are Chef, Puppet, Zenoss, or Ganglia.
  2. Talk about when each project started.
  3. Nova for Folsom includes Cinder.
  4. Repeat the experiment in red