Leveraging open source tools to gain insight into OpenStack Swift

Presented by:
Michael Factor, IBM Fellow, Storage and Systems, IBM Research - Haifa
Dmitry Sotnikov, System and Storage Researcher, IBM Research - Haifa
May 20, 2015
Deep dive insights into Swift
The work was done with the help of: Yaron Weinsberg, George Goldberg
For more information contact: dmitrys@il.ibm.com

Swift Monitoring
• Monitoring Swift With StatsD
• https://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd/
• Unified Instrumentation and Metering of Swift
• https://wiki.openstack.org/wiki/UnifiedInstrumentationMetering
• Administrator’s Guide
• http://docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring
“Once you have all this great data, what do you do with it? Well, that’s going to require its own post.”
(“Monitoring Swift With StatsD”, SwiftStack, Inc.)

Swift Monitoring Flow
• StatsD allows deep instrumentation of the Swift code and can report over 100 metrics.
• Collectd gathers statistics about the system.
• Graphite is an enterprise-scale monitoring tool that stores and displays numeric time-series data.
• Logstash is a data pipeline that can normalize the data to a common format.
• Elasticsearch is a search server that allows indexing large amounts of data.
• Kibana is a browser-based analytics and search interface for Elasticsearch.
• Spark is a fast and general engine for large-scale data processing.
• RequestStopper catches a request and immediately returns success, which makes it possible to isolate the overhead of each component in a non-production system.
[Diagram: a Swift Node (Proxy Server, Container Server, Object Server, Account Server, RequestStopper) reports StatsD metrics and CPU, RAM and disk statistics to a Monitoring, Analytics and Visualization Node]
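
A rough sketch of how a single timing sample could be handed to StatsD, assuming the plain-text StatsD line format (`<name>:<value>|ms`) sent over UDP. The function names are hypothetical, and the default host and port are assumptions (8125 is the port StatsD conventionally listens on); Swift emits its own metrics, so this only illustrates the wire format.

```python
import socket


def format_timing(metric: str, millis: float) -> bytes:
    """Encode one timing sample in the StatsD line format: <name>:<value>|ms."""
    return f"{metric}:{millis:g}|ms".encode("ascii")


def send_timing(metric: str, millis: float,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; no reply is expected from the StatsD daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_timing(metric, millis), (host, port))
```

For example, `send_timing("example.object.PUT.timing", 192.47)` would send one datagram carrying a 192.47 ms sample under an illustrative metric name.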

Benchmark Tool
• COSBench, Intel’s Cloud Object Storage Benchmarking tool
• https://github.com/intel-cloud/cosbench

Where Our Journey Starts
• Swift 1.13
• 1 container
• Half a million small objects
• 100 COSBench workers
• What should the cluster size be to run more than 1000 PUTs per second (with reasonable response time)?

Our Hardware - Story #1
• 3 proxy nodes (Proxy servers only)
• 7 storage nodes (Object, Container and Account servers)
  • 20 HDD
  • 2 SSD
  • 256 GB RAM
• 3 client machines connected to the proxies
• All network connections are 10 Gbps
Measured: 520 operations per second

Swift Data Path Flow
• The PUT object request arrives at one of the proxies.
• The proxy sends the request to R (e.g., 3) storage nodes, which will hold the R replicas of that object.
• Next, the container database is updated asynchronously to reflect the new object in it (https://swiftstack.com/openstack-swift/architecture/).
• The update is not fully asynchronous: Swift first tries a synchronous update and falls back to an asynchronous one only after a timeout of 0.5 sec.
• When at least two of the three writes to the object servers return successfully, the proxy server process notifies the client that the upload was successful.
[Diagram: client PUT request and response flowing through Proxy Server, Object Server and Container Server]
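
The "two of three" rule above can be sketched as a majority quorum. This is a minimal illustration, not Swift's code: the function names are hypothetical, and generalizing "two of three" to "more than half of R" is an assumption.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of successful writes that forms a majority (assumed rule)."""
    return replicas // 2 + 1


def put_succeeded(statuses, replicas: int = 3) -> bool:
    """True when enough object-server responses carry a 2xx status code."""
    ok = sum(1 for status in statuses if 200 <= status < 300)
    return ok >= quorum_size(replicas)
```

With three replicas, `put_succeeded([201, 201, 503])` is true and `put_succeeded([201, 503, 503])` is false.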

Swift Data Path Flow
[Diagram: the data path Proxy Server, Object Server, Container Server, with each stage replaced in turn by a null server: Null Container Server, Null Object Server, Null Proxy Server]
While nulling out a server is not useful in a production system, it is useful for diagnosing performance bottlenecks.

RequestStopper
https://gist.github.com/gilv/7e70ba055f24bcc472b6
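
To make the idea concrete, here is a minimal sketch of a request-stopping WSGI middleware. It is not the code from the gist above: the class body, the `methods` parameter and the choice of a `201 Created` reply are assumptions made for illustration. It reads the request body so that upload time is still paid, then answers without calling the wrapped server.

```python
class RequestStopper:
    """Hypothetical WSGI middleware: answer selected requests immediately."""

    def __init__(self, app, methods=("PUT",)):
        self.app = app                # the wrapped (real) server
        self.methods = set(methods)   # HTTP methods to short-circuit

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") in self.methods:
            # Consume the declared request body, then report success
            # without passing the request further down the pipeline.
            length = int(environ.get("CONTENT_LENGTH") or 0)
            if length and "wsgi.input" in environ:
                environ["wsgi.input"].read(length)
            start_response("201 Created", [("Content-Length", "0")])
            return [b""]
        # Everything else follows the normal data path.
        return self.app(environ, start_response)
```

Wrapping the proxy, object or container server in turn with such a filter is what separates the cost of each stage from the cost of everything behind it.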

Swift Data Path Flow – Put Request Response Time
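[Diagram: PUT response time measured with RequestStopper placed at successive points of the data path Proxy Server, Object Server, Container Server]
Full data path (no RequestStopper): 192.47 ms
RequestStopper at Container Server: 47.3 ms
RequestStopper at Object Server: 32.89 ms
RequestStopper at Proxy Server: 1.86 ms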

Object PUT Operations Average Response Time per Swift Component
100 Workers, 500K Objects
[Stacked bar chart: response time (ms), split into Network RTT to Proxy, Proxy Server, Object Server and Container Server, for Swift 1.13 with 1 container and Swift 1.13 with 100 containers. Annotation: 4.7x faster]

Object PUT Operations Average Response Time
Comparison of Swift 1.13 vs Swift 2.2
100 Workers, 500K Objects
[Stacked bar chart: response time (ms), split into Network RTT to Proxy, Proxy Server, Object Server and Container Server, for Swift 1.13 and Swift 2.2, each with 1 container and with 100 containers. Annotations: 3x faster, 1.5x faster]

Mixed Workload: 1 Container, 100 Workers, 500K Objects
[Bar chart: operations per second (Read, Write, Delete) for Swift 2.2 and Swift 1.13, mixed workload, 1 container]
With the mixed workload, Swift 2.2 achieves a 70% performance improvement.

SWIFT Scalability – Swift 2.2
100 Containers, 100 Workers, 500K Objects
[Chart "Measured Operation Ratio": PUT operations per second vs. number of storage servers (2 to 7)]
[Chart "Response Time Distribution": time (ms) vs. number of storage servers (2 to 7), for the 60%, 80%, 90%, 95% and 99% response-time percentiles]
This is the maximal performance that 100 COSBench workers can achieve with Swift 2.2, so adding a new node does not improve the performance.

Influence of number of COSBench Workers on Performance – Swift 2.2
7 Storage Nodes, 500K Objects, 100 Containers
#Workers   Operations per second   Average Response Time (ms)
100        2854.31                 38.98
200        3455.17                 62.68
400        4323.52                 101.96

Story #1 Conclusions: RequestStopper
• In some cases the limiting factor is not throughput but response time
• The response time of native Swift 1.13 with 1 container is 192 ms → ~5.2 op/sec per COSBench worker → 520 op/sec for 100 COSBench workers
• Reducing the response time to 65 ms in Swift 2.2 yields ~1560 op/sec on the same cluster
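
The arithmetic behind these numbers can be sketched as follows, assuming a closed loop in which every worker keeps exactly one request in flight and has no think time. The function name is hypothetical and the model is an approximation, not a description of COSBench internals.

```python
def closed_loop_throughput(workers: int, response_time_ms: float) -> float:
    """Operations per second when each worker issues requests back to back."""
    per_worker = 1000.0 / response_time_ms  # requests one worker completes per second
    return workers * per_worker
```

`closed_loop_throughput(1, 192)` is about 5.2 op/sec and `closed_loop_throughput(100, 192)` about 521 op/sec, matching the figures above; for 65 ms the same model gives about 1538 op/sec, close to the ~1560 quoted.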

Story #1 Conclusions: Container Server
• The difference in Container Server performance between Swift 2.2 and Swift 1.13 was due in large part to the container merge_items speedup patch (https://review.openstack.org/#/c/116992/)
• Container Sharding (https://review.openstack.org/#/c/139921/) still has the potential to improve performance for this workload by a factor of 1.5

System Size Influence on Performance
[Chart: average response time (ms) vs. number of objects per container (1 to 1,000,000), for 1 container and 100 containers]
[Chart: average response time (ms) vs. number of objects in the system (5 kops to 500 kops), for 1 container and 100 containers]
Swift performance (response time) is influenced by the number of objects per container. In our environment we identified an optimal number of objects per container; what determines that optimum still needs to be evaluated.

Where Our Journey Continues
• September 2014
• Swift 1.13
• 1 container
• Half a million small objects
• What should the cluster size be to run more than 1000 PUTs per second (with reasonable response time)?

Kibana – PUT Request Response Time Percentiles
[Chart: response time (sec)]

Average Response Time – for 1-second intervals – SWIFT 2.2
[Two charts: average response time (0 to 450 ms) over wall-clock time, 2:23 to 3:32]
There is a peak every 30 sec. Possibly related: https://bugs.launchpad.net/swift/+bug/1450656 ?

Graphite – PUT Request Response Time
[Chart: response time (sec) over time, with peaks 30 seconds apart]

Zoom-in on Swift Response time outliers (> 0.5 sec)
Request Granularity
PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2
[Chart: response time (0 to 16 sec) of individual requests over time, 0:36:00 to 0:40:19; outliers recur every 30 seconds]
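
One way to check for such a recurring pattern in per-request data is sketched below. It assumes the samples are available as (timestamp in seconds, response time in seconds) pairs; the function name and the `min_gap` burst-merging parameter are hypothetical, and only the 0.5 sec outlier threshold comes from the slide.

```python
from statistics import median


def outlier_period(samples, threshold=0.5, min_gap=1.0):
    """Median spacing (sec) between bursts of slow requests, or None.

    samples: iterable of (timestamp_sec, response_time_sec) pairs.
    Outliers closer together than min_gap are treated as one burst.
    """
    times = sorted(t for t, rt in samples if rt > threshold)
    bursts, last = [], None
    for t in times:
        if last is None or t - last >= min_gap:
            bursts.append(t)  # first outlier of a new burst
        last = t
    gaps = [b - a for a, b in zip(bursts, bursts[1:])]
    return median(gaps) if gaps else None
```

A result close to 30 would be consistent with the 30-second spacing seen in the Graphite and Kibana views.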

The effect of fs.xfs.xfssyncd_centisecs on PUT response time
[Chart: PUT response time (0 to 800 ms) over time, 8:08:39 to 8:29:39, comparing sync intervals of 300 seconds and 60 seconds]

The effect of fs.xfs.xfssyncd_centisecs on PUT response time
Seconds   Avg-ResTime   60%-RT   80%-RT   90%-RT   95%-RT   99%-RT   100%-RT
10        83.26         30       50       350      520      700      1,620
30        43.34         30       40       50       60       530      3,690
60        38.81         30       40       50       70       270      5,900
300       31.89         30       40       50       70       220      9,530
Increasing fs.xfs.xfssyncd_centisecs improves the 99th percentile at the price of degrading the 100th percentile (the maximum).
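
Because the parameter is expressed in centiseconds, the intervals in the "Seconds" column have to be multiplied by 100 before they are applied. A small sketch of that conversion, with a hypothetical helper name, producing a line in the usual `key = value` sysctl.conf form:

```python
def xfssyncd_sysctl_line(seconds: float) -> str:
    """Render a sysctl.conf-style line for the given sync interval in seconds."""
    centisecs = round(seconds * 100)  # 1 second = 100 centiseconds
    return f"fs.xfs.xfssyncd_centisecs = {centisecs}"
```

For the 60-second row this yields `fs.xfs.xfssyncd_centisecs = 6000`. Whether a given value is accepted depends on the kernel's limits for this parameter, which the sketch does not check.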

Our Hardware - Story #2
• 4 object nodes (Object servers)
  • 15 HDD
  • 128 GB RAM
• 2 metadata nodes (Container and Account servers)
  • 2 SSD
  • 128 GB RAM
• Internal network connections are 10 Gbps

Object PUT Workload
100 Workers, 100 Containers, 500K Objects

Clients Transmitted Throughput
[Chart: throughput (MB/sec) over time]

Clients Transmitted vs. Proxy Servers Received Throughput Comparison
[Chart: throughput (MB/sec) over time for Clients Transmitted Throughput and Proxy Servers Received Throughput]
Total client transmitted network throughput = total proxy received network throughput

Proxy Servers Received vs. Proxy Servers Transmitted
[Chart: throughput (MB/sec) over time for Proxy Server Received Throughput and Proxy Server Transmitted Throughput. Annotation: X3]

Proxy Servers Received and Transmitted vs. Object Servers Received
[Chart: throughput (MB/sec) over time for Proxy Servers Received, Proxy Servers Transmitted and Object Servers Received Throughput. Annotation: X3]

Network vs. Disks
[Chart: throughput (MB/sec) over time for Proxy Servers Received Throughput, Object Servers Received Throughput and Object Servers Disks Write Throughput. Annotation: X12]

Disks Capacity Utilization
[Chart over time, marking the new object creation part and the rewrite workload, with the expected disk capacity for the whole workload]

Number of async_pending requests over time
[Chart: async_pending requests per second over time for Object1, Object2, Object3 and Object4]

Variable object size workload
[Chart: throughput (MB/sec) over time, with phases for object sizes 15 KB, 32 KB, 64 KB, 128 KB, 512 KB and 1 MB]

Disk vs. Client Perceived Bandwidth
[Chart: ratio for object sizes 15 KB, 32 KB, 64 KB, 128 KB, 512 KB and 1 MB]
The overhead is not a flat 3x; it is a function of object size.

Our Hardware - Story #3 – without async_pendings
• 5 object nodes (Object servers)
  • 15 HDD
  • 128 GB RAM
• 3 metadata nodes (Container and Account servers)
  • 3 SSD
  • 128 GB RAM
• Internal network connections are 10 Gbps

Network vs. Disks
[Chart: throughput (MB/sec) over time]

Disks Capacity Utilization
[Chart: throughput (MB/sec) over time]

Number of async_pending requests over time
[Chart: async_pending requests per second over time]

PUT Request Average Response Time (Lower is better)
[Chart: PUT request response time (ms) over time for Object1, Object2, Object3, Object4, Proxy1 and Proxy2]
Object server “Object1” has a much higher response time.
“Object2”, “Object3” and “Object4” have lower response times than the proxies.

Disks Read and Write Throughputs
[Chart: disk read and write throughput for Object1, Object2, Object3 and Object4]

Processes statistics over object servers
[Chart: number of running and blocked processes over wall-clock time]
Blocked processes are processes that are waiting for an I/O response.

The idle CPU comparison over object servers (Higher is better)
[Chart: idle CPU percent (%) over wall-clock time for CPU0-CPU9, CPU10-CPU19, CPU20-CPU29 and CPU30-CPU39; the legend also includes "Stopped"]

• Micro benchmark results (Vdbench 8k random write workload):
• “Object1” shows an average response time of ~37 ms
• “Object2”, “Object3” and “Object4” show an average response time of ~30 ms
• Our investigation revealed that the “Object1” server consists of older hardware components, even though all servers were supposed to be the same

[Two charts: 4 Object Servers vs. 3 Object Servers (without Object1)]
Removing Object1 gives a ~10% throughput improvement despite a 25% reduction in the number of object servers.

Effect of Object Size on Cluster Bandwidth
Object Size   Avg-Res Time   Avg-Proc Time   Throughput     Bandwidth
15 KB         38.98 ms       38.94 ms        2854.31 op/s   42.81 MB/s
1 MB          105.37 ms      103.6 ms        967.04 op/s    967.04 MB/s
10 MB         852.06 ms      578.62 ms       117.43 op/s    1.17 GB/s
Back of the envelope calculation:
User Bandwidth * Number of Replicas < Total Proxy Backend Bandwidth
In our case we have 3 proxy servers and a 10 Gbit network:
User Bandwidth * 3 < 3 * 10 Gbit → User Bandwidth < 10 Gbit/sec = 1.25 GB/sec
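
The same back-of-the-envelope bound, and the Bandwidth column of the table, can be written out as a short sketch. The function names are hypothetical, and decimal units (1 GB = 1000 MB, 8 bits per byte) are assumed.

```python
def max_user_bandwidth_gb_per_sec(proxies: int, link_gbit: float, replicas: int) -> float:
    """Upper bound on user bandwidth: backend bandwidth shared by all replicas."""
    total_backend_gbit = proxies * link_gbit  # total proxy backend bandwidth
    user_gbit = total_backend_gbit / replicas  # each user byte is sent `replicas` times
    return user_gbit / 8                       # Gbit/sec -> GB/sec


def bandwidth_mb_per_sec(ops_per_sec: float, object_size_mb: float) -> float:
    """User-visible bandwidth implied by a throughput and an object size."""
    return ops_per_sec * object_size_mb
```

`max_user_bandwidth_gb_per_sec(3, 10, 3)` gives 1.25 GB/sec, and `bandwidth_mb_per_sec(117.43, 10)` gives about 1174 MB/sec, so the 10 MB workload at 1.17 GB/sec is already close to that bound.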
