Performance monitoring and troubleshooting of cloud-based object storage is as much an art as a science. Although there is a plethora of open source monitoring tools that gather system metrics, the real challenge is using them to find the root cause of a problem.
In this presentation we describe a general, open-source-based, step-by-step methodology for understanding performance bottlenecks in an OpenStack Swift system. Our approach uses standard tools, including Logstash, collectd, StatsD, Elasticsearch, Kibana and Graphite. We also describe a simple additional Swift middleware we developed to gain further insight. Finally, we present results obtained by applying our approach to an internal deployment of OpenStack Swift.
Leveraging open source tools to gain insight into OpenStack Swift
1. Presented by:
Leveraging open source tools to gain insight into OpenStack Swift
May 20, 2015
Michael Factor, IBM Fellow, Storage and Systems, IBM Research - Haifa
Dmitry Sotnikov, System and Storage Researcher, IBM Research - Haifa
Deep dive insights into Swift
The work was done with the help of Yaron Weinsberg and George Goldberg.
For more information contact: dmitrys@il.ibm.com
2. Swift Monitoring
• Monitoring Swift With StatsD
• https://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd/
• Unified Instrumentation and Metering of Swift
• https://wiki.openstack.org/wiki/UnifiedInstrumentationMetering
• Administrator’s Guide
• http://docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring
“Once you have all this great data, what do you do with it? Well, that’s going to require its own post.”
— “Monitoring Swift With StatsD” by SwiftStack, Inc
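Swift’s servers can emit these metrics natively. As a hedged sketch (the host, port and prefix values below are illustrative defaults, not taken from the deck), enabling StatsD is a few lines in each server’s configuration:

```ini
# /etc/swift/proxy-server.conf -- the same options work in the
# object-, container- and account-server configs.
[DEFAULT]
log_statsd_host = localhost
log_statsd_port = 8125
log_statsd_default_sample_rate = 1.0
log_statsd_metric_prefix = proxy01
```

With no `log_statsd_host` set, metric emission stays disabled, so the options are safe to roll out incrementally.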
3. Swift Monitoring Flow
• StatsD allows deep instrumentation of the Swift code and can report over 100 metrics.
• Collectd gathers statistics about the system.
• Graphite is an enterprise-scale monitoring tool that stores and displays numeric time-series data.
• Logstash is a data pipeline that can normalize the data to a common format.
• Elasticsearch is a search server that allows indexing large amounts of data.
• Kibana is a browser-based analytics and search interface for Elasticsearch.
• Spark is a fast and general engine for large-scale data processing.
• RequestStopper catches a request and returns success, enabling isolation of overheads in a non-production system.
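The RequestStopper middleware described above is internal to our team; a minimal sketch of the idea, with illustrative names, might look like this standard paste-deploy filter:

```python
# Sketch of a "request stopper" WSGI middleware: it swallows PUTs and
# immediately reports success, so everything downstream of the point
# where it is installed is excluded from the measured response time.
# (Illustrative only -- the real RequestStopper middleware is internal.)

class RequestStopper(object):
    def __init__(self, app, conf):
        self.app = app

    def __call__(self, environ, start_response):
        if environ.get('REQUEST_METHOD') == 'PUT':
            # Pretend the request succeeded without doing any work.
            start_response('201 Created', [('Content-Length', '0')])
            return [b'']
        # Everything else passes through untouched.
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    conf = dict(global_conf, **local_conf)

    def stopper_filter(app):
        return RequestStopper(app, conf)
    return stopper_filter
```

Installed in the proxy pipeline it isolates the proxy; installed in the object- or container-server pipeline it isolates those hops.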
[Diagram: a Swift node running Proxy, Account, Container and Object servers, instrumented with StatsD and the RequestStopper middleware; CPU, RAM and disk statistics flow to a monitoring, analytics and visualization node.]
5. Where Our Journey Starts
• Swift 1.13
• 1 container
• Half a million small objects
• 100 COSBench workers
• What cluster size is needed to sustain more than 1000 PUTs per second (with reasonable response time)?
6. Our Hardware - Story #1
• 3 proxy nodes (Proxy servers only)
• 7 storage nodes (Object, Container and Account servers)
• 20 HDDs
• 2 SSDs
• 256 GB RAM
• 3 client machines connected to the proxies
• All network connections are 10 Gbps
520 operations per second
7. Swift Data Path Flow
• A PUT object request arrives at one of the proxies.
• The proxy sends the request to R (e.g., 3) storage nodes, which will hold the R replicas of that object.
• Next, the container database is updated asynchronously to reflect the new object. (https://swiftstack.com/openstack-swift/architecture/)
• The update is not fully asynchronous: the object server first attempts a synchronous container update, and falls back to an asynchronous one on a 0.5-second timeout.
• When at least two of the three writes to the object servers return successfully, the proxy server process notifies the client that the upload was successful.
[Diagram: Client → Proxy Server → Object Server → Container Server, PUT request and response.]
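The “at least two of the three” rule in the last step is a simple majority quorum; a sketch of the logic (not Swift’s actual implementation):

```python
# Majority quorum for replicated writes: a PUT succeeds once
# floor(replicas / 2) + 1 object servers have acknowledged it.

def quorum_size(replicas):
    return replicas // 2 + 1

def put_succeeded(statuses, replicas=3):
    """statuses: HTTP status codes returned by the object servers."""
    good = sum(1 for s in statuses if 200 <= s < 300)
    return good >= quorum_size(replicas)
```

For the default 3 replicas the quorum is 2, which is why one slow or failed object server does not fail the upload.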
8. Swift Data Path Flow
[Diagram: the same Client → Proxy → Object → Container PUT path, with “null” variants of the Proxy, Object and Container servers substituted in.]
While nulling out a server is not useful for a production system, it is useful for diagnosing performance bottlenecks.
10. Swift Data Path Flow – PUT Request Response Time
Average PUT response times measured with RequestStopper at successive points in the data path:
• Full path (through the Container Server): 192.47 ms
• RequestStopper at the Container Server: 47.3 ms
• RequestStopper at the Object Server: 32.89 ms
• RequestStopper at the Proxy Server: 1.86 ms
11. Object PUT Operations Average Response Time per Swift Component
100 Workers, 500K Objects
[Chart: stacked response time (ms, 0–250) broken into Network RTT to Proxy, Proxy Server, Object Server and Container Server, for Swift 1.13 with 1 container vs. 100 containers; the 100-container case is 4.7x faster.]
12. Object PUT Operations Average Response Time – Comparison of Swift 1.13 vs Swift 2.2
100 Workers, 500K Objects
[Chart: stacked response time (ms, 0–250) per component (Network RTT to Proxy, Proxy Server, Object Server, Container Server) for Swift 1.13 and Swift 2.2, each with 1 container and with 100 containers; Swift 2.2 is 3x faster with 1 container and 1.5x faster with 100 containers.]
14. SWIFT Scalability – Swift 2.2
100 Containers, 100 Workers, 500K Objects
[Charts: PUT operations per second (0–3000) and response time distribution (60%, 80%, 90%, 95% and 99% percentiles, 0–600 ms) as the number of storage servers grows from 2 to 7.]
This is the maximal performance that can be achieved by 100 COSBench workers on Swift 2.2, so adding another node does not improve performance.
15. Influence of Number of COSBench Workers on Performance – Swift 2.2
7 Storage Nodes, 500K Objects, 100 Containers

#Workers   Operations per second   Average Response Time (ms)
100        2854.31                 38.98
200        3455.17                 62.68
400        4323.52                 101.96
16. Story #1 Conclusions: RequestStopper
• In some cases the limiting factor is not throughput but response time
• The response time of native Swift 1.13 with 1 container is 192 ms:
~5.2 op/sec per COSBench worker, i.e., 520 op/sec for 100 COSBench workers
• Reducing the response time to 65 ms in Swift 2.2 yields ~1560 op/sec on the same cluster
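Since COSBench workers are closed-loop (each worker issues the next request only after the previous one completes), throughput follows directly from response time: throughput = workers / response time. The deck’s numbers can be checked in two lines:

```python
# Closed-loop load generators obey throughput = workers / response_time.
# Checking the slide's figures with it:

def throughput(workers, response_time_sec):
    return workers / response_time_sec

print(round(throughput(100, 0.19247)))  # 520 op/s at 192.47 ms (Swift 1.13)
print(round(throughput(100, 0.065)))    # ~1538 op/s at 65 ms (Swift 2.2; the deck rounds to ~1560)
```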
17. Story #1 Conclusions: Container Server
• The difference in Container Server performance between Swift 2.2 and Swift 1.13 was due in large part to the container merge_items speedup patch (https://review.openstack.org/#/c/116992/)
• Container Sharding (https://review.openstack.org/#/c/139921/) still has the potential to improve performance for this workload by a factor of 1.5
18. System Size Influence on Performance
[Charts: average response time (ms, 0–350) vs. number of objects per container (1 to 1,000,000), and vs. total number of objects in the system (5K to 500K), for 1 container and for 100 containers.]
SWIFT performance (response time) is influenced by the number of objects per container. In our environment we identified an optimal number of objects; we still need to evaluate what determines the optimal number of objects per container.
19. Where Our Journey Continues
• September 2014
• Swift 1.13
• 1 container
• Half a million small objects
• What cluster size is needed to sustain more than 1000 PUTs per second (with reasonable response time)?
20. Kibana – PUT Request Response Time Percentiles
[Chart: PUT response time percentiles (sec) over time, as visualized in Kibana.]
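A percentile chart like this can be driven directly by Elasticsearch’s percentiles aggregation. A hedged example of the query body (the field name `response_time_ms` is illustrative; use whatever field Logstash extracts from the proxy logs):

```python
# Elasticsearch aggregation body for a response-time percentile chart,
# matching the percentile set used later in the deck (60/80/90/95/99).
import json

query = {
    "size": 0,  # we only want the aggregation, not the raw hits
    "aggs": {
        "rt_percentiles": {
            "percentiles": {
                "field": "response_time_ms",   # illustrative field name
                "percents": [60, 80, 90, 95, 99],
            }
        }
    },
}

print(json.dumps(query))
```

POSTed to an index’s `_search` endpoint, this returns one percentile value per requested percent, which Kibana plots over time buckets.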
30. Clients Transmitted vs. Proxy Servers Received Throughput Comparison
[Chart: throughput (MB/sec) over time, comparing total client transmitted network traffic with total proxy received network traffic.]
31. Proxy Servers Received vs. Proxy Servers Transmitted Throughput Comparison
[Chart: throughput (MB/sec) over time; proxy transmitted throughput is annotated as 3x the proxy received throughput.]
32. Proxy Servers Received and Transmitted vs. Object Servers Received Throughput Comparison
[Chart: throughput (MB/sec) over time for proxy received, proxy transmitted and object servers received throughput, with the 3x annotation between proxy received and proxy transmitted.]
33. Network vs. Disks Throughput Comparison
[Chart: throughput (MB/sec) over time for proxy received, proxy transmitted, object servers received and object servers disk write throughput; disk write throughput is annotated as 12x the proxy received throughput.]
36. Number of async_pending Requests Over Time
[Chart: async_pending requests per second over time, per object server (Object1–Object4).]
37. Variable Object Size Workload
[Chart: throughput (MB/sec) over time for proxy received, proxy transmitted and object servers disk write throughput, as the object size steps through 15 KB, 32 KB, 64 KB, 128 KB, 512 KB and 1 MB.]
38. Disk vs. Client Perceived Bandwidth
[Chart: ratio of disk write throughput to client bandwidth across object sizes of 15 KB, 32 KB, 64 KB, 128 KB, 512 KB and 1 MB.]
The overhead is not a flat 3x, but instead is a function of object size.
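One way to build intuition for this curve is a toy model: if each of the 3 replicas also writes a fixed amount of filesystem metadata per object, the disk-to-client ratio decays toward 3x as objects grow. The per-object overhead constant below is purely illustrative, not a measured value:

```python
# Toy model of disk write amplification vs. object size.
# Assumes (illustratively) a fixed per-replica metadata write per object.
REPLICAS = 3

def disk_to_client_ratio(object_bytes, per_object_overhead_bytes=16 * 1024):
    """Bytes written to disk per byte the client sends."""
    disk = REPLICAS * (object_bytes + per_object_overhead_bytes)
    return disk / object_bytes

# Small objects pay far more than 3x; large objects approach 3x.
for size_kb in (15, 64, 512, 1024):
    print(size_kb, round(disk_to_client_ratio(size_kb * 1024), 2))
```

The qualitative shape (high ratio for small objects, asymptote at the replica count) matches the chart, even though the real overhead has more components than a single constant.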
40. Network vs. Disks Throughput Comparison
[Chart: throughput (MB/sec) over time for proxy received, proxy transmitted and object servers disk write throughput.]
44. PUT Request Average Response Time (Lower is better)
[Chart: PUT request response time (ms) over time for object servers Object1–Object4 and proxies Proxy1–Proxy2. Object server “Object1” has a much higher response time, while “Object2/3/4” have lower response times than the proxies.]
46. Process Statistics Over Object Servers
[Chart: number of running vs. blocked processes over wall clock time. Blocked processes are processes waiting for an IO response.]
47. The Idle CPU Comparison Over Object Servers (Higher is better)
[Chart: idle CPU percentage over wall clock time, grouped as CPU0–CPU9, CPU10–CPU19, CPU20–CPU29, CPU30–CPU39 and Stopped.]
48. • Micro-benchmark results (Vdbench 8K random write workload):
• “Object1” shows an average response time of ~37 ms
• “Object2”, “Object3” and “Object4” show an average response time of ~30 ms
• Our investigation revealed that the “Object1” server consists of older hardware components, even though all servers were “supposed” to be identical
49. [Chart: throughput with 4 object servers vs. 3 object servers (without Object1).]
~10% throughput improvement, despite a 25% reduction in object servers
50. Effect of Object Size on Cluster Bandwidth

Object Size   Avg-Res Time   Avg-Proc Time   Throughput     Bandwidth
15 KB         38.98 ms       38.94 ms        2854.31 op/s   42.81 MB/s
1 MB          105.37 ms      103.6 ms        967.04 op/s    967.04 MB/s
10 MB         852.06 ms      578.62 ms       117.43 op/s    1.17 GB/s

Back-of-the-envelope calculation:
User Bandwidth × Number of Replicas < Total Proxy Backend Bandwidth
In our case we have 3 proxy servers and a 10 Gbit network:
User Bandwidth × 3 < 3 × 10 Gbit ⇒ User Bandwidth < 10 Gbit ≈ 1.25 GB/sec
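The same arithmetic in executable form (all values taken from the slide):

```python
# Back-of-the-envelope limit: every byte a client uploads is forwarded to
# R object servers, so user bandwidth is capped by the total proxy backend
# bandwidth divided by the replica count.

replicas = 3
proxies = 3
link_gbit = 10                                # each proxy has a 10 Gbit link

backend_gbit = proxies * link_gbit            # 30 Gbit/s of backend capacity
user_limit_gbit = backend_gbit / replicas     # 10 Gbit/s of user traffic
user_limit_gb_per_sec = user_limit_gbit / 8   # convert Gbit/s to GB/s

print(user_limit_gb_per_sec)                  # 1.25
```

Note the measured 10 MB result (1.17 GB/s) sits just under this 1.25 GB/s ceiling, suggesting the cluster is approaching its network-imposed limit.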
Editor’s notes
StatsD allows deep instrumentation of the Swift code and can report 124 metrics across 15 Swift daemons and the tempauth middleware.
One day my boss came to me and asked: what cluster size do I need to get 1000 operations per second?