Presented by:
Leveraging open source tools to gain insight
into OpenStack Swift
May 20, 2015
Michael Factor,
IBM Fellow, Storage and
Systems,
IBM Research - Haifa
Dmitry Sotnikov,
System and Storage Researcher,
IBM Research - Haifa
Deep dive insights into Swift
The work was done with the help of:
Yaron Weinsberg, George Goldberg
For more information contact: dmitrys@il.ibm.com
Swift Monitoring
• Monitoring Swift With StatsD
• https://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd/
• Unified Instrumentation and Metering of Swift
• https://wiki.openstack.org/wiki/UnifiedInstrumentationMetering
• Administrator’s Guide
• http://docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring
“Once you have all this great data, what do you do with it? Well, that’s
going to require its own post.”
“Monitoring Swift With StatsD” by SwiftStack, Inc
Swift Monitoring Flow
• StatsD allows deep instrumentation of the Swift
code and can report over 100 metrics.
• Collectd gathers statistics about the system
• Graphite is an enterprise-scale monitoring tool that
stores and displays numeric time-series data
• Logstash is a data pipeline that can normalize the
data to a common format.
• Elasticsearch is a search server that allows
indexing large amounts of data.
• Kibana is a browser-based analytics and search
interface for Elasticsearch.
• Spark is a fast and general engine for large-scale
data processing.
• RequestStopper catches the request and returns
success, which makes it possible to isolate overheads in a
non-production system.
[Figure: Swift monitoring flow. Labels: Swift Node (Proxy Server, Container Server, Object Server, Account Server, RequestStopper, StatsD; CPU, RAM, and Disk Statistics); Monitoring, Analytics and Visualization Node.]
Benchmark Tool
• COSBench, Intel’s Cloud Object Storage benchmarking tool
• https://github.com/intel-cloud/cosbench
Where Our Journey Starts
• Swift 1.13
• 1 container
• Half a million small objects
• 100 COSBench workers
• What should the cluster size be to run more than 1000 PUTs per
second (with reasonable response time)?
Our Hardware - Story #1
• 3 proxy nodes (Proxy servers only)
• 7 storage nodes (Object, Container and Account servers)
  • 20 HDD
  • 2 SSD
  • 256 GB RAM
• 3 client machines connected to the proxies
• All network connections are 10 Gbps
Result: 520 operations per second
Swift Data Path Flow
• The PUT object request arrives at one of the proxies.
• The proxy sends the request to R (e.g., 3) storage
nodes, which will hold the R replicas of that
object.
• Next, the container database is updated
asynchronously to reflect the new object in it
(https://swiftstack.com/openstack-swift/architecture/).
  • The update is not fully asynchronous: a synchronous update is
  attempted first, and the asynchronous path is used only if that
  attempt times out after 0.5 sec.
• When at least two of the three writes to the object
servers return successfully, the proxy server process
notifies the client that the upload was successful.
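The write path above can be sketched as a small simulation. This is a minimal illustration, not Swift's proxy code: `put_object`, `quorum_size`, and the fake backends `ok` and `broken` are hypothetical names introduced here, and the quorum is assumed to be a simple majority of the R replicas.

```python
from concurrent.futures import ThreadPoolExecutor


def quorum_size(replicas: int) -> int:
    """Majority quorum (assumed): 2 of 3, 3 of 5, and so on."""
    return replicas // 2 + 1


def put_object(backends, data: bytes) -> bool:
    """Send the object to every replica backend and report success
    if a majority of the writes succeeded.

    `backends` is a list of callables (hypothetical stand-ins for
    object servers) that take the data and return True on success.
    For simplicity this sketch waits for all replicas to answer.
    """
    def safe_write(write):
        try:
            return bool(write(data))
        except Exception:
            return False  # a failed replica write counts as a miss

    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        results = list(pool.map(safe_write, backends))
    return sum(results) >= quorum_size(len(backends))


def ok(_data):
    """Replica that always accepts the write."""
    return True


def broken(_data):
    """Replica that always fails."""
    raise IOError("disk unavailable")


if __name__ == "__main__":
    print(put_object([ok, ok, broken], b"payload"))      # True: 2 of 3
    print(put_object([ok, broken, broken], b"payload"))  # False: 1 of 3
```

With R = 3, one slow or failed object server does not fail the PUT, which is why a single degraded node can hide behind healthy ones.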
[Figure: PUT request and response path: Client, Proxy Server, Object Server, Container Server.]
Swift Data Path Flow
[Figure: PUT request and response path (Client, Proxy Server, Object Server, Container Server) with null variants: Null Proxy Server, Null Object Server, Null Container Server.]
While nulling out a server is not useful for a production system, it is
useful for diagnosing performance bottlenecks.
RequestStopper
https://gist.github.com/gilv/7e70ba055f24bcc472b6
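The gist above holds the authors' RequestStopper. As a rough illustration of the idea only (this is not the gist's code), a WSGI middleware that short-circuits requests might look like the sketch below. The class name `RequestStopperSketch`, the `201 Created` status, the choice to stop only PUT, and the `filter_factory` wiring are all assumptions made for this example.

```python
class RequestStopperSketch:
    """WSGI middleware that answers selected requests itself instead
    of passing them to the wrapped application (hypothetical sketch)."""

    def __init__(self, app, stop_methods=("PUT",)):
        self.app = app
        self.stop_methods = set(stop_methods)

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") in self.stop_methods:
            # Drain the request body so the client can finish sending it.
            # Assumes the server signals end of body on wsgi.input.
            body = environ.get("wsgi.input")
            if body is not None:
                while body.read(65536):
                    pass
            # Report success without doing any real work.
            start_response("201 Created", [("Content-Length", "0")])
            return [b""]
        # Everything else goes to the real server unchanged.
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    """Paste Deploy style factory so the filter could be listed in a
    server pipeline (assumed wiring, not taken from the gist)."""
    def factory(app):
        return RequestStopperSketch(app)
    return factory
```

Placing such a filter at the front of the proxy, object, or container pipeline turns that server into a "null" server for the stopped methods, so the response time measured at the client covers only the components in front of it.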
Swift Data Path Flow – PUT Request Response Time
Measurement point                     Response time
Client (full PUT path)                192.47 ms
RequestStopper at Container Server    47.3 ms
RequestStopper at Object Server       32.89 ms
RequestStopper at Proxy Server        1.86 ms
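One way to read the four measurements on this slide (192.47, 47.3, 32.89, and 1.86 ms) is to subtract adjacent ones, which attributes a share of the response time to each component. The sketch below assumes the stages simply add up and that the values map to the stop points in the order listed; the dictionary keys and the function name are illustrative.

```python
# Measured average PUT response times (ms) from the slide, keyed by
# where the request was stopped. "client" is the full, unstopped path.
measured_ms = {
    "client": 192.47,
    "stop_at_container": 47.3,
    "stop_at_object": 32.89,
    "stop_at_proxy": 1.86,
}


def component_times(m):
    """Attribute time to each component by subtracting adjacent
    measurements (assumes the stages are purely additive)."""
    return {
        "network_rtt_to_proxy": round(m["stop_at_proxy"], 2),
        "proxy_server": round(m["stop_at_object"] - m["stop_at_proxy"], 2),
        "object_server": round(m["stop_at_container"] - m["stop_at_object"], 2),
        "container_server": round(m["client"] - m["stop_at_container"], 2),
    }


if __name__ == "__main__":
    for name, ms in component_times(measured_ms).items():
        print(f"{name}: {ms} ms")
```

Under these assumptions the container server accounts for about 145 ms of the 192 ms total, which is consistent with the single-container case being dominated by container updates.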
Object PUT Operations Average Response Time per Swift Component
100 Workers, 500K Objects
[Figure: chart comparing Swift 1.13 with 1 container and with 100 containers. Y-axis: response time (ms). Series: Network RTT to Proxy, Proxy Server, Object Server, Container Server. Annotation: 4.7x faster.]
Object PUT Operations Average Response Time
Comparison of Swift 1.13 vs Swift 2.2
100 Workers, 500K Objects
[Figure: chart comparing Swift 1.13 and Swift 2.2, each with 1 container and with 100 containers. Y-axis: response time (ms). Series: Network RTT to Proxy, Proxy Server, Object Server, Container Server. Annotations: 3x faster, 1.5x faster.]
Mixed Workload: 1 Container, 100 Workers, 500K Objects
[Figure: Mixed Workload - 1 container, Swift 2.2 vs. Swift 1.13. Y-axis: operations per second. Series: Read, Write, Delete.]
Under the mixed workload, Swift 2.2 achieves a 70% performance
improvement.
Swift Scalability – Swift 2.2
100 Containers, 100 Workers, 500K Objects
[Figure: Measured Operation Ratio. Y-axis: PUT operations per second. X-axis: number of storage servers (2 to 7).]
[Figure: Response Time Distribution. Y-axis: time (ms). X-axis: number of storage servers (2 to 7). Series: 60%-RT, 80%-RT, 90%-RT, 95%-RT, 99%-RT.]
This is the maximum performance that 100 COSBench workers can
achieve with Swift 2.2, so adding a new node does not improve
performance.
Influence of Number of COSBench Workers on Performance – Swift 2.2
7 Storage Nodes, 500K Objects, 100 Containers
#Workers    Operations per second    Average Response Time (ms)
100         2854.31                  38.98
200         3455.17                  62.68
400         4323.52                  101.96
Story #1 Conclusions: RequestStopper
• In some cases the limiting factor is not throughput but response time
• The response time of native Swift 1.13 with 1 container is 192 ms →
~5.2 op/sec per COSBench worker →
520 op/sec per 100 COSBench workers
• Reducing the response time to 65 ms in Swift 2.2 yields ~1560 IOPS on
the same cluster
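The arithmetic behind these conclusions can be written as a closed-loop estimate: each worker sends its next request only after the previous one returns, so throughput is bounded by workers divided by response time. The function name is illustrative, and zero think time between requests is an assumption.

```python
def ops_per_second(workers: int, response_time_ms: float) -> float:
    """Closed-loop estimate: each worker issues its next request only
    after the previous one returns, with no think time (assumed)."""
    return workers * 1000.0 / response_time_ms


if __name__ == "__main__":
    print(round(ops_per_second(1, 192), 1))   # 5.2
    print(round(ops_per_second(100, 192)))    # 521
    print(round(ops_per_second(100, 65)))     # 1538
```

The estimates of about 521 and 1538 op/sec are in line with the 520 and ~1560 figures quoted for Swift 1.13 and Swift 2.2.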
Story #1 Conclusions: Container Server
• The difference in Container Server performance between Swift 2.2
and Swift 1.13 was due in large part to the container merge_items
speedup patch (https://review.openstack.org/#/c/116992/)
• Container sharding (https://review.openstack.org/#/c/139921/) still
has the potential to improve performance for this workload by a
factor of 1.5
System Size Influence on Performance
[Figure: average response time (ms) vs. number of objects per container (1 to 1,000,000). Series: 1 container, 100 containers.]
[Figure: average response time (ms) vs. number of objects in the system (axis ticks: 5 kops, 10 kops, 50 kops, 100 kops, 500 kops). Series: 1 container, 100 containers.]
Swift performance (response time) is influenced by the number of
objects per container. In our environment we identified an optimum
number of objects; what determines that optimum still needs to be
evaluated.
Where Our Journey Continues
• September 2014
• Swift 1.13
• 1 container
• Half a million small objects
• What should the cluster size be to run more than 1000 PUTs per
second (with reasonable response time)?
Kibana – PUT Request Response Time Percentiles
[Figure: Kibana chart of PUT request response time percentiles. Y-axis: response time (sec).]
Average Response Time – 1-second intervals – Swift 2.2
[Figure: two time series of average response time. Y-axis: time (ms), 0 to 450. X-axis: wall-clock time, about 2:23 to 3:32.]
There is a peak every 30 sec.
https://bugs.launchpad.net/swift/+bug/1450656 ?
Graphite – PUT Request Response Time
[Figure: response time (sec) over time. Annotation: 30 seconds.]
Zoom-in on Swift Response Time Outliers (> 0.5 sec)
Request Granularity
PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2
[Figure: per-request response time (sec), 0 to 16, over time, about 0:36 to 0:40. Annotation: 30 seconds.]
The effect of fs.xfs.xfssyncd_centisecs on PUT response time
PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2
[Figure: response time (0 to 800) over time, about 8:08 to 8:29. Annotations: 300 seconds, 60 seconds.]
The effect of fs.xfs.xfssyncd_centisecs on PUT response time
PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2
Response times in ms.
Seconds  Avg-ResTime  60%-RT  80%-RT  90%-RT  95%-RT  99%-RT  100%-RT
10       83.26        30      50      350     520     700     1,620
30       43.34        30      40      50      60      530     3,690
60       38.81        30      40      50      70      270     5,900
300      31.89        30      40      50      70      220     9,530
Increasing fs.xfs.xfssyncd_centisecs improves the 99th
percentile at the price of degrading the 100th percentile (the maximum).
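For reference, the intervals tested on this slide (10, 30, 60, and 300 seconds) map to kernel settings as sketched below. The sketch assumes the "Seconds" column is the sync interval in seconds; since the parameter is expressed in centiseconds, each value is multiplied by 100. The helper name `sysctl_command` is illustrative, and the command is only printed, not executed.

```python
SYSCTL_KEY = "fs.xfs.xfssyncd_centisecs"


def sysctl_command(interval_seconds: int) -> str:
    """Build the sysctl command that sets the XFS sync interval.
    The kernel parameter is expressed in centiseconds."""
    centisecs = interval_seconds * 100
    return f"sysctl -w {SYSCTL_KEY}={centisecs}"


if __name__ == "__main__":
    for seconds in (10, 30, 60, 300):  # intervals tested on the slide
        print(sysctl_command(seconds))
```

Running the printed command requires root privileges and changes the setting only until the next reboot.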
Story #2
Our Hardware - Story #2
• 2 proxy nodes (Proxy servers only)
• 4 object nodes (Object servers)
  • 15 HDD
  • 128 GB RAM
• 2 metadata nodes (Container and Account servers)
  • 2 SSD
  • 128 GB RAM
• 2 client machines connected to the proxies
• Internal network connections are 10 Gbps
Object PUT Workload
100 Workers, 100 Containers, 500K Objects
Clients Transmitted Throughput
[Figure: clients transmitted throughput (MB/sec) over time.]
Clients Transmitted vs. Proxy Servers Received
Throughput Comparison
[Figure: throughput (MB/sec) over time. Series: Total Client Transmitted Network, Total Proxy Received Network. Annotation: Clients Transmitted Throughput = Proxy Servers Received Throughput.]
Proxy Servers Received vs. Proxy Servers Transmitted
Throughput Comparison
[Figure: throughput (MB/sec) over time. Series: Proxy Server Received Throughput, Proxy Server Transmitted Throughput. Annotation: Proxy Servers Transmitted Throughput is 3x Proxy Servers Received Throughput.]
Proxy Servers Received and Transmitted vs. Object Servers Received
Throughput Comparison
[Figure: throughput (MB/sec) over time. Series: Proxy Server Transmitted Throughput, Proxy Server Received Throughput, Object Server Received Throughput. Annotation labels: X3; Proxy Servers Transmitted Throughput; Proxy Servers Received Throughput; Object Servers Received Throughput.]
Network vs. Disks
Throughput Comparison
[Figure: throughput (MB/sec) over time. Series: Proxy Server Transmitted Throughput, Proxy Server Received Throughput, Object Server Received Throughput, Object Servers Disks Write Throughput. Annotation labels: X12; Object Servers Disks Write Throughput; Proxy Servers Received Throughput; Object Servers Received Throughput.]
Disks Capacity Utilization
[Figure: total disks capacity over time. Axis labels: Throughput (MB/sec), Time. Annotations: new object creation part; rewrite workload; the expected disks capacity for the whole workload.]
Number of async_pending requests over time
[Figure: number of async_pending requests per sec over time. Series: Object1, Object2, Object3, Object4.]
Variable object size workload
[Figure: throughput (MB/sec) over time, with phases labeled by object size: 15 KB, 32 KB, 64 KB, 128 KB, 512 KB, 1 MB. Series: Proxy Server Received Throughput, Proxy Server Transmitted Throughput, Object Servers Disks Write Throughput.]
Disk vs. Client Perceived Bandwidth
[Figure: ratio per object size: 15 KB, 32 KB, 64 KB, 128 KB, 512 KB, 1 MB.]
The overhead is not a flat 3x, but instead is a function of object size.
Our Hardware - Story #3 – without async_pendings
• 2 proxy nodes (Proxy servers only)
• 5 object nodes (Object servers)
  • 15 HDD
  • 128 GB RAM
• 3 metadata nodes (Container and Account servers)
  • 3 SSD
  • 128 GB RAM
• 2 client machines connected to the proxies
• Internal network connections are 10 Gbps
Network vs. Disks
Throughput Comparison
[Figure: throughput (MB/sec) over time. Series: Proxy Server Received Throughput, Proxy Server Transmitted Throughput, Object Servers Disks Write Throughput.]
Disks Capacity Utilization
[Figure: disks capacity utilization over time. Axis labels: Throughput (MB/sec), Time.]
Number of async_pending requests over time
[Figure: number of async_pending requests per sec over time.]
Back to Story #2
PUT Request Average Response Time (Lower is better)
[Figure: PUT request response time (ms) over time. Series: Object1, Object2, Object3, Object4, Proxy1, Proxy2.]
Object server “Object1” has a much higher response time.
“Object2/3/4” have lower response times than the proxies.
Disks Read and Write Throughputs
[Figure: disk read and write throughput per object server. Series: Object1.Read, Object1.Write, Object2.Read, Object2.Write, Object3.Read, Object3.Write, Object4.Read, Object4.Write.]
Processes statistics over object servers
[Figure: number of processes over wall clock time. Series: Running, Blocked.]
Blocked processes are processes that are waiting for an I/O response.
The idle CPU comparison over object servers (Higher is better)
[Figure: idle CPU percent (%) over wall clock time. Series: CPU0-CPU9, CPU10-CPU19, CPU20-CPU29, CPU30-CPU39, Stopped.]
• Micro benchmark results (Vdbench 8k random write
workload):
  • “Object1” shows an average response time of ~37 ms
  • “Object2”, “Object3”, “Object4” show an average response
  time of ~30 ms
• Our investigation revealed that the “Object1” server
consists of older hardware components, even
though all servers were “supposed” to be the same
[Figure: comparison of 4 Object Servers vs. 3 Object Servers (without Object1).]
~10% throughput improvement, despite a
25% reduction in object servers
Effect of Object Size on Cluster Bandwidth
Object Size  Avg-Res Time  Avg-Proc Time  Throughput    Bandwidth
15 KB        38.98 ms      38.94 ms       2854.31 op/s  42.81 MB/s
1 MB         105.37 ms     103.6 ms       967.04 op/s   967.04 MB/s
10 MB        852.06 ms     578.62 ms      117.43 op/s   1.17 GB/s
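The bandwidth figures on this slide follow from throughput multiplied by object size. The short check below reproduces them for the 15 KB, 1 MB, and 10 MB rows; it assumes decimal units (1 MB = 1000 KB), which is the convention that matches the reported numbers, and the function name is illustrative.

```python
def bandwidth_mb_per_s(ops_per_s: float, object_size_kb: float) -> float:
    """Client bandwidth in MB/s, using decimal units (1 MB = 1000 KB),
    an assumption chosen because it matches the reported figures."""
    return ops_per_s * object_size_kb / 1000.0


rows = [  # (object size in KB, throughput in op/s) from the slide
    (15, 2854.31),
    (1000, 967.04),
    (10000, 117.43),
]

if __name__ == "__main__":
    for size_kb, ops in rows:
        print(size_kb, "KB:", round(bandwidth_mb_per_s(ops, size_kb), 2), "MB/s")
```

Small objects are bound by operations per second, while large objects approach the network limit even at a low operation rate.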
Back-of-the-envelope calculation:
User Bandwidth * Number of Replicas < Total Proxy Backend Bandwidth
In our case we have 3 proxy servers and a 10 Gbit network:
User Bandwidth * 3 < 3 * 10 Gbit → User Bandwidth < 10 Gbit = 1.25 GB/sec
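The same bound can be expressed as a small helper for other cluster shapes. This is a sketch of the inequality above only; the function names are illustrative, and it assumes each proxy has one backend link of the stated speed and that nothing else limits the proxies.

```python
def max_user_bandwidth_gbit(proxies: int, link_gbit: float, replicas: int) -> float:
    """Upper bound on client write bandwidth in Gbit/s: every client
    byte leaves the proxies `replicas` times over the backend links."""
    return proxies * link_gbit / replicas


def gbit_to_gbyte(gbit: float) -> float:
    """Convert Gbit/s to GB/s (8 bits per byte)."""
    return gbit / 8.0


if __name__ == "__main__":
    bound = max_user_bandwidth_gbit(proxies=3, link_gbit=10, replicas=3)
    print(bound, "Gbit/s =", gbit_to_gbyte(bound), "GB/s")  # 10.0 Gbit/s = 1.25 GB/s
```

The 1.17 GB/s measured for 10 MB objects sits just under this 1.25 GB/s bound.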

Introduction to Decentralized Applications (dApps)
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

Leveraging open source tools to gain insight into OpenStack Swift

  • 1. Leveraging open source tools to gain insight into OpenStack Swift: deep dive insights into Swift. May 20, 2015. Presented by: Michael Factor, IBM Fellow, Storage and Systems, IBM Research - Haifa; Dmitry Sotnikov, System and Storage Researcher, IBM Research - Haifa. The work was done with the help of Yaron Weinsberg and George Goldberg. For more information contact: dmitrys@il.ibm.com
  • 2. Swift Monitoring • Monitoring Swift With StatsD • https://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd/ • Unified Instrumentation and Metering of Swift • https://wiki.openstack.org/wiki/UnifiedInstrumentationMetering • Administrator’s Guide • http://docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring • “Once you have all this great data, what do you do with it? Well, that’s going to require its own post.” (“Monitoring Swift With StatsD” by SwiftStack, Inc)
  • 3. Swift Monitoring Flow • StatsD allows deep instrumentation of the Swift code and can report over 100 metrics. • Collectd gathers statistics about the system. • Graphite is an enterprise-scale monitoring tool that stores and displays numeric time-series data. • Logstash is a data pipeline that can normalize the data to a common format. • Elasticsearch is a search server that allows indexing large amounts of data. • Kibana is a browser-based analytics and search interface for Elasticsearch. • Spark is a fast and general engine for large-scale data processing. • RequestStopper catches the request and returns success, which makes it possible to isolate overheads in a non-production system. [Diagram labels: Swift Node, Proxy Server, Container Server, Object Server, Account Server, StatsD, RequestStopper, CPU Statistics, RAM Statistics, Disk Statistics, Monitoring, Analytics and Visualization Node.]
  • 4. Benchmark Tool • COSBench, Intel’s Cloud Object Storage Benchmarking tool • https://github.com/intel-cloud/cosbench
  • 5. Where Our Journey Starts • Swift 1.13 • 1 container • Half a million small objects • 100 COSBench workers • What cluster size is needed to run more than 1000 PUTs per second (with reasonable response time)?
  • 6. Our Hardware - Story #1 • 3 proxy nodes (Proxy servers only) • 7 storage nodes (Object, Container and Account servers) • 20 HDD • 2 SSD • 256 GB RAM • 3 client machines connected to the proxy nodes • All network connections are 10 Gbps • Result: 520 operations per second
  • 7. Swift Data Path Flow • The Put object request arrives at one of the Proxies. • The Proxy sends the request to R (e.g., 3) storage nodes, which will hold the R (e.g., 3) replicas of that object. • Next, the container database is updated asynchronously to reflect the new object in it (https://swiftstack.com/openstack-swift/architecture/). • The update is not fully asynchronous: a synchronous update is attempted first, and the asynchronous path is taken only after a timeout of 0.5 sec. • When at least two of the three writes to the object servers return successfully, the proxy server process notifies the client that the upload was successful. [Diagram labels: Client, Put Request, Response, Proxy Server, Object Server, Container Server.]
  • 8. Swift Data Path Flow • While nulling out a server is not useful for a production system, it is useful for diagnosing performance bottlenecks. [Diagram labels: Client, Put Request, Response, Proxy Server, Object Server, Container Server, Null Container Server, Null Object Server, Null Proxy Server.]
  • 10. Swift Data Path Flow – Put Request Response Time [Diagram labels: Client, Put Request, Response, Proxy Server, Object Server, Container Server; RequestStopper at Container Server, RequestStopper at Object Server, RequestStopper at Proxy Server; response times shown: 192.47 ms, 47.3 ms, 32.89 ms, 1.86 ms.]
  • 11. Object PUT Operations: Average Response Time per Swift Component (100 Workers, 500K Objects) [Chart: response time (ms) for Swift 1.13 with 1 container and with 100 containers, broken down into Network RTT to Proxy, Proxy Server, Object Server and Container Server; annotation: 4.7× faster.]
  • 12. Object PUT Operations: Average Response Time Comparison of Swift 1.13 vs Swift 2.2 (100 Workers, 500K Objects) [Chart: response time (ms) for Swift 1.13 and Swift 2.2, each with 1 container and with 100 containers, broken down into Network RTT to Proxy, Proxy Server, Object Server and Container Server; annotations: 3× faster, 1.5× faster.]
  • 13. Mixed Workload: 1 Container, 100 Workers, 500K Objects [Chart: operations per second (read, write, delete) for Swift 2.2 and Swift 1.13, mixed workload, 1 container.] For the mixed workload, Swift 2.2 achieves a 70% performance improvement.
  • 14. Swift Scalability – Swift 2.2 (100 Containers, 100 Workers, 500K Objects) [Charts: PUT operations per second vs. number of storage servers (2 to 7), labeled Measured Operation Ratio; response time distribution in ms (60%, 80%, 90%, 95% and 99% response times) vs. number of storage servers.] This is the maximum performance that 100 COSBench workers can achieve with Swift 2.2, so adding a new node does not improve performance.
  • 15. Influence of the number of COSBench workers on performance – Swift 2.2 (7 Storage Nodes, 500K Objects, 100 Containers)
      Workers   Operations per second   Average response time (ms)
      100       2854.31                 38.98
      200       3455.17                 62.68
      400       4323.52                 101.96
  • 16. Story #1 Conclusions: RequestStopper • In some cases the limiting factor is not throughput but response time • The response time of native Swift 1.13 with 1 container is 192 ms → ~5.2 op/sec per COSBench worker → 520 op/sec for 100 COSBench workers • Reducing the response time to 65 ms in Swift 2.2 yields ~1560 IOPS on the same cluster
  • 17. Story #1 Conclusions: Container Server • The difference in Container Server performance between Swift 2.2 and Swift 1.13 was due in large part to the container merge_items speedup patch (https://review.openstack.org/#/c/116992/) • Container Sharding (https://review.openstack.org/#/c/139921/) still has the potential to improve performance for this workload by a factor of 1.5
  • 18. System Size Influence on Performance [Charts: average response time (ms) vs. number of objects per container (1 to 1,000,000) and vs. number of objects in the system (5 kops to 500 kops), each for 1 container and 100 containers.] Swift performance (response time) is influenced by the number of objects per container. In our environment we identified an optimum number of objects; what determines this optimum still needs to be evaluated.
  • 19. Where Our Journey Continues • September 2014 • Swift 1.13 • 1 container • Half a million small objects • What cluster size is needed to run more than 1000 PUTs per second (with reasonable response time)?
  • 20. Kibana – Put Request Response Time Percentiles [Chart: response time (sec).]
  • 21. Average Response Time for 1-second intervals – Swift 2.2 [Charts: response time (ms) over time, from about 2:23:51 to 3:32:00.] There is a peak every 30 sec. Related bug report: https://bugs.launchpad.net/swift/+bug/1450656
  • 22. Graphite – PUT Request Response Time [Chart: response time (sec) over time; peaks marked 30 seconds apart.]
  • 23. Zoom-in on Swift response time outliers (> 0.5 sec), request granularity (PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2) [Chart: response time (sec) over time, 0:36:00 to 0:40:19; outliers marked 30 seconds apart.]
  • 24. The effect of fs.xfs.xfssyncd_centisecs on PUT response time (PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2) [Chart: response time (ms) over time, 8:08:39 to 8:29:39; intervals labeled 300 seconds and 60 seconds.]
  • 25. The effect of fs.xfs.xfssyncd_centisecs on PUT response time (PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2)
      Seconds   Avg-ResTime   60%-RT   80%-RT   90%-RT   95%-RT   99%-RT   100%-RT
      10        83.26         30       50       350      520      700      1,620
      30        43.34         30       40       50       60       530      3,690
      60        38.81         30       40       50       70       270      5,900
      300       31.89         30       40       50       70       220      9,530
      Increasing fs.xfs.xfssyncd_centisecs improves the 99th percentile at the price of degrading the 100th percentile.
  • 27. Our Hardware - Story #2 • 2 proxy nodes (Proxy servers only) • 4 object nodes (Object servers) • 15 HDD • 128 GB RAM • 2 metadata nodes (Container and Account servers) • 2 SSD • 128 GB RAM • 2 client machines connected to the proxy nodes • Internal network connections are 10 Gbps
  • 28. Object PUT Workload: 100 Workers, 100 Containers, 500K Objects
  • 30. Clients Transmitted vs. Proxy Servers Received Throughput Comparison [Chart: throughput (MB/sec) over time for Clients Transmitted Throughput and Proxy Servers Received Throughput.] Total client transmitted network throughput = total proxy received network throughput.
  • 31. Proxy Servers Received vs. Proxy Servers Transmitted Throughput Comparison [Chart: throughput (MB/sec) over time for Proxy Servers Received Throughput and Proxy Servers Transmitted Throughput; annotation: ×3.]
  • 32. Proxy Servers Received and Transmitted vs. Object Servers Received Throughput Comparison [Chart: throughput (MB/sec) over time for Proxy Servers Transmitted Throughput, Proxy Servers Received Throughput and Object Servers Received Throughput; annotation: ×3.]
  • 33. Network vs. Disks Throughput Comparison [Chart: throughput (MB/sec) over time for Proxy Servers Transmitted Throughput, Proxy Servers Received Throughput, Object Servers Received Throughput and Object Servers Disks Write Throughput; annotation: ×12.]
  • 35. Disks Capacity Utilization [Chart: axis labeled Throughput (MB/sec), over time; annotations: new object creation part, rewrite workload, the expected disk capacity for the whole workload.]
  • 36. Number of async_pending requests over time [Chart: async_pending requests per sec over time for Object1, Object2, Object3 and Object4.]
  • 37. Variable object size workload [Chart: throughput (MB/sec) over time for Proxy Servers Received Throughput, Proxy Servers Transmitted Throughput and Object Servers Disks Write Throughput; object sizes: 15 KB, 32 KB, 64 KB, 128 KB, 512 KB, 1 MB.]
  • 38. Disk vs. Client Perceived Bandwidth [Chart: ratio for object sizes 15 KB, 32 KB, 64 KB, 128 KB, 512 KB and 1 MB.] The overhead is not a flat 3× but a function of object size.
  • 39. Our Hardware - Story #3 – without async_pendings • 2 proxy nodes (Proxy servers only) • 5 object nodes (Object servers) • 15 HDD • 128 GB RAM • 3 metadata nodes (Container and Account servers) • 3 SSD • 128 GB RAM • 2 client machines connected to the proxy nodes • Internal network connections are 10 Gbps
  • 40. Network vs. Disks Throughput Comparison [Chart: throughput (MB/sec) over time for Proxy Servers Received Throughput, Proxy Servers Transmitted Throughput and Object Servers Disks Write Throughput.]
  • 42. Number of async_pending requests over time [Chart: async_pending requests per sec over time.]
  • 43. Back to Story #2
  • 44. PUT Request Average Response Time (lower is better) [Chart: PUT request response time (ms) over time for Object1, Object2, Object3, Object4, Proxy1 and Proxy2.] Object server “Object1” has a much higher response time; “Object2/3/4” have lower response times than the proxies.
  • 45. Disks Read and Write Throughputs [Chart: read and write throughput for Object1, Object2, Object3 and Object4.]
  • 46. Process statistics across object servers [Chart: number of running and blocked processes over wall clock time.] Blocked processes are processes that are waiting for an I/O response.
  • 47. Idle CPU comparison across object servers (higher is better) [Chart: percent (%) over wall clock time for CPU0-CPU9, CPU10-CPU19, CPU20-CPU29 and CPU30-CPU39; label: Stopped.]
  • 48. • Micro benchmark results (Vdbench 8k random write workload): • “Object1” shows an average response time of ~37 ms • “Object2”, “Object3”, “Object4” show an average response time of ~30 ms • Our investigation revealed that the “Object1” server consists of older hardware components, even though all servers were “supposed” to be the same
  • 49. [Chart: 4 object servers vs. 3 object servers (without Object1).] ~10% throughput improvement despite a 25% reduction in the number of object servers.
  • 50. Effect of Object Size on Cluster Bandwidth
      Object Size   Avg-Res Time   Avg-Proc Time   Throughput     Bandwidth
      15 KB         38.98 ms       38.94 ms        2854.31 op/s   42.81 MB/s
      1 MB          105.37 ms      103.6 ms        967.04 op/s    967.04 MB/s
      10 MB         852.06 ms      578.62 ms       117.43 op/s    1.17 GB/s
      Back-of-the-envelope calculation: User Bandwidth * Number of Replicas < Total Proxy Backend Bandwidth. In our case we have 3 proxy servers and a 10 Gbit network: User Bandwidth * 3 < 3 * 10 Gbit → User Bandwidth < 1.25 GB/sec
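
The back-of-the-envelope bound on slide 50 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name and the assumptions that each proxy has a single 10 Gbit/s backend link and forwards every user byte once per replica are illustrative, not taken from the slides.

```python
# Sketch of the slide-50 bound:
#   user_bandwidth * replicas < total proxy backend bandwidth
# Assumption (hypothetical model): one backend link per proxy, and each
# user byte is sent once per replica over those links.

GB_PER_GBIT = 1 / 8  # 1 Gbit/s corresponds to 0.125 GB/s


def max_user_bandwidth_gb_per_sec(proxies: int, link_gbit: float, replicas: int) -> float:
    """Upper bound on user-visible bandwidth in GB/s."""
    total_backend_gbit = proxies * link_gbit
    return total_backend_gbit / replicas * GB_PER_GBIT


bound = max_user_bandwidth_gb_per_sec(proxies=3, link_gbit=10, replicas=3)
print(f"User bandwidth < {bound:.2f} GB/sec")  # prints "User bandwidth < 1.25 GB/sec"
```

The 10 MB row of the table (1.17 GB/s) sits just below this bound.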

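The throughput arithmetic in the Story #1 conclusions (slide 16) follows from treating each COSBench worker as a closed loop that issues one request at a time. The sketch below makes that assumption explicit; the function name is hypothetical and the model ignores any client-side think time.

```python
# Closed-loop estimate: a worker that waits for each response before
# sending the next request completes 1 / response_time requests per second.
# Assumption: no think time between requests and no client-side bottleneck.


def closed_loop_throughput(workers: int, response_time_ms: float) -> float:
    """Estimated operations per second for `workers` concurrent workers."""
    per_worker_ops = 1000.0 / response_time_ms
    return workers * per_worker_ops


swift_1_13 = closed_loop_throughput(workers=100, response_time_ms=192)
swift_2_2 = closed_loop_throughput(workers=100, response_time_ms=65)
print(f"Swift 1.13: ~{swift_1_13:.0f} op/sec")  # ~521
print(f"Swift 2.2:  ~{swift_2_2:.0f} op/sec")   # ~1538
```

The first estimate matches the 520 op/sec on the slides. The second comes out slightly below the slide's ~1560, which is what one gets by multiplying 520 by a rounded 3× speedup.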
Editor's notes

  1. StatsD allows deep instrumentation of the Swift code and can report 124 metrics across 15 swift daemons and the tempauth middleware.
  2. One day my boss came to me and asked: what cluster size do I need to get 1000 I/O operations per second?
  3. One day my boss came to me and asked: what cluster size do I need to get 1000 I/O operations per second?
  4. fs.xfs.xfssyncd_centisecs, fs.xfs.filestream_centisecs
  5. Running and Blocked
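
Note 4 names the XFS sysctls behind slides 24 and 25. As a rough illustration of how the intervals in the slide-25 table map onto the sysctl, the sketch below converts seconds to centiseconds and builds the corresponding `sysctl -w` command string. It assumes the "Seconds" column of that table is the value applied to fs.xfs.xfssyncd_centisecs; the helper name is hypothetical, and the command is only printed, never executed.

```python
# Build (but do not run) sysctl commands for the intervals compared on slide 25.
# Assumption: the table's "Seconds" column is the xfssyncd interval.

SYSCTL_NAME = "fs.xfs.xfssyncd_centisecs"  # name as given in the slides


def sysctl_command(seconds: float) -> str:
    """Return the sysctl command that sets the interval to `seconds`."""
    centisecs = int(round(seconds * 100))  # 1 second = 100 centiseconds
    return f"sysctl -w {SYSCTL_NAME}={centisecs}"


for seconds in (10, 30, 60, 300):
    print(sysctl_command(seconds))
```

Applying such a command on a real storage node requires root privileges and should be tried on a non-production system first.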