More Efficient Object Replication in OpenStack Summit Juno
- 1. Copyright©2014 NTT corp. All Rights Reserved.
Developing More Efficient Object
Replication on OpenStack Swift
2014/05/16 (OpenStack Juno Design Summit)
Kota Tsuyuzaki
Developer (Swift ATC)
Advanced Information Processing Technology SE Project
NTT Software Innovation Center
- 2.
Outline
1. Global Distributed Cluster
2. More Efficient Object Replication
3. Benchmark Analysis
Extra: ssync issue
Etherpad: https://etherpad.openstack.org/p/juno_swift_object_replication
- 3.
1. Global Distributed Cluster
Demands:
• World Wide Services
• Capacity Optimization
• Disaster Recovery
Solution:
• Global Distributed Cluster
- 4.
1. Global Distributed Cluster
Network Issues:
・High Latency: tens of ms ~ 100 ms
・Narrow: 1~10Gbps
・Expensive: $15,000/Gbps/mo
- 5.
1. Global Distributed Cluster
Network Issues:
・High Latency: Excellent
-> Regions
-> Affinity Controls
[Diagram: Region1 and Region2, from the SwiftStack Blog, https://swiftstack.com/blog/]
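For reference, the Regions and affinity controls above are configured on Swift's proxy server. A minimal sketch of the relevant `proxy-server.conf` options (the region numbers and priorities are illustrative):

```ini
[app:proxy-server]
use = egg:swift#proxy
# Prefer reading from region 1 nodes (lower value = higher priority)
sorting_method = affinity
read_affinity = r1=100, r2=200
# Write initially only to region 1; remote copies follow via replication
write_affinity = r1
write_affinity_node_count = 2 * replicas
```

With `write_affinity` set like this, a PUT lands entirely in region 1 (using local handoffs), which is exactly the "copy twice or more" situation the later slides analyze.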
- 6.
1. Global Distributed Cluster
Network Issues:
・Narrow ・Expensive: Not Well Addressed Yet
-> ???
-> ???
• Large Amounts of Transfer
• Replication Delay
- 7.
2. More Efficient Object Replication
Objective:
Reducing the Amount of Replication Network Transfer between Regions
(focus on the narrow network)
- 9.
2. More Efficient Object Replication
Current:
Model: 2 Regions, 3 Replicas with Write Affinity
[Diagram: a User PUTs an object via the Internet; with write affinity, the primary and handoff copies all land in Region1, which is connected to Region2 over the network between Regions]
- 10.
2. More Efficient Object Replication
Current:
Model: 2 Regions, 3 Replicas with Write Affinity
[Diagram: replication then moves the copies from Region1 handoffs to Region2 primaries over the network between Regions]
Unfortunately, the object is copied twice or more across the network between Regions
- 12.
2. More Efficient Object Replication
Approach:
• Only push to one remote node, chosen based on affinity
• Request that remote to sync to the others
• Requires changing only a small amount of code in the object-replicator and object-server
[Diagram: Region1 pushes to only one node in Region2 over the network between Regions; that node syncs to the others]
- 13.
2. More Efficient Object Replication
*Additional code
[Object-Replicator]
find local part suffixes
for each suffix:
    find the other primary locations
    check the remote
    if the suffix is not on the remote:
        if (remote region is local) or (remote region not in synced regions):
            push data
            create the remote suffix, with a request to sync within the remote region
            add the remote region to synced regions
[Object-Server (REPLICATE)]
create the local suffix
if a sync request is in the header:
    push data to the requested remotes
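The replicator-side loop above can be sketched in Python. All names here are illustrative stand-ins, not Swift's actual internals: `have_suffix` models the REPLICATE existence check, `push_data` the rsync/ssync push, and `request_sync` the new "sync onward" request to the remote object-server.

```python
def replicate_suffixes(local_suffixes, primary_nodes, local_region,
                       have_suffix, push_data, request_sync):
    """Sketch of the proposed replicator loop (illustrative names).

    have_suffix(node, suffix) -> bool : REPLICATE-style existence check
    push_data(node, suffix)           : rsync/ssync-style data push
    request_sync(node, suffix, peers) : ask the remote to sync onward
    """
    for suffix in local_suffixes:
        synced_regions = set()        # remote regions already pushed to
        for node in primary_nodes:
            if have_suffix(node, suffix):
                continue
            region = node['region']
            if region == local_region:
                push_data(node, suffix)       # always push within our region
            elif region not in synced_regions:
                push_data(node, suffix)       # one WAN push per remote region
                peers = [n for n in primary_nodes
                         if n['region'] == region and n is not node]
                request_sync(node, suffix, peers)  # remote fans out locally
                synced_regions.add(region)
            # else: another node in that remote region will receive the
            # data from the node we already pushed to
```

The key property is that each missing suffix crosses the inter-region network at most once per remote region, regardless of how many primaries that region holds.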
- 14.
3. Performance Analysis
Objective:
• Analyze Replication Performance
• Total transferred data amount
• Average network bandwidth between regions
• One-pass time
- 15.
3. Benchmark Scenario
Model:
• 2 Regions, 3 Replicas
• 1 Gateway Node (GW) between Regions
Scenario:
• Shape the GW network to 1Gbps
• Stop the object-replicator
• Load objects with Write Affinity
• 8MB * 5,000 objects (40GB total)
• Run the object-replicator in once mode (32 concurrency)
Benchmark Patterns:
• Original (ssync)
• Proposed (ssync, rsync)
- 16.
3. Benchmark Environment
[Diagram: a Client connects over Ethernet to a Proxy in Region 1; Storage1 and Storage2 (36 disks each) sit on an Infiniband LAN in Region 1, Storage3 and Storage4 (36 disks each) in Region 2; the regions are linked through a GW node with 20Gbps Infiniband links, shaped to 1Gbps]
Storage:
CPU: 2 * Intel X5650 2.67GHz (6 cores * HT)
MEM: 48GB RAM
NIC: 20Gbps Infiniband
Disks: 3TB SATA (7,200 rpm) x 36 disks
GW:
CPU: 2 * Intel X5650 2.67GHz (6 cores * HT)
MEM: 64GB RAM
NIC: 2 * 20Gbps Infiniband (shaped to 1Gbps)
- 17.
3. Result (w/ 1Gbps shaping)
[Charts comparing Original, Proposed (ssync), and Proposed (rsync): One Replication Pass Time (sec), Transferred Data on One Pass (GB), Average Network Bandwidth (Gbps)]
- Good reduction in Transferred Data Amount (Very Good!)
  (Original: 40GB * 3 replicas / 2 = 60GB, since 1/3 has 2 copies in Region2; Proposed: 40GB = theoretical value)
- Little decrease in Average Network Bandwidth
- Good reduction in One Pass Time (Very Good!)
-- ssync is more efficient than rsync.
-- The proposed algorithm has a small overhead from waiting for node syncing.
-- It can ensure all primary nodes are synced in a shorter time and with a smaller amount of data transferred.
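The annotated theoretical values can be reproduced with simple arithmetic (plain calculation, not tied to any Swift API):

```python
load_gb = 8 * 5000 / 1000     # 8MB objects x 5,000 = 40 GB of unique data
replicas = 3
regions = 2

# Original scheme: every replica destined for Region2 crosses the WAN.
# On average, half of the 3x-replicated data belongs in Region2:
original_gb = load_gb * replicas / regions   # 60 GB

# Proposed scheme: each object crosses the WAN exactly once and is
# then fanned out locally inside Region2:
proposed_gb = load_gb                        # 40 GB

print(original_gb, proposed_gb)
```

This matches the measured reduction from roughly 60GB to roughly 40GB of inter-region transfer per replication pass.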
- 18.
Conclusion
1. Global Distributed Cluster
• Need for more efficient replication
2. More Efficient Object Replication
• Affinity-based approach
• Only push to one remote
3. Benchmark Analysis
• Good reduction of data transfer
• Little overhead in one-pass time
Acknowledgment:
SwiftStack members, Ken Igarachi, Yohei Hayashi, Takashi Shito, Hiromichi Ito, Naoto Nishizono
- 19.
Discussions
• Is ensuring that all nodes are synced needed?
  • Request sync at replicate time (current):
    • Pros: able to ensure all replicas are synced
    • Cons: small overhead from waiting for syncing
  • No sync request; update the replicas asynchronously:
    • Pros: simple
    • Cons: unable to ensure all replicas are synced
• What is a good way to sync other nodes in the Object-Server?
  • Naïve, but very simple (current):
    • Use an object-replicator instance, which carries unnecessary extra state (e.g. the Ring)
  • Complex:
    • Create a dedicated syncing function or class for the object-server
  • Are there more efficient ways?
- 21.
Extra: ssync issue
Ssync:
• A replication process improvement based on HTTP
• A replacement for rsync (designed to be slimmer)
• Sender / Receiver model
Issue:
• Poor parallel I/O performance, (probably) caused by eventlet
  • Cannot access the local disk in parallel (maybe due to a constraint of the Python VM)
  • Slower than rsync in my experiment
• Possible solution:
  • Launch the sender as a subprocess to allow using another CPU core for disk reads, similar to rsync
  • With os.fork(), performance improved to around the same level as rsync