Ceph on RDMA

TCP vs. RDMA (XIO) transport performance in Ceph

  1. CEPH Performance on XIO, Emerging Storage Solutions (EMS), SanDisk Confidential
  2. Setup
     • 4 OSDs, one per SSD (4TB)
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
     • Block size = 4K, 100% RR
     • Working set ~4 TB
     • Code base is latest ceph master
     • Server has 40 cores and 64 GB RAM
     • Shards : thread_per_shard = 25:1
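For reference, the client workload above maps onto a fio job file roughly like the sketch below, assuming the stock fio rbd engine. The pool, image, and client names and the runtime are placeholders, not values taken from the deck; one such client was run per pool/image (4 in total), each contributing 8 jobs * 32 iodepth = 256 outstanding I/Os.

     [global]
     # rbd engine: fio talks to the cluster directly through librbd
     ioengine=rbd
     clientname=admin
     # placeholders: one of the 4 pools and its rbd image
     pool=testpool
     rbdname=testimage
     # 4K, 100% random read
     rw=randread
     bs=4k
     # 8 jobs x QD 32 = 256 outstanding I/Os per client
     numjobs=8
     iodepth=32
     # placeholder run length
     time_based
     runtime=300
     group_reporting

     [rbd-4k-randread]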
  3. Result
     Transport | IOPS  | BW (MB/s) | Disk read (%) | User %cpu | Sys %cpu | %idle
     TCP       | ~50K  | ~200      | ~99           | ~15       | ~12      | ~55
     RDMA      | ~130K | ~520      | ~99           | ~40       | ~19      | ~11
     Summary:
     • ~2.6X the TCP throughput (~130K vs. ~50K IOPS)
     • TCP iops/core = 2777, XIO iops/core = 3651
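The iops/core figures quoted in the result slides appear to be IOPS divided by the number of busy cores, i.e.

     iops/core ≈ IOPS / (cores * (1 - %idle/100))

For this run that gives ~50K / (40 * 0.45) ≈ 2778 for TCP and ~130K / (40 * 0.89) ≈ 3652 for XIO, in line with the 2777 and 3651 quoted above; the same relation reproduces the per-node numbers on the later slides.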
  4. Setup
     • 16 OSDs, one per SSD (4TB)
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
     • Block size = 4K, 100% RR
     • Working set ~4 TB
     • Code base is latest ceph master
     • Server has 40 cores and 64 GB RAM
     • Shards : thread_per_shard = 25:1, 10:1
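The "Shards : thread_per_shard" ratios refer to the OSD sharded op work queue. Assuming the usual option names from the ceph master of that period, the 25:1 case corresponds to a ceph.conf fragment roughly like this:

     [osd]
     # 25 shards with 1 worker thread each (the 25:1 case);
     # the 10:1 case would be osd_op_num_shards = 10 with 1 thread per shard
     osd_op_num_shards = 25
     osd_op_num_threads_per_shard = 1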
  5. Result
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~118K | ~470      | ~99           | ~3                       | ~26                      | ~16
     RDMA      | ~120K | ~480      | ~99           | ~7                       | ~25                      | ~28
     Summary:
     • TCP is catching up; TCP iops/core = 3041, XIO iops/core = 3225 in cluster nodes
     • More memory consumed by XIO
  6. Setup
     • 16 OSDs, one per SSD (4TB)
     • 2 hosts, 8 OSDs each
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
     • Block size = 4K, 100% RR
     • Working set ~6 TB
     • Code base is latest ceph master
     • Server has 40 cores and 64 GB RAM
     • Shards : thread_per_shard = 25:1, 10:1
  7. Result
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~175K | ~700      | ~99           | ~8                       | ~18                      | ~16
     RDMA      | ~238K | ~952      | ~99           | ~14                      | ~20                      | ~28
     Summary:
     • ~36% performance gain
     • TCP iops/core = 4755, XIO iops/core = 6918 in cluster nodes
     • More memory used by RDMA (~28% vs. ~16% per cluster node)
  8. Setup
     • 32 OSDs, one per SSD (4TB)
     • 2 hosts, 16 OSDs each
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
     • Block size = 4K, 100% RR
     • Working set ~6 TB
     • Code base is latest ceph master
     • Server has 40 cores and 64 GB RAM
     • Shards : thread_per_shard = 25:1, 10:1, 15:1, 5:2
  9. Result
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~214K | ~775      | ~99           | ~9                       | ~12                      | ~16
     RDMA      | ~230K | ~870      | ~99           | ~12                      | ~18                      | ~28
     Summary:
     • TCP is catching up again; not much of a gain
     • TCP iops/core = 2939, XIO iops/core = 3267 in cluster nodes
     • More memory used per cluster node by RDMA (~28% vs. ~16%)
  10. Testing with a more powerful setup
     • 8 OSDs, one per SSD (4TB)
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
     • Block size = 4K, 100% RR
     • Working set ~4 TB
     • Code base is latest ceph master
     • Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
     • Shards : thread_per_shard = 25:1
  11. Result
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~148K | ~505      | ~99           | ~15                      | ~68                      | ~11
     RDMA      | ~166K | ~665      | ~99           | ~18                      | ~73                      | ~19
     Summary:
     • ~12% performance gain
     • TCP iops/core = 3109, XIO iops/core = 3616 in cluster nodes
     • For the client node, TCP iops/core = 8258, XIO iops/core = 10978
     • More memory used by RDMA (~19% vs. ~11% per cluster node)
  12. Result, no disk hit
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~265K | ~1037     | ~0            | ~35                      | ~40                      | ~11
     RDMA      | ~276K | ~1084     | ~0            | ~60                      | ~63                      | ~19
     Summary:
     • Not much difference throughput-wise
     • But a significant efficiency difference: TCP iops/core = 7280, XIO iops/core = 12,321 in cluster nodes
     • More memory used by RDMA (~19% vs. ~11% per cluster node)
  13. Bumping up OSDs on the same setup
     • 16 OSDs, one per SSD (4TB)
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
     • Block size = 4K, 100% RR
     • Working set ~4 TB
     • Code base is latest ceph master
     • Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
     • Shards : thread_per_shard = 10:1, 4:2, 25:1
     • Some experimentation with the xio_portal_thread settings
  14. Result
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~142K | ~505      | ~99           | ~18                      | ~68                      | ~18
     RDMA      | ~166K | ~665      | ~99           | ~18                      | ~73                      | ~38
     Summary:
     • TCP iops/core = 3092, XIO iops/core = 3614 in cluster nodes
     • In client nodes, TCP iops/core = 7924, XIO iops/core = 10978
     • More than 2X memory usage by RDMA
     • Not much scaling between 8 and 16 OSDs for either TCP or RDMA, even though nothing is saturated at this point
  15. Result, no disk hit
     Transport | IOPS                                                      | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~268K                                                     | ~1049     | ~0            | ~37                      | ~37                      | ~17
     RDMA      | ~400K (with OSD-side portal threads = 2, client side = 8) | ~1600     | ~0            | ~40                      | ~42                      | ~40
     Summary:
     • Suspecting some lock contention in the OSD layer, started playing with XIO portal threads
     • With fewer portal threads (2) on the OSD node, the no-disk-hit performance jumped to 400K
     • Increasing XIO portal threads in the OSD layer decreases performance in this case
     • Tried some shard options, but TCP remains almost the same as in the 8-OSD case; this seems to be a limit
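As a rough illustration of that experiment, enabling the XIO messenger and splitting the portal-thread counts between the OSD and client sides would look something like the fragment below. The option spellings follow the slide's "xio_portal_thread" wording and should be treated as approximate; the values are the best-performing combination found above.

     [global]
     # use the Accelio/RDMA (XIO) messenger instead of the default TCP messenger
     ms_type = xio

     [osd]
     # fewer portal threads on the OSD side performed best here
     xio_portal_threads = 2

     [client]
     # more portal threads on the client side
     xio_portal_threads = 8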
  16. Checking scale-out behavior
     • 32 OSDs, one per SSD (4TB)
     • 2 nodes with 16 OSDs each
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
     • Block size = 4K, 100% RR
     • Working set ~4 TB
     • Code base is latest ceph master
     • Each server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
     • Shards : thread_per_shard = 10:1, 4:2, 25:1
     • Some experimentation with the xio_portal_thread settings
  17. Result, no disk hit
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~323K | ~1263     | ~0            | ~40                      | ~12                      | ~18.7
     RDMA      | ~343K | ~1339     | ~0            | ~55                      | ~30                      | ~37.5
     Summary:
     • TCP is scaling but XIO is not; in fact XIO gives less throughput than in the 16-OSD setup
     • TCP iops/core = 4806, XIO iops/core = 6805 in cluster nodes
     • TCP iops/core = 6565, XIO iops/core = 8750 in client nodes, where the gap is even more significant
     • XIO memory usage per node is again ~2X
  18. Result
     Transport | IOPS  | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~249K | ~973      | ~99           | ~22                      | ~18                      | ~15.5
     RDMA      | ~258K | ~1006     | ~99           | ~24                      | ~40                      | ~38
     Summary:
     • TCP and XIO deliver similar throughput
     • TCP iops/core = 5422, XIO iops/core = 7678 in client nodes; a significant gain with XIO on the client side
     • XIO memory usage per node is again more than 2X
  19. Trying out bigger block sizes
     • 32 OSDs, one per SSD (4TB)
     • 2 nodes with 16 OSDs each
     • 4 pools, 4 rbd images (one per pool)
     • 1 physical client box. Total 1 fio_rbd client with 8 (num_jobs) * 32 = 256 QD
     • Could not run 4 clients in parallel with XIO
     • Block size = 16K/64K, 100% RR
     • Working set ~4 TB
     • Code base is latest ceph master
     • Each server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
     • Shards : thread_per_shard = 10:1, 4:2, 25:1
     • Some experimentation with the xio_portal_thread settings
  20. Result (32 OSDs, 16K, 1 client)
     Transport | IOPS          | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~150K         | ~2354     | ~99           | ~35                      | ~48                      | ~15.5
     RDMA      | ~152K (spiky) | ~2355     | ~99           | ~40                      | ~60                      | ~38
     Summary:
     • TCP and XIO deliver similar throughput
     • XIO is very spiky
     • Couldn't run more than 1 client (8 num_jobs) with XIO
     • But the CPU gain is visible
  21. Result (32 OSDs, 64K, 1 client)
     Transport | IOPS         | BW (MB/s) | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
     TCP       | ~53K         | ~3312     | ~99           | ~57                      | ~74                      | ~15.5
     RDMA      | ~55K (spiky) | ~3625     | ~99           | ~57                      | ~82                      | ~39
     Summary:
     • TCP and XIO deliver similar throughput
     • XIO is very spiky
     • Couldn't run more than 1 client (8 num_jobs) with XIO
     • But the CPU gain is visible, especially on the client side
  22. Summary
     Highlights:
     – Definite improvement in iops/core
     – A single client is much more efficient with the XIO messenger
     – A lower number of OSDs can deliver high throughput
     – If the internal XIO messenger contention can be fixed, it has the potential to outperform TCP in a big way
     Lowlights:
     – TCP is catching up fast as the OSD count increases
     – TCP also seems to scale out better than XIO
     – XIO in its present state is unstable, with some crash/peering problems
     – Connection startup time is much higher for XIO
     – XIO connections take time to stabilize to a steady throughput
     – The memory requirement is considerably higher
