Red Hat Ceph Storage Acceleration Utilizing Flash Technology

Rick Stehno, Seagate

Published in: Technology

  1. RED HAT CEPH STORAGE ACCELERATION UTILIZING FLASH TECHNOLOGY
     Applications and Ecosystem Solutions Development
     Rick Stehno
     Red Hat Storage Day - Boston 2016
  2. Flash Acceleration for Applications (Seagate Confidential)
     Three ways to accelerate application performance with flash:
     • Utilize flash caching features to accelerate critical data. Caching methods can be write-back for writes, write-through for disk/cache transparency, read cache, etc.
     • Utilize storage tiering capabilities. Performance-critical data resides on flash storage, colder data resides on HDD.
     • Utilize all-flash storage to accelerate performance when all application data is performance critical, or when the application does not provide the features or capabilities to cache or migrate the data.
  3. Ceph Software Defined Storage (SDS) Acceleration
     Configurations:
     • All flash storage - Performance
        • Highest performance per node
        • Less maximum capacity per node
     • Hybrid HDD and flash storage - Balanced
        • Balances performance, capacity and cost
        • Application and workload suitable for:
           • Performance critical data on flash
           • Utilize host software caching or tiering on flash
     • All HDD storage - Capacity
        • Maximum capacity per node, lowest cost
        • Lower performance per node
  4. Storage - NVMe vs SATA SSD
     Why a 1U server with 10 NVMe SSDs may be a better choice vs. a 2U server with 24 SATA SSDs:
     • Higher performance in half the rack space
     • 28% less power and cooling
     • Higher MTBF (fewer failures) inherent with reduced component count
     • Reduced OSD recovery time per Ceph node
     • Lower TCO
  5. All Flash Storage - NVMe vs SATA SSD cont'd
     Why a 1U server with 10 NVMe SSDs may be a better choice vs. a 2U server with 24 SATA SSDs
     FIO Benchmarks:
     • 4.5x increase for 128k sequential reads
     • 3.5x increase for 128k sequential writes
     • 3.7x increase for 4k random reads
     • 1.4x increase for 4k random 70/30 RR/RW
     • Equal performance for 4k random writes
  6. All Flash Storage - NVMe vs SATA SSD cont'd
     Why a 1U server with 10 NVMe SSDs may be a better choice vs. a 2U server with 24 SATA SSDs
     Increasing the load to stress NVMe capabilities over and above the 128 thread SATA SSD test:
     • 5.8x increase for Random Writes at 512 threads
     • 3.1x increase for 70/30 RR/RW at 512 threads
     • 4.2x increase for Random Reads at 790 threads
     • 8.2x increase for Sequential Reads at 1264 threads
     Chart: Ceph RBD NVMe Performance Gains over SATA SSD (128k FIO RBD IOEngine Benchmark)
     Workload            Gain at 128 threads   Gain at higher thread count
     Random Writes       3x                    5.8x (512 threads)
     70/30 RR/RW         1.4x                  3.1x (512 threads)
     Random Reads        1.0x                  4.2x (790 threads)
     Sequential Reads    1.3x                  8.2x (1264 threads)
  7. Ceph Storage Costs: Seagate SATA SSD vs. Seagate NVMe SSD
     FIO RBD Maximum Threads Random Write Performance for NVMe
     Price per MB/s = (retail cost of SSDs) / (MB/s for each test)
     SSD                   Total SSD price   Price per MB/s, 128k random writes, 128 threads   Price per MB/s, 128k random writes, 512 threads
     24 x SATA SSD 960G    $7,896            $15.00                                            not listed
     10 x NVMe 2TB         $10,990           $7.00                                             $3.00
     These prices do not include savings from electrical/cooling costs, reducing datacenter floor space, from the reduction of SATA SSD
     Note: 128k random write FIO RBD benchmark: SATA SSD averaged 85% busy, NVMe averaged 80% busy with 512 threads
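The price-per-MB/s metric on slide 7 is a simple ratio, so it can be inverted to see roughly what throughput each figure implies. The sketch below is illustrative only: the function names are made up for this example, the inputs are the prices shown on the slide, and because the published per-MB/s prices look rounded, the back-calculated throughput should be read as approximate rather than as measured results.

```python
# Figures copied from slide 7. Throughput is back-calculated from rounded
# prices, so treat the results as rough estimates, not benchmark data.

def price_per_mbps(total_price_usd, throughput_mbps):
    """Retail cost of the SSD set divided by throughput in MB/s."""
    return total_price_usd / throughput_mbps


def implied_throughput_mbps(total_price_usd, price_per_mbps_usd):
    """Invert the metric: throughput implied by a published price per MB/s."""
    return total_price_usd / price_per_mbps_usd


# (total SSD price in USD, published price per MB/s in USD)
CONFIGS = {
    "24 x SATA SSD 960G, 128 threads": (7896, 15.00),
    "10 x NVMe 2TB, 128 threads": (10990, 7.00),
    "10 x NVMe 2TB, 512 threads": (10990, 3.00),
}

if __name__ == "__main__":
    for name, (total, per_mbps) in CONFIGS.items():
        mbps = implied_throughput_mbps(total, per_mbps)
        print(f"{name}: about {mbps:.0f} MB/s implied")
```

The ratio between the implied NVMe and SATA throughput at 128 threads comes out close to the 3x random-write gain reported on slide 6, which is a useful sanity check on how the metric was derived.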
  8. Does it make sense to implement Ceph in a MySQL database environment?
     MySQL
     • MySQL is the most popular and most widely used open-source database in the world
     • MySQL is feature rich in the areas of performance, scalability and reliability
     • Database users demand high OLTP performance: small random reads/writes
     Ceph
     • Most popular Software Defined Storage system
     • Scalable
     • Reliable
     But Ceph was not designed to provide high performance for OLTP environments, and OLTP entails small random reads/writes.
  9. MySQL - Comparing Local HDD to Ceph Cluster
     MySQL Setup:
     • Release 5.7
     • 45,000,000 rows
     • 6GB Buffer
     • 4G logfiles
     • RAID 0 over 18 HDD
     Ceph Setup:
     • 3 Nodes each containing:
        • Jewel using Filestore
        • 4 NVMe SSDs
     • 1 Pool over 12 NVMe SSDs
     • Replica 2
     • 40G private and public network
     For all tests, all MySQL files were kept on the local server except the database file, which was moved to the Ceph cluster.
  10. MySQL - Comparing Local HDD to Ceph Cluster [chart]
  11. MySQL - Comparing Local NVMe SSD to Ceph Cluster
     MySQL Setup:
     • Release 5.7
     • 45,000,000 rows
     • 6GB Buffer
     • 4G logfiles
     • RAID 0 over 4 NVMe SSDs
     Ceph Setup:
     • 3 Nodes each containing:
        • Jewel using Filestore
        • 4 NVMe SSDs
     • 1 Pool over 12 NVMe SSDs
     • Replica 1
     • 40G private and public network
     For all tests, all MySQL files were kept on the local server except the database file, which was moved to the Ceph cluster.
  12. Ceph All Flash Storage Acceleration
     Seagate SSD and Seagate PCIe Storage using AIC server
     All SSD:
     • Case-1: 2 SSDs, 1 OSD/SSD
     • Case-2: 2 SSDs, 4 OSDs/SSD
     • Case-3: 1 PCIe flash, 4 OSDs/SSD, 8 OSD journals on PCIe flash
     Chart: FIO Random Write - 200 Threads - 128k Data; axes: IOPS and KB/s; series: "2 ssd, 2 osd", "2 ssd, 8 osd", "2 ssd, 8 osd, +journal"
  13. Linux Flash Storage Tuning
     Linux tuning is still a requirement to get optimum performance out of an SSD:
     • Use RAW device or create 1st partition on 1M boundary (sector 2048)
     • Ceph-deploy uses the optimal alignment when creating an OSD
     • Use blk-mq/scsi-mq if kernel supports it
     • rq_affinity = 1 for NVMe, rq_affinity = 2 for non-NVMe
     • rotational = 0
     • blockdev --setra 4096
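The settings on slide 13 map onto a handful of block-layer sysfs attributes plus one blockdev call. The following is a minimal dry-run sketch, not the presenter's tooling: it assumes 512-byte sectors for the alignment check, assumes that a device name starting with "nvme" identifies an NVMe device, and uses hypothetical device names (nvme0n1, sdb). It only prints what it would do unless dry_run is turned off, which would require root.

```python
import subprocess
from pathlib import PurePosixPath

SECTOR_BYTES = 512        # assumption: start sectors are given in 512-byte units
ONE_MIB = 1024 * 1024


def is_1m_aligned(start_sector, sector_bytes=SECTOR_BYTES):
    """True if a partition starting at start_sector sits on a 1 MiB boundary."""
    return (start_sector * sector_bytes) % ONE_MIB == 0


def tuning_plan(dev):
    """Return the sysfs writes and commands for one device, without applying them."""
    is_nvme = dev.startswith("nvme")  # assumption: kernel naming identifies NVMe
    queue = PurePosixPath("/sys/block") / dev / "queue"
    return {
        "sysfs": {
            str(queue / "rq_affinity"): "1" if is_nvme else "2",
            str(queue / "rotational"): "0",
        },
        "commands": [["blockdev", "--setra", "4096", f"/dev/{dev}"]],
    }


def apply_plan(plan, dry_run=True):
    """Print the plan; only touch the system when dry_run is False (needs root)."""
    for path, value in plan["sysfs"].items():
        print(f"echo {value} > {path}")
        if not dry_run:
            with open(path, "w") as f:
                f.write(value)
    for cmd in plan["commands"]:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for dev in ("nvme0n1", "sdb"):  # hypothetical device names
        apply_plan(tuning_plan(dev), dry_run=True)
```

Sector 2048 passes the alignment check because 2048 x 512 bytes is exactly 1 MiB, which is where the slide's "sector 2048" figure comes from.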
  14. Linux Flash Storage Tuning cont'd
     Linux tuning is still a requirement to get optimum performance out of an SSD:
     • If using an older kernel that doesn't support blk-mq, use:
        • "deadline" I/O scheduler with supporting variables:
           • fifo_batch
           • front_merges
           • writes_starved
     • XFS mount options:
        • nobarrier,discard,noatime,attr2,delaylog,inode64,noquota
     • If using a smaller number of SSDs/NVMe SSDs, test creating multiple OSDs per SSD/NVMe SSD. Good performance increases have been seen using 4 OSDs per SSD/NVMe SSD
     • MySQL: when using flash, configure both innodb_io_capacity and innodb_lru_scan_depth
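As a rough illustration of slide 14, the sketch below assembles the XFS mount command from the listed options and shows where the deadline scheduler variables live in sysfs (the kernel spells them with underscores: fifo_batch, front_merges, writes_starved). The slide gives no values for those variables, so none are invented here; the helper names, the device, and the mount point are hypothetical, and nothing is executed.

```python
from pathlib import PurePosixPath

# Mount options exactly as listed on the slide.
XFS_OPTIONS = ["nobarrier", "discard", "noatime", "attr2",
               "delaylog", "inode64", "noquota"]

# Deadline scheduler variables named on the slide (sysfs spelling).
DEADLINE_TUNABLES = ["fifo_batch", "front_merges", "writes_starved"]


def xfs_mount_command(device, mount_point, options=XFS_OPTIONS):
    """Build (but do not run) a mount command using the slide's XFS options."""
    return ["mount", "-t", "xfs", "-o", ",".join(options), device, mount_point]


def deadline_sysfs_paths(dev):
    """Locate the scheduler selector and the deadline variables for one device."""
    queue = PurePosixPath("/sys/block") / dev / "queue"
    paths = {"scheduler": str(queue / "scheduler")}
    for name in DEADLINE_TUNABLES:
        paths[name] = str(queue / "iosched" / name)
    return paths


if __name__ == "__main__":
    # Hypothetical device and mount point, for illustration only.
    print(" ".join(xfs_mount_command("/dev/sdb1", "/mnt/osd-example")))
    for name, path in deadline_sysfs_paths("sdb").items():
        print(f"{name}: {path}")
```

Building the command as a list keeps the example easy to inspect before anything is run; which options a given kernel still accepts is worth checking on the target system.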
  15. Flash Storage Device Configuration
     If the NVMe SSD or SAS/SATA SSD device can be configured to use a 4k sector size, this could increase performance for certain applications like databases. For all of my FIO tests with the RBD engine and for all of my MySQL tests, I saw up to a 2x improvement (depending on the test) when using 4k sector sizes compared to using 512 byte sectors.
     Storage devices used for all of the above benchmarks/tests:
     • Seagate Nytro XF1440 NVMe SSD
     • Seagate Nytro XF1230 SATA SSD
     • Seagate 1200.2 SAS SSD
     • Seagate XP6500 PCIe Flash Accelerator Card
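Before comparing 4k and 512-byte sector results as on slide 15, it helps to confirm what sector size a device currently reports. The sketch below reads the logical_block_size and physical_block_size attributes that Linux exposes under /sys/block/<dev>/queue. The helper names are made up for this example, and make_fake_sysfs exists only so the demo runs without real hardware; reformatting a device to 4k sectors is vendor-specific and is not shown.

```python
import tempfile
from pathlib import Path


def block_sizes(dev, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes in bytes as reported by sysfs."""
    queue = Path(sysfs_root) / dev / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def uses_4k_sectors(dev, sysfs_root="/sys/block"):
    """True when the device exposes 4096-byte logical sectors to the host."""
    return block_sizes(dev, sysfs_root)[0] == 4096


def make_fake_sysfs(root, dev, logical, physical):
    """Demo helper: create a stand-in sysfs tree so the example runs anywhere."""
    queue = Path(root) / dev / "queue"
    queue.mkdir(parents=True)
    (queue / "logical_block_size").write_text(f"{logical}\n")
    (queue / "physical_block_size").write_text(f"{physical}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical devices: one formatted with 4k sectors, one with 512-byte.
        make_fake_sysfs(root, "nvme0n1", 4096, 4096)
        make_fake_sysfs(root, "sdb", 512, 4096)
        for dev in ("nvme0n1", "sdb"):
            print(dev, block_sizes(dev, root), uses_4k_sectors(dev, root))
```

On a real host, calling block_sizes("nvme0n1") with the default sysfs_root reads the live values instead of the stand-in tree.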
  16. Seagate Broadest PCIe, SAS and SATA Portfolio
  17. Thank You! Questions?
     Learn how Seagate accelerates storage with one of the broadest SSD and Flash portfolios in the market
