Lustre+ZFS: Reliable/Scalable Storage




Josh Judd, CTO



  © 2012 WARP Mechanics Ltd. All Rights Reserved.
ZFS+Lustre: Open Storage Layers
       • ZFS:
                – Volume management layer (RAID)
                – Reliable storage (checksums)
                – Feature rich (snap, compression, replication, etc.)
                – Accelerators (SSD hybrid)
                – Scalable (e.g., 16 exabytes for one file)
       • Lustre:
                – Linux-centric scale-out filesystem
                – Powers the world’s largest supercomputers
                – Single FS can be 10s of PB with TBs/sec of throughput
                – Can sit on top of ZFS
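The 16-exabyte figure is simple arithmetic: ZFS's maximum file size is 2^64 bytes. Here is a quick check in Python using binary units (1 EiB = 2^60 bytes):

```python
# ZFS's maximum file size is 2**64 bytes.
MAX_FILE_BYTES = 2 ** 64
EIB = 2 ** 60  # one exbibyte, the binary "exabyte"

max_file_eib = MAX_FILE_BYTES // EIB
print(f"Maximum ZFS file size: {max_file_eib} EiB")  # prints "Maximum ZFS file size: 16 EiB"
```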
What does ZFS do for me?
       • HPC relevant features:
                – Support for 1/10GbE, 4/8/16Gb FC, and 40Gb InfiniBand
                – Multi-layer cache combines DRAM and SSDs with HDDs
                – Copy-on-write eliminates the RAID write hole and accelerates writes
                – Checksums eliminate silent data corruption and bit rot
                – Snapshots, thin provisioning, compression, dedupe, etc. built in
                – Lustre and SNFS (StorNext) integration allows 40GbE networking
       • Same software/hardware supports NAS and RAID
                – One management code base to control all storage platforms
       • Open storage. You can have the source code.
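The following toy block store illustrates why copy-on-write avoids the write hole. It is a sketch under simplifying assumptions, not ZFS code. The class and method names (`CowStore`, `write`, `read`) are hypothetical. Real ZFS applies the same idea to a tree of block pointers and commits changes in transaction groups. New data always goes to free space, and a single pointer swap makes the update visible. An interrupted write therefore leaves the previous consistent state intact.

```python
class CowStore:
    """Toy copy-on-write block store (illustrative only, not ZFS code)."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}    # physical address -> block contents
        self.next_addr = 0                    # next never-used physical address
        self.block_map: dict[int, int] = {}   # logical block -> physical address (the "root")

    def _allocate(self, data: bytes) -> int:
        # Always place data in fresh space; existing blocks are never overwritten.
        addr = self.next_addr
        self.blocks[addr] = data
        self.next_addr += 1
        return addr

    def write(self, updates: dict[int, bytes]) -> None:
        # Step 1: write all new blocks to free space and build a new map.
        new_map = dict(self.block_map)
        for lbn, data in updates.items():
            new_map[lbn] = self._allocate(data)
        # Step 2: commit with one pointer swap. If we never reach this line,
        # the old, fully consistent map is still the live one.
        self.block_map = new_map

    def read(self, lbn: int) -> bytes:
        return self.blocks[self.block_map[lbn]]


store = CowStore()
store.write({0: b"v1"})
old_root = store.block_map         # holding the old map is effectively a snapshot
store.write({0: b"v2"})
print(store.read(0))               # b'v2'
print(store.blocks[old_root[0]])   # b'v1': the old data is untouched
```

Old blocks are never overwritten, so keeping a reference to an old map costs almost nothing. This is also why snapshots are cheap on a copy-on-write filesystem.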


ZFS Feature Focus
       • Enhanced data integrity:
                – Legacy RAID is subject to silent corruption and write holes
                – Little-known fact: virtually no RAID controllers check parity on
                  each read, so if a bit flips, you simply get the wrong data!
                – ZFS adds a checksum to every block and verifies it on read to solve this
       • Advanced cache:
                – Legacy RAID controllers have too little cache to make a meaningful difference
                – ZFS-based WARPraid supports 100s of GB of DRAM, plus 10s
                  of TBs of SSD
                – E.g., you can “push” directories onto SSD cache ahead of time
                  to drastically accelerate read-intensive workloads
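Here is a minimal sketch of per-block checksumming on a toy in-memory "disk". The `ChecksummedDisk` class is hypothetical. As in ZFS, the checksum is stored apart from the data block (ZFS keeps it in the parent block pointer) and is verified on every read. A flipped bit is therefore detected instead of being returned as good data. SHA-256 is used for simplicity; ZFS defaults to fletcher4 and also offers sha256.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 for simplicity; ZFS defaults to fletcher4 and also offers sha256.
    return hashlib.sha256(data).digest()


class ChecksummedDisk:
    """Toy block device with per-block checksums (hypothetical, not ZFS code)."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}     # address -> data as stored on "disk"
        self.pointers: dict[int, bytes] = {}   # address -> checksum, kept apart from the data

    def write(self, addr: int, data: bytes) -> None:
        self.blocks[addr] = data
        self.pointers[addr] = checksum(data)

    def read(self, addr: int) -> bytes:
        data = self.blocks[addr]
        # Verify on every read instead of trusting whatever the disk returns.
        if checksum(data) != self.pointers[addr]:
            raise OSError(f"checksum mismatch at block {addr}: silent corruption detected")
        return data


disk = ChecksummedDisk()
disk.write(7, b"simulation results")

# Flip a single bit behind the filesystem's back, as bit rot would.
damaged = bytearray(disk.blocks[7])
damaged[0] ^= 0x01
disk.blocks[7] = bytes(damaged)

try:
    disk.read(7)
except OSError as err:
    print(err)   # checksum mismatch at block 7: silent corruption detected
```

This sketch only detects corruption. With mirror or RAID-Z redundancy, ZFS goes a step further: it reads a good copy and repairs the damaged block.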

What does Lustre do for me?
       • Category: Distributed (parallel-ish) filesystem
                – Similar(-ish) to StorNext, GPFS, pNFS, etc.
       • Allows performance and capacity to scale independently
       • One FS can be 10s of PB with TBs/sec of throughput
       • No theoretical upper limit on architectural scalability
                – Yes, there are practical limits...
                – But those expand every year, and...
                – Even a “practical” Lustre FS can be 10s to 1000s of times
                  larger than other FSs on the market
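Lustre scales bandwidth by striping each file's data round-robin across several object storage targets (OSTs). The stripe size and stripe count are set per file or directory, for example with `lfs setstripe`. The hypothetical function below is a simplified model of that RAID-0 style mapping, not Lustre's own code. Given a byte offset, it returns which of the file's stripes (OSTs) holds that byte and where it lands inside that OST object.

```python
def stripe_location(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Return (stripe index, offset inside that OST object) for a byte offset,
    assuming a simple round-robin (RAID-0 style) layout."""
    unit = offset // stripe_size                 # which stripe unit of the file, from 0
    stripe_index = unit % stripe_count           # round-robin across the file's OSTs
    # Each OST object holds every stripe_count-th unit, packed back to back.
    object_offset = (unit // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, object_offset


MiB = 1 << 20
for off in (0, 1 * MiB, 4 * MiB + 512):
    print(off, stripe_location(off, stripe_size=MiB, stripe_count=4))
# 0 -> (0, 0); 1 MiB -> (1, 0); 4 MiB + 512 -> (0, 1 MiB + 512)
```

Adding OSTs, and the OSSs that serve them, raises aggregate bandwidth and capacity. A wider stripe count lets a single large file use more of that bandwidth.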




How do they combine?
       • ZFS is currently supportable on Solaris, and Lustre is
         currently supportable on Linux. So how do they mix?
       • In theory, any of three ways:
                – Port Lustre to Solaris: has not been done
                – Port ZFS to Linux: in progress; would replace MD/RAID and EXT4
                – Use ZFS on Solaris as a RAID controller under Lustre on Linux
       • WARP Mechanics focuses on the third option
                – Is supportable in production now
                – Maintains full separation of code
                – Still allows ZFS to replace EXT4 down the road, while
                  performing volume management on separate controllers
                  to aid performance and scalability
Example Architecture
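       [Diagram: Lustre clients connect over QDR/FDR InfiniBand or 10/40Gbps Ethernet to two OSS pairs (OSS 1a/1b and OSS 2a/2b), with P2P links between the OSS and RAID tiers. Each OSS pair is backed by a dual-controller ZFS RAID system (RAID 1a/1b for ZFS RAID 1, RAID 2a/2b for ZFS RAID 2). Each system has an ARC, a ZIL, and an L2ARC in front of ~250TB of HDDs.]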

Example Architecture
           Scale-out example:
                – 200 ZFS RAID systems = 400 controllers = 1.2 TB/sec
                – 16,000 NL-SAS HDDs = ~40 PB usable
                – ~200 TB usable per Neutronium; estimated 3 GB/sec per controller with well-formed I/O
           [Diagram: Lustre clients connect over InfiniBand or high-speed Ethernet to OSS pairs (OSS1a/OSS1b, and so on). Each pair is backed by a dual-controller Neutronium ZFS RAID system (R1a/R1b), numbered N-001 through N-200.]
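The scale-out totals follow directly from the per-system estimates. The sketch below redoes the arithmetic in decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB). The 80-drives-per-system figure is derived here and does not appear on the slide. All inputs are the slide's estimates for well-formed I/O, not measurements.

```python
# Per-system figures from the slide (estimates for well-formed I/O).
systems = 200
controllers_per_system = 2        # slide: 200 systems = 400 controllers
gb_per_sec_per_controller = 3     # "estimated 3GB/sec per controller"
usable_tb_per_system = 200        # "~200TB usable per Neutronium"
total_hdds = 16_000

controllers = systems * controllers_per_system
aggregate_tb_per_sec = controllers * gb_per_sec_per_controller / 1000   # decimal units
usable_pb = systems * usable_tb_per_system / 1000
hdds_per_system = total_hdds // systems                                # derived, not on the slide

print(f"{controllers} controllers, {aggregate_tb_per_sec} TB/sec, "
      f"{usable_pb} PB usable, {hdds_per_system} HDDs per system")
# prints "400 controllers, 1.2 TB/sec, 40.0 PB usable, 80 HDDs per system"
```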


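Example Architecture
       [Diagram: Non-Lustre clients connect over IP (LAN/CAN/WAN) to SMB and NFS servers. These servers sit alongside the Lustre clients on the InfiniBand or high-speed Ethernet fabric, in front of the Neutronium ZFS RAID systems.]
                         ~200 TB usable per Neutronium; estimated 3 GB/sec per controller with well-formed I/O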


Practical WARP Implementation
       • The PetaPod Appliance:



