Spectrum Scale final
1. IBM Spectrum Scale
Software defined storage for cloud, big data &
analytics, and NAS solutions
A Smarter Cloud
for a Smarter Planet
Joe Krotz
IBM CTS – Cloud and Storage Systems
August 2015
2. SOURCE: *2014 IBM Institute for Business Value Study on Infrastructure Matters; Gartner IT Metrics
The top two challenges organizations face with IT infrastructure are storage related –
Data Management and Cost Efficiency
2.5 Billion
Gigabytes of
data per day
Data
Explosion
90%
of data created in
last two years
Traditional Storage Models are being disrupted by the explosion of data
30% lower TCO with Flash
50% lower storage
management cost and
Hybrid delivery with
Software Defined Storage
Data
Innovation
0.4% overall IT budget growth
in 2015
670% more data
in 5 years for storage
administrators
Data
Economics
Software Defined Infrastructure
IBM.com/systems/storage/spectrum/
3. IBM Spectrum Scale – History and evolution
1998 2002 2005 2006
HPC
GPFS (General
Parallel File
System)
General File
Serving
Standards
Portable
operating
system
interface
(POSIX)
semantics
-Large block
Directory and
Small file perf
Data
management
Virtual
Tape Server
(VTS)
Linux® Clusters
(Multiple
architectures)
IBM AIX® Loose
Clusters
GPFS 2.1-2.3
HPC
Research
Visualization
Digital Media
Seismic
Weather
exploration
Life sciences
32 bit /64 bit
Inter-op (IBM AIX
& Linux)
GPFS Multicluster
GPFS over wide
area networks
(WAN)
Large scale clusters
thousands of
nodes
GPFS 3.1-3.2
2009
First
called
GPFS
GPFS 3.4
Enhanced
Windows cluster
support
- Homogeneous
Windows Server
Performance and
scaling
improvements
Enhanced
migration and
diagnostics
support
2010
GPFS 3.3
Restricted
Admin
Functions
Improved
installation
New license
model
Improved
snapshot and
backup
Improved ILM
policy engine
2012
Ease of
administration
Multiple
networks/RDMA
Distributed Token
Management
Windows 2008
Multiple NSD
servers
NFS v4 Support
Small file
performance
Information
lifecycle
management (ILM)
Storage Pools
File sets
Policy Engine
GPFS 3.5
Active File
Management
GPFS Native RAID
GPFS Shared
Nothing Cluster
(GPFS-FPO)
GPFS Storage
Server
Research in video streaming started in 1993, commercialized in 1994
GPFS 4.1+
Part of IBM
Spectrum Storage
Software Defined
Storage
GPFS 4.1
Encryption
Performance
(LROC, AFM)
Usability (Network
Monitor, NFS
migration, FPO)
Elastic Storage on
Linux on System z
Cloud Service on
IBM Softlayer
Elastic Storage
Server
2014 2015
Code
name
Elastic
Storage
IBM
Spectrum
Scale
GPFS 4.1.1
Enhanced Client
experience
New protocols:
NFSv4, SMB/CIFS,
improved
OpenStack Swift
Async DR
4. Spectrum Scale proven at over 3,000 customers worldwide
Climate and weather modeling with
16 petabytes online and
12 petabytes archived on tape
4-time champion Infiniti Red Bull Racing
does real-time race analytics
Wind turbine design analysis
Done in hours instead of weeks
Private cloud for digital media
enables global collaboration
for film production
Personalized cancer treatment for
over 65,000 patients
R&D environment for natural
language tools
6. IBM Spectrum Scale
Security
• Native encryption and secure erase
• Disaster Recovery
Scalability and Snapshots
• Point in time copies
Performance
• Flash acceleration and local read only cache
• Fully integrated ILM
Usability
• New configuration guidance
• Installation toolkit will quickly install IBM Spectrum Scale
software – for Client, Server, FPO, and protocol nodes.
• Tight integration with IBM Spectrum Control for system health
monitoring
Data Security, Performance and Usability
7. Native Encryption and Secure Erase
• Native: encryption is built into the “Advanced” product
• Protects data from security breaches, unauthorized access, and being lost,
stolen or improperly discarded
• Cryptographic erase for fast, simple and secure file deletion
• Complies with NIST SP 800-131A and is FIPS 140-2 certified
• Supports HIPAA, Sarbanes-Oxley, EU and national data privacy law compliance
8. Native Encryption and Secure Erase
Encryption of data at rest
• Files are encrypted before they are stored on disk
• Keys are never written to disk
• No data leakage in case disks are stolen or improperly
decommissioned
Secure deletion
• Ability to destroy arbitrarily large subsets of a file
system
• No “digital shredding”, no overwriting: secure
deletion is a cryptographic operation
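The idea of secure deletion as a cryptographic operation can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the class and function names are hypothetical, and the SHA-256 counter keystream is a stand-in cipher that is neither what Spectrum Scale uses nor suitable for protecting real data. The point is only that destroying a key makes the stored ciphertext unreadable without overwriting it.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (toy cipher, illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ToyEncryptedStore:
    """Hypothetical model: ciphertext lives 'on disk', keys live only in memory."""

    def __init__(self) -> None:
        self.disk = {}  # name -> ciphertext, what a stolen disk would hold
        self.keys = {}  # name -> per-file key, never written to 'disk'

    def write(self, name: str, plaintext: bytes) -> None:
        key = secrets.token_bytes(32)
        self.keys[name] = key
        self.disk[name] = keystream_xor(key, plaintext)

    def read(self, name: str) -> bytes:
        # Raises KeyError once the key has been destroyed.
        return keystream_xor(self.keys[name], self.disk[name])

    def secure_erase(self, name: str) -> None:
        # Destroy only the key; the ciphertext is left in place, undecryptable.
        del self.keys[name]
```

After `secure_erase`, the ciphertext is still present in `disk`, but `read` fails because the key no longer exists. That is why the cost of erasing does not depend on how much data is being deleted.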
9. Data protection: Disaster recovery
The challenge: How do I recover data after a major disastrous event that could not be anticipated?
• Force majeure, e.g. earthquake or hurricane
• Accidents, e.g. fire or flood
• Administrator mistake
The Solution
• IBM Spectrum Scale lets you mirror your data at a secondary site
• Set your Recovery Point Objective (RPO) at, say, 15 minutes, 30 minutes or 1 hour
• If the primary site fails, data requests are automatically redirected to the secondary site
• Asynchronous updates accommodate unreliable networks
10. Scalability and Snapshots
IBM Spectrum Scale provides the functionality to
create snapshots at the file system, file set, and file
level. Each Spectrum Scale file system can have
multiple snapshots of any of the types at the same
time
The snapshot function allows a backup or mirror
program to run concurrently with user updates and
still obtain a consistent copy of the file system as of
the time that the snapshot was created.
Snapshot capacity
IBM Spectrum Scale V4.1 can retain 256 Global
snapshots and 256 Snapshots of each Independent
Fileset.
Spectrum Scale 4.1 can have 10,000 dependent
filesets and 1,000 independent file sets.
Scalability
Maximum number of files per file system: 2^64 (9 quintillion)
Maximum file system size: 2^99 bytes
Maximum number of nodes: 16,384
IBM Spectrum Scale is designed to meet the needs of
data-intensive applications such as engineering design,
digital media, data mining, relational databases,
financial analytics, seismic data processing, scientific
research and scalable file serving. The solution scales
up to more than a billion petabytes of data and
hundreds of GB/s throughput.
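As a rough check of the scalability figures on this slide, the arithmetic can be worked through directly. This sketch assumes that the limits printed as "264" and "299" are the powers 2^64 and 2^99 with their superscripts lost in extraction, and that a petabyte is the decimal 10^15 bytes.

```python
# Assumption: the slide's "264" and "299" are 2**64 and 2**99 (lost superscripts).
max_files = 2 ** 64
max_fs_bytes = 2 ** 99

PETABYTE = 10 ** 15  # decimal petabyte, in bytes (assumed unit)

max_fs_petabytes = max_fs_bytes // PETABYTE

print(f"files per file system: {max_files:,}")
print(f"file system size:      {max_fs_petabytes:,} PB")
print(f"exceeds a billion PB:  {max_fs_petabytes > 10 ** 9}")
```

Under these assumptions the maximum file system size is on the order of 6 x 10^14 PB, which agrees with the "more than a billion petabytes" claim. Note that 2^64 is about 18.4 quintillion, while the slide's "9 quintillion" figure is closer to 2^63.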
11. Flash Local Read Only Cache (LROC)
Clients
Spectrum Scale
Flash LROC SSDs
• Inexpensive SSDs or Flash placed directly in Client nodes
• Accelerates I/O performance up to 6x by reducing the amount of time CPUs
wait for data
• Also decreases the overall load on the network, benefitting performance
across the board
• Improves application performance while maintaining all the manageability
benefits of shared storage
• Cache consistency ensured by standard tokens
• Data is protected by checksum and verified on read
• Spectrum Scale handles the flash cache automatically so data is transparently
available to your application with very low latency and no code changes
12. LROC Flash Cache Example Speed Up
• Initially, with all data coming from the disk storage system, the client reads data from the SAS disks at ~
5,000 IOPS
• As more data is cached in Flash, client performance increases to 32,000 IOPS while reducing the load
on the disk subsystem by more than 95%
~ 5,000 IOPS 10K RPM SAS Drives
~ 32,000 IOPS Flash SSD
~ 6x
• Two consumer-grade 200 GB SSDs cache a storage system of forty-eight 300 GB 10K RPM SAS disks
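The numbers in this example can be worked through in a few lines. The values are taken from the slide; the variable names are illustrative only.

```python
disk_iops = 5_000      # ~IOPS when all reads come from the 10K RPM SAS disks
cached_iops = 32_000   # ~IOPS once the working set is served from flash

ssd_count, ssd_gb = 2, 200      # LROC devices in the client
disk_count, disk_gb = 48, 300   # disks in the backing storage system

speedup = cached_iops / disk_iops
cache_gb = ssd_count * ssd_gb
backing_gb = disk_count * disk_gb
cache_fraction = cache_gb / backing_gb

print(f"speed-up: {speedup:.1f}x")
print(f"cache: {cache_gb} GB in front of {backing_gb} GB ({cache_fraction:.1%})")
```

The speed-up works out to 6.4x, which the slide rounds to "~6x", and the 400 GB of flash is under 3% of the 14,400 GB of raw disk capacity behind it.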
13. IBM FlashSystem
All Files
FlashSystem
as Cache
FlashSystem for Metadata Storage
FlashSystem as storage tier
Performance: Using IBM Spectrum Scale with FlashSystem
IBM
FlashSystem
HDD Storage
Hot Files
FlashSystem is data center optimized
to deliver extreme performance,
flexible capacity
and total system protection
All other files
Data and metadata Data Metadata
Spectrum Scale
cluster:
Primary Storage
Spectrum Scale
cluster:
Primary Storage
IBM FlashSystem
14. Collaboration with Active File Management (AFM)
• AFM makes the global namespace truly
global by automatically managing
asynchronous replication of data
• Only the modified contents are
synchronized from the primary to the
remote site
• Local caching: cached data access
performs much better than WAN
access
• Latencies are improved
• WAN link usage is reduced
16. 3 Deployment Options
Software Only
Software licenses: Express,
Standard or Advanced Editions
IBM Spectrum Scale SW, GUI,
GNR, drives, services
IBM Elastic Storage Server
Managed Service
IBM high performance services
for data
IBM Spectrum Scale
Spectrum Scale + SoftLayer Cloud
17. Spectrum Scale Parallel Architecture
Clients use data, Network Shared
Disk (NSD) servers serve shared data
All NSD servers export to all clients in
active-active mode
Spectrum Scale stripes files across NSD
servers and NSDs in units of file-system
block-size
NSD client communicates with all the
servers
File-system load is spread evenly across
all the servers and storage, with no
hotspots
Easy to scale file-system capacity and
performance while keeping the
architecture balanced
NSD Client does real-time parallel I/O to all the NSD
servers and storage volumes/NSDs
File stored in blocks
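The striping described on this slide can be sketched as follows. This is a minimal illustration that assumes plain round-robin placement of fixed-size blocks; the function name and NSD names are hypothetical, and the actual allocation logic in Spectrum Scale may differ.

```python
def stripe_blocks(file_size: int, block_size: int, nsds: list) -> dict:
    """Assign each block of a file to an NSD in round-robin order."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    layout = {nsd: [] for nsd in nsds}
    for block in range(num_blocks):
        layout[nsds[block % len(nsds)]].append(block)
    return layout


# A 10 MiB file with a 1 MiB file-system block size, striped over four NSDs.
layout = stripe_blocks(10 * 2**20, 2**20, ["nsd1", "nsd2", "nsd3", "nsd4"])
for nsd, blocks in layout.items():
    print(nsd, blocks)
```

Because consecutive blocks land on different NSDs, a client reading the file sequentially can issue I/O to all servers in parallel, and no single server holds a disproportionate share of the file.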
18. Spectrum Scale Cluster Models
Storage
Storage Storage
TCP/IP or InfiniBand RDMA Network
Storage Network
TCP/IP or InfiniBand Network
TCP/IP or InfiniBand Network
NSD
Servers
Application
Nodes
Application
Nodes
19. Delivers Extreme Data
Integrity and Space Efficiency
• 2- and 3-fault-tolerant
erasure codes
• Up to 2PB per rack
• End-to-end checksum
• Protection against lost
writes
• Disk Hospital
• Proactively detect,
diagnose and resolve
disk issues
Model GL6
2 servers, 6 Enclosures,
28U, 348 NL-SAS, 2 SSD
2, 4, or 6TB drives
12+ GB/sec
Breakthrough Performance
• High performance - less
hardware
• De-clustered RAID
reduces app load during
rebuilds
• Up to 3x lower overhead
to applications
• Built-in SSDs and
NVRAM for write
performance
• Fastest rebuild times
using De-clustered RAID
• Graphical disk failure
management
Lowers TCO
• 3 Years Maintenance
and Support
• General Purpose
Servers
• Off-the-shelf SBODs
• Standardized in-band
SES management
• Standard Linux
• Modular Upgrades
• Faster than
alternatives today –
and tomorrow!
IBM Elastic Storage Server
IBM Spectrum Scale bundled solution
20. Spectrum Scale Use Cases
Spectrum Scale shared storage
Cinder Swift
Hadoop
Connector
NFS
Single software defined storage solution across all these application types
Linear
capacity &
performance
scale out
POSIX
Enterprise class
storage using
standard
hardware
Single Name Space
NAS Big Data & Analytics Cloud
(Block) (Object)
File
SMB/CIFS
22. IBM Spectrum Scale benefits over other NAS solutions
Better performance
• Eliminate hotspots with massively parallel access to files
• Sequential I/O with ES greater than 400 GB/s
• Throughput advantage for parallel streaming workloads, e.g. Tech Computing and Analytics
• More Storage. More Files. Hyper Scale.
Simplified Management
• Easier management with one global namespace instead of managing islands of NAS arrays, e.g. no need to copy data between compute clusters
• Integrated policy driven automation
• Fewer storage administrators required
Lower Cost
• Optimizes storage tiers including flash, disk and tape
• Increased efficiency and more efficient provisioning due to parallelization and striping technology
• Remove duplicate copies of data, e.g. run analytics on one copy of data without having to set up a separate silo
23. Data Access: IBM Spectrum Scale protocol support
• The IBM Spectrum Scale Protocol Node allows access to data stored in a Spectrum Scale filesystem, using
additional access methods and protocols.
• The Protocol Node functions are clustered and can support transparent failover for the NFS, Swift and
SMB protocols.
• Multiprotocol data access from other systems using the following protocols
• NFS v3 and v4
• SMB 2 and SMB 3.0 mandatory features / CIFS for
Windows support
• SMB support is delivered by Samba 4.2.
• 3,000 active connections per node / 20K max
• OpenStack Swift and S3 API support for object storage.
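To make the object access path concrete, here is a minimal sketch of building an OpenStack Swift object upload request with the Python standard library. The `/v1/{account}/{container}/{object}` path layout and the `X-Auth-Token` header come from the OpenStack Swift API; the host name, port, account, container and token below are placeholders, not values from this deck, and the request is only constructed, never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def swift_put_request(endpoint: str, account: str, container: str,
                      obj: str, token: str, body: bytes) -> Request:
    """Build (but do not send) a Swift object PUT request."""
    path = "/".join(quote(part) for part in (account, container, obj))
    url = f"{endpoint.rstrip('/')}/v1/{path}"
    return Request(url, data=body, method="PUT",
                   headers={"X-Auth-Token": token})


# Placeholder endpoint and credentials, for illustration only.
req = swift_put_request(
    "https://protocol-node.example.com:8080",
    "AUTH_demo", "reports", "q3 summary.txt",
    token="example-token", body=b"hello object store",
)
print(req.get_method(), req.full_url)
```

In a real deployment the token would be obtained from the authentication service first, and the request would be sent with `urllib.request.urlopen` or an equivalent HTTP client.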
SWIFT
NFS
CIFS
24. Administrator
Command Line Interface
Users
NFS
SMB/CIFS
POSIX
OpenStack Swift
PN1
Protocol
Node
Flash
Disk
Tape
External TCP/IP or IB Network
PN2
PNn
NSD1
Network Shared
Disks
NSD2
NSDn
…
Physical Storage
Data Access: Protocol Support
IBM Spectrum Scale Cluster TCP/IP or IB Network
Mgmt Nodes
Authentication
Services
Keystone
OpenStack Cinder
Spectrum Scale Cluster Nodes
Elastic
Storage
Server
New GUI coming in October!
26. Spectrum Scale: Drop-in Replacement for HDFS
Adding Analytics without adding a dedicated Analytics infrastructure
• Hadoop connector
• Supports IBM BigInsights analytics and open
source Apache Hadoop
• Existing infrastructure can do Hadoop-based
Analytics
• No need to purchase a dedicated Analytics infrastructure, lowering CAPEX and
OPEX
• No need to move data in and out of an Analytics
dedicated silo
• Software defined infrastructure for multi-tenancy
• Enterprise-class protection and efficiency
‒ Full data lifecycle management
‒ Policy based tiering from flash to disk to tape
• Reduce cost, simplify management
Compute Cluster
Spectrum Scale
HDFS
27. IBM Spectrum Scale: Hybrid storage for
Hadoop Applications
Shared Storage Pools Shared Nothing Cluster Pool
Disk Flash
Spectrum Scale client
Spectrum Scale Hadoop Connector
Hadoop File System API
Hadoop Application
• Exploit locality for the files stored in the local storage
• Access shared storage through the same connector.
• Storage is completely transparent to the application
• Scale storage independent of compute nodes
• The IBM Spectrum Scale Hadoop connector
has been extended to support shared
storage that includes SAN Based storage,
shared nothing cluster configurations, and
integrated solutions like ESS.
• Full Hadoop interfaces for Map/Reduce
analytics processing.
• No transfer or ingest required as the data is
already there
29. Spectrum Scale OpenStack Cinder Driver
• OpenStack Havana release includes a Cinder driver
• Giving architects access to the features and capabilities of the industry’s leading enterprise scale-out software
defined storage
• With OpenStack on Spectrum Scale, all nodes see all data
• Copying data between services, like Glance to Cinder, is minimized
or eliminated
• Speeding instance creation and conserving storage space
• Rich set of data management and information lifecycle features
• Efficient file clones
• Policy based automation optimizing data placement for locality or performance tier
• Backup
• Industrial strength reliability, minimizing risk
• Cinder driver provides resilient block storage, minimal data copying
between services, speedy instance creation and efficient space utilization
30. IBM Spectrum Scale and OpenStack Swift
• Consolidate File and Object under a
single shared storage
infrastructure.
• The new IBM Spectrum Scale Protocol Node lets
you share the storage infrastructure for both Files
and Objects
• Running your object store on IBM Spectrum Scale
provides these key features:
• POSIX/NFS/SMB/Object in single storage cluster with a
single authentication scheme
• Extra layers of data protection through Snapshots,
Backup, and/or Disaster Recovery
• Integrated ILM tiering to move cold objects to low cost
tier and off premise
• Encryption of data at rest and Secure Erase
• Additional data protection ESS solution
IBM Spectrum Scale
NFS
SMB POSIX
SSD Fast
Disk
Slow
Disk
Tape
Swift
HDFS
Cinder
Glance Manila
31. OpenStack and IBM Spectrum Scale help clients manage data at scale
Business Needs IBM Spectrum Scale
Business: I need virtually unlimited storage An open & scalable cloud platform
Operations: I need a flexible infrastructure that
supports both object and file based storage
A single data plane that supports Cinder,
Glance, Swift, Manila as well as NFS, et al.
Operations: I need to minimize the time it takes
to perform common storage management tasks
A fully automated policy based data placement
and migration tool
Collaboration: I need to share data between
people, departments and sites with low
latency.
Sharing with a variety of WAN caching modes
32. Data Center and Point of Presence
New Data Centers in 2014
Network Point of Presence
100,000+
Servers
21,000
Customers
20,000,000
Active
Domains
• IPv4/IPv6 dual stack
• Global DNS
• Global DDoS Mitigation
• Global Internet Exchanges &
Peering
33. Infrastructure solution with a common management interface and API across a unified architecture
Mix and match bare metal servers, virtual server instances, and hosted private clouds
Full integration with all IBM storage portfolio offerings
Full OpenStack, RESTful API, SmartCloud Storage and IBM Storage Integration Server integration
Seamless scaling for cloud and large deployments, including public, private and hybrid solutions
Bare metal with
your own stack
Dedicated virtualized
environment
Shared virtual
environment
Dedicated virtualized
environment
Triple Network Architecture
Automation & Support
Delivers Outstanding Performance & Price
Flexibility to Deliver Dynamic/ Hybrid Capability
35. What is LTFS?
1) Open Format for data which is written to tape
Developed and disclosed by IBM
Describes the format of data and meta data stored on tape
Meta data is based on XML schema
Applicable to LTO5, LTO6 and TS1140
Requires tape partitioning
2) File System support (code) to R/W tapes in LTFS format
Externalizes the LTO5 / LTO6 / TS1140 tape as file system
Enables standard applications to write/read LTFS tapes
Supports update, edit, and delete of files on LTFS tape
Supports partial recall
Available on Linux, Mac OS X and Windows
• Makes tape look and work like any removable media (e.g., USB drive, removable disk)
36. LTFS mount point is the library
Cartridges are subdirectories
LTFS mounts cartridges into drive to service file access requests
Easy usage, no ISV required
Caching of tape indices in memory
For searching and displaying tape contents without needing a mount
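Since the library appears as a mount point with one subdirectory per cartridge, ordinary file-system code is enough to browse it. The sketch below assumes that layout and uses only the Python standard library; the function name is illustrative, and the mount path in the usage note is hypothetical.

```python
from pathlib import Path


def list_cartridges(mount_point: Path) -> dict:
    """Map each cartridge subdirectory to the sorted file names it contains."""
    contents = {}
    for cartridge in sorted(p for p in mount_point.iterdir() if p.is_dir()):
        contents[cartridge.name] = sorted(
            f.name for f in cartridge.rglob("*") if f.is_file()
        )
    return contents
```

For example, `list_cartridges(Path("/mnt/ltfs"))` would return a dictionary keyed by cartridge name, where `/mnt/ltfs` stands in for wherever the library is mounted. Because tape indices are cached in memory, a listing like this would not be expected to require mounting each cartridge.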
37. Data Ingestion
or creation
Data Processing Access Archival
High Performance Tier
Flash, SSD, SAS
Parallel Access
Provide highest performance for
most demanding applications
High volume storage
Single Global Name Space across
all tiers
Lower costs by allocating the
right tier of storage to the right
need
Archival storage with low
cost disk or tape
Integration with Spectrum Protect
and Spectrum Archive
Policy based Archival and remote
Disaster Recovery
Manage the full data life cycle cost-effectively through policy-driven ILM
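The idea of allocating the right tier to the right need can be sketched as a simple age-based placement rule. This is an illustration in Python only: the thresholds, tier names and class are assumptions, and Spectrum Scale expresses real policies in its own rule language rather than in code like this.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    days_since_access: int


def choose_tier(f: FileInfo, hot_days: int = 7, archive_days: int = 90) -> str:
    """Pick a storage tier from how recently the file was accessed.

    The 7-day and 90-day thresholds are hypothetical defaults.
    """
    if f.days_since_access <= hot_days:
        return "flash"   # high performance tier
    if f.days_since_access <= archive_days:
        return "disk"    # high volume storage
    return "tape"        # archival storage


files = [FileInfo("render.mov", 1), FileInfo("q2.csv", 30), FileInfo("2012.tar", 400)]
placement = {f.name: choose_tier(f) for f in files}
print(placement)
```

A policy engine would evaluate a rule like this periodically and migrate files whose tier has changed, while the single namespace keeps each file's path the same wherever it is stored.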
38. The Solution: IBM Spectrum Scale brings it all together
Global Name Space
IBM Spectrum Scale replaces
SAN-based file systems
Replaces NTFS, EXT4, JFS2 and other
POSIX file systems
Used by over 200 of the top 500
supercomputers
No file transfers required between
different OS
Can be used with everything from
databases to video streaming
For x86, POWER and
z System servers
Secure with
Data-at-rest encryption
IBM Spectrum Scale replaces HDFS and NAS file storage
Full Hadoop interfaces for Map/Reduce analytics processing
No transfer or ingest required as the data is already there
Fully protected with Backup Software
File-level access support for NFS, CIFS, FTP, SCP and HTTPS
Supports Enterprise File Sync-and-Share
via OwnCloud or Funambol
IBM Spectrum Scale
offers Object access
Object-level access based on
OpenStack Swift driver and
Amazon S3 APIs
IBM Spectrum Scale
supports all media and
integrates seamlessly
with LTFS
Spans flash, disk and tape
media with a file system view
that