IBM Spectrum Scale is a software-defined file storage solution that provides high performance and extreme scalability for file systems of tens of petabytes. It supports smart data placement policies to optimize utilization across multiple storage tiers, from flash to disk to cloud to tape. Spectrum Scale is proven at over 4,000 customers worldwide for high-performance computing, big data analytics, AI/ML workloads, and active archive stores.
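The data placement policies mentioned above are written in Spectrum Scale's SQL-like policy language. A minimal sketch of a placement rule plus a tiering rule follows; the pool names ('flash', 'disk') and the 30-day threshold are illustrative, not taken from this deck:

```
/* Place newly created files on the flash pool by default. */
RULE 'default_placement' SET POOL 'flash'

/* Migrate files not accessed for 30 days from flash to the disk pool,
   starting when flash is 80% full and stopping once it drops to 60%. */
RULE 'cool_down' MIGRATE FROM POOL 'flash'
  THRESHOLD(80,60)
  TO POOL 'disk'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
```

Rules like these are evaluated by the policy engine (e.g., via `mmapplypolicy`), so tiering runs as an automated storage process rather than a manual copy job.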
2. Please note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
11. Shorten Time to Value with IBM Storage
[Diagram: AI data workflow (ingest, classify, train, inference) with new data feeding back in. Callouts: champion/challenger models, 80% of data science time, resource optimization, provision time.]
Why IBM?
• Data Scientist Productivity: reduce time to accuracy, improve provisioning time, increase cycles, reduce human error
• Improve velocity by getting to your data faster using tools, not trial and error
• The most scalable, low-latency storage platform: minimize data movement; increase performance, automate storage processes, and reduce cost
• Optimized Economics: the leading portfolio of software-defined storage lets you balance performance and cost with system choices
• Proven Reference Architecture: higher performance, more confidence, lower costs
• Industry-Standard Approach: deliver consistency and efficiencies
• Uses technology advances: GPUs, open-source frameworks

Business Value (headwinds that challenge time-to-value)
• Lower CAPEX and OPEX, automation savings
• Improved model quality and faster time to insight
• Business agility and a better client experience
“Look for dynamically adaptable, simple, flexible, secure, cost-efficient, and elastic infrastructure that can support high capacity along with high throughput and low latency for a high-performance training and inferencing experience.” (IDC)
28. IBM Spectrum Scale
IBM ESS Shared-Storage Model vs Classic HDFS Shared-Nothing Cluster
[Diagram: classic HDP cluster of storage-rich worker nodes interconnected over 10 GigE / 40 GigE.]
Standard shared-nothing model on storage-rich servers:
- Inefficient, inflexible, and expensive, with high OPEX to scale and manage compute and storage
- Lacks enterprise features

IBM ESS shared-storage model:
• Disaggregated “thin” worker nodes with fewer disks
• No application-data disks in the servers; replaced with shared storage
• No need for storage-only nodes
• Avoids cluster sprawl while providing high performance, flexibility, and enterprise features
• All with HDFS compatibility
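The HDFS compatibility noted above comes from the Spectrum Scale HDFS Transparency connector, which exposes the file system through standard NameNode/DataNode RPC endpoints, so Hadoop clients simply point at it. A hedged sketch of the client-side `core-site.xml` (host name and port are illustrative assumptions):

```xml
<configuration>
  <!-- Point Hadoop clients at the HDFS Transparency NameNode service
       running on the Spectrum Scale cluster (hypothetical host/port). -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://transparency-nn.example.com:8020</value>
  </property>
</configuration>
```

With this in place, existing `hdfs dfs` commands and MapReduce/Spark jobs run unchanged against the shared ESS storage instead of local data disks.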
30. EDW Optimization
Simplify data management using common storage between EDW and Hadoop
Archive Data away from EDW
- Move cold or rarely used data to Hadoop
as active archive
- Store more data, for longer
Offload costly ETL process
- Free your EDW to perform high-value functions
like analytics & operations, not ETL
- Use Hadoop for advanced ETL
Optimize the value of your EDW
- Use Hadoop to refine new data sources, such as
web and machine data for new analytical context
Reduce migration effort & skillset gap
- Use existing investment in Oracle/DB2/Netezza
skills
- BigSQL allows you to migrate applications without major code rewrites or additional SQL development
Control cluster sprawl
- Grow storage independently of compute with ESS
- POWER servers deliver 1.7x the throughput of Hortonworks on x86
- Up to 60% less storage footprint
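As one hedged illustration of the archive pattern above, Big SQL can create a Hadoop-backed table and move cold warehouse rows into it with ordinary SQL. The schema, table, and column names below are hypothetical, not from this deck:

```sql
-- Create a Parquet-backed Hadoop table to serve as the active archive.
CREATE HADOOP TABLE sales_archive (
  order_id   BIGINT,
  order_date DATE,
  amount     DECIMAL(12,2)
) STORED AS PARQUET;

-- Offload rows older than two years from the warehouse table.
INSERT INTO sales_archive
SELECT order_id, order_date, amount
FROM   warehouse.sales
WHERE  order_date < CURRENT DATE - 2 YEARS;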
[Diagram: BI software (business analytics and visualization such as SAS Grid and SAP HANA) accesses, through the BigSQL SQL interface, both an Enterprise Data Warehouse (DB2 / Dashdb / Oracle / Netezza / Teradata) holding hot data and a Hortonworks Hadoop cluster holding cold data, archive data, and new sources such as streaming/IoT data. Both sit on Spectrum Scale, with ESS for speed and ESS for the data lake.]

A financial services company in Europe is optimizing its DB2 warehouse using Hortonworks Hadoop, with ESS as the common storage behind both DB2 and Hadoop.
32. Unified Analytics Workflows
Single data lake for Hadoop and non-Hadoop analytics
A bank in South Africa is implementing HDP and SAS Grid software on a common ESS-based infrastructure.
[Diagram: a single Spectrum Scale file system provides a POSIX interface for fast ingest (ESS for speed) and POSIX plus HDFS interfaces into the data lake (ESS for data lake), serving both Hadoop workloads (MapReduce, Spark, ML/DL) and other analytics platforms (SAS Grid, SAP HANA/Vora, ML/DL, Conductor with Spark).]
All analytics workflows on common storage
- Improve data reliability and governance with a single data lake for Hadoop and non-Hadoop analytics
- Build ML/DL workflows that span multiple analytics platforms
- Share data across analytics workflows as appropriate

Ingest fast and improve time to insight
- The POSIX interface combined with ESS flash storage enables very fast ingest

Control cluster sprawl
- Grow storage independently of compute with ESS
- Up to 60% less storage footprint
- POWER servers deliver 1.7x the throughput of Hortonworks on x86
34. Integrated HPC and Hadoop
Efficiently transform data into insights with a single data lake for HPC & Hadoop
NASA and a healthcare company from the Middle East are using a common Spectrum Scale data lake to efficiently derive insights from both traditional HPC and Hadoop analytics.
[Diagram: a single Spectrum Scale file system provides a POSIX interface for fast ingest (ESS for speed) and POSIX, HDFS, and NFS/SMB/Object interfaces (the latter via Spectrum Scale protocol nodes) into the data lake (ESS for data lake), serving traditional HPC workloads (open/read/write, MPI, C code, Python) and Hadoop workloads (MapReduce, Spark, ML/DL).]
Extend HPC to add modern analytics capabilities
- Efficient movement of data between modern and traditional applications through a common namespace
- Spectrum Scale's in-place analytics capabilities allow the same data to be accessed via NFS/SMB/Object/POSIX/HDFS without any modification to the data
- Improve data reliability and governance with a single data lake

Ingest fast and improve time to insight
- The POSIX interface combined with ESS flash storage enables very fast ingest
- The common namespace also allows some edge analytics to run at the ingest layer

Control cluster sprawl
- Grow storage independently of compute with ESS
- Up to 60% less storage footprint
- POWER servers deliver 1.7x the throughput of Hortonworks on x86
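The in-place analytics point above can be made concrete with a small sketch: under the HDFS Transparency layer, a file written through the POSIX mount is immediately visible at the corresponding HDFS URI, so "moving" data between HPC and Hadoop reduces to a path translation. The mount point and NameNode address below are hypothetical:

```python
# Hedged sketch: with Spectrum Scale's HDFS Transparency layer, a file written
# through the POSIX mount point is immediately visible at the corresponding
# HDFS path; no copy or format conversion is needed. All names are illustrative.

POSIX_MOUNT = "/gpfs/datalake"               # hypothetical Spectrum Scale mount
HDFS_SCHEME = "hdfs://transparency-nn:8020"  # hypothetical Transparency NameNode


def hdfs_uri(posix_path: str) -> str:
    """Map a POSIX path under the mount to its in-place HDFS URI."""
    if not posix_path.startswith(POSIX_MOUNT + "/"):
        raise ValueError("path is outside the shared file system")
    return HDFS_SCHEME + posix_path[len(POSIX_MOUNT):]


# An HPC job writes /gpfs/datalake/ingest/events.parquet via open()/MPI-IO;
# a Spark job can then read the very same bytes at:
print(hdfs_uri("/gpfs/datalake/ingest/events.parquet"))
```

The same translation works in reverse, which is what lets Hadoop output feed directly into POSIX-based tools such as SAS Grid or C/Python HPC codes.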