Unifying the Silos: Optimize your Data Pipeline for Analytics and AI

IBM Storage and SDI
© Copyright IBM Corporation 2018
Gary Tomchuk
IBM Global SW Defined Storage Sales
Benoit Granier
IBM File and Object Systems Technical Manager
for Europe

Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without
notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it
should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal
obligation to deliver any material, code or functionality. Information about potential future products may not
be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products
remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. The actual throughput or performance that any user will experience will vary depending upon
many factors, including considerations such as the amount of multiprogramming in the user’s job stream,
the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can
be given that an individual user will achieve results similar to those stated here.

Notices and disclaimers
© 2018 International Business Machines Corporation. No part of this
document may be reproduced or transmitted in any form without
written permission from IBM.
U.S. Government Users Restricted Rights — use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to
products that have not yet been announced by IBM) has been reviewed
for accuracy as of the date of initial publication and could include
unintentional technical or typographical errors. IBM shall have no
responsibility to update this information. This document is distributed
“as is” without any warranty, either express or implied. In no event,
shall IBM be liable for any damage arising from the use of this
information, including but not limited to, loss of data, business
interruption, loss of profit or loss of opportunity. IBM products and
services are warranted per the terms and conditions of the agreements
under which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product
plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in
controlled, isolated environments. Customer examples are presented as
illustrations of how those customers have used IBM products and the
results they may have achieved. Actual performance, cost, savings or
other results in other operating environments may vary.
References in this document to IBM products, programs, or services
do not imply that IBM intends to make such products, programs or
services available in all countries in which IBM operates or does
business.
Workshops, sessions and associated materials may have been prepared
by independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for
informational purposes only, and are neither intended to, nor shall
constitute legal or other guidance or advice to any individual participant
or their specific situation.
It is the customer’s responsibility to ensure its own compliance
with legal requirements and to obtain advice of competent legal counsel
as to the identification and interpretation of any relevant laws and
regulatory requirements that may affect the customer’s business and
any actions the customer may need to take to comply with such
laws. IBM does not provide legal advice or represent or warrant that its
services or products will ensure that the customer follows any law.

Notices and disclaimers
continued
Information concerning non-IBM products was obtained from the
suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed
to the suppliers of those products. IBM does not warrant the quality of
any third-party products, or the ability of any such third-party products
to interoperate with IBM’s products. IBM expressly disclaims all
warranties, expressed or implied, including but not limited to, the
implied warranties of merchantability and fitness for a particular purpose.
The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM
products and services used in the presentation] are trademarks of
International Business Machines Corporation, registered in many
jurisdictions worldwide. Other product and service names might
be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at “Copyright and trademark
information” at: www.ibm.com/legal/copytrade.shtml.

Agenda
§ Data Management Challenges in Analytics and AI
§ AI Data Pipeline with IBM Spectrum Storage
§ IBM Spectrum Storage offering for Analytics and AI
§ IBM Spectrum Scale
§ IBM Spectrum Discover
§ IBM Cloud Object Storage
§ Data Unification using IBM Spectrum Scale with HDP
§ Data Unification Use Cases
§ IBM Spectrum Storage for AI - Solutions

Data Management Challenges in
Analytics and AI

Biggest Unstructured Data Challenges
§ 39% of firms see sourcing, gathering, managing & governing data as their biggest challenges when using systems of insight
§ The number of enterprises with 1,000 TB+ unstructured data stores grew 3X from 2016 to 2017
Source: Forrester Analytics, Global Business Technographics Data And Analytics Survey, 2017;
Global Business Technographics Data And Analytics Survey, 2016 (Enterprises with 1000+ employees)

Data Management Challenges
§ Silos of infrastructure for various analytics use cases
§ Multiple copies of the same data without a single source of truth
§ Analytics on stale data
§ Time-consuming data ingest cycle
§ Unmanageable cluster sprawl with data growth

AI Data Pipeline with IBM Spectrum Storage

AI, Analytics and Data Pipelines
AI and Big Data pipelines need to support high performance Data Analytics and AI / Machine Learning / Deep Learning,
from early experimentation to shared data services on production clusters.
[Figure label: PowerAI]

Shorten Time to Value with IBM Storage
AI Data Workflow: INGEST -> CLASSIFY -> TRAINING -> INFERENCE
[Figure labels: AI Workflow, New Data, Champion, Challenger, 80% of Data Science Time, Resource Optimization, Provision Time]
Headwinds Challenge time-to-value
Why IBM?
Data Scientist Productivity
• Reduce Time to Accuracy, Improve Provisioning Time, Increase Cycles, Reduce Human Error
• Improve velocity by getting to your data faster using tools, not trial & error
The most scalable, low latency storage platform
• Minimize data movement
• Increase performance, automate storage processes and reduce cost
• Using the leading portfolio of Software-defined storage
Optimized Economics
• Balance performance and cost with system choices
Proven Reference Architecture
• Higher performance, more confidence, lower costs
Industry Standard Approach
• Deliver consistency and efficiencies
Uses Technology advances
• GPU, Open Source Frameworks
Business Value
• Lower CAPEX
• Lower OPEX
• Improve Model Quality
• Faster Time to Insight
• Business Agility
• Higher Client Experience
• Automation Savings
“Look for dynamically adaptable, simple, flexible, secure, cost-efficient, and elastic infrastructure that can
support high capacity along with high throughput and low latency for high performance training and inferencing
experience.” (IDC)

The Goal: Move Data from Ingest to Insights
EDGE -> INGEST -> CLASSIFY / TRANSFORM -> ANALYZE / TRAIN -> INSIGHTS

IBM AI Data Pipeline
[Figure: data pipeline from Data In to Insights Out across the stages EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS]
Pipeline components: Transient Storage, Global Ingest, Fast Ingest / Real-time Analytics, Classification & Metadata Tagging, ETL, Hadoop / Spark Data Lakes, ML / DL (Prep, Training, Inference), Trained Model, Inference, Archive
Storage media: SSD, SSD/NVMe, SSD/Hybrid, Hybrid/HDD, SDS/Cloud, Cloud, HDD, Tape
Tier descriptions:
• Throughput-oriented, software defined temporary landing zone
• Globally accessible capacity tier
• High volume, index & auto-tagging zone
• High throughput performance tier
• Performance & capacity tier
• High throughput, random I/O, performance & capacity tier
• High throughput, low latency, random I/O performance tier
• High scalability, large/sequential I/O capacity tier

IBM AI Data Pipeline with IBM Spectrum Storage
Improved data governance with storage offerings for end-to-end data pipeline
[Figure: pipeline from Data In to Insights Out across the stages EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS, with components Transient Storage, Global Ingest, Fast Ingest / Real-time Analytics, Classification & Metadata Tagging, ETL, Hadoop / Spark Data Lakes, ML / DL (Prep, Training, Inference), Trained Model, Inference and Archive, overlaid with the IBM offerings Spectrum Scale, Spectrum Discover, Spectrum Archive, Cloud Object Storage and Elastic Storage Server]

IBM Spectrum Storage Offerings for
Analytics and AI

IBM Spectrum Scale
Store Everywhere. Run Anywhere.
Delivers Data Management at scale for enterprises that are swamped by data.
Lets you grow and share the storage infrastructure while automatically moving file and object data to the
optimal storage tier as quickly as possible.

IBM Spectrum Scale – Data Management at Scale
• Software defined file storage with high performance and extreme scalability
• 50% of systems delivering top SPEC SFS benchmarks run IBM Spectrum Scale SW.
• Supports file systems with sizes of tens of petabytes that contain billions of files and can be accessed by
thousands of nodes in a cluster.
• Smart policy engine to optimize utilization with multiple storage tiers: Flash -> Disk -> Cloud -> Tape
• Enterprise class storage features like Disaster recovery, Encryption, Compression, Erasure Coding
• Flexibility in storage architectures: shared-nothing, shared-storage or hybrid.
[Figure labels: Spectrum Scale; access protocols NFS, SMB, File, Object, HDFS; Encryption and Compression; Distributed RAID; storage tiers SSD, Fast Disk, Slow Disk, Tape]
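The policy-based tiering idea above (Flash -> Disk -> Cloud -> Tape) can be illustrated with a minimal sketch. This is not the Spectrum Scale policy language or any IBM API. It is a plain Python illustration in which the idle-age thresholds, the `FileInfo` type and the function names are all hypothetical, chosen only to show how a rule set might map files to tiers by how recently they were accessed.

```python
from dataclasses import dataclass

# Tier order taken from the slide: Flash -> Disk -> Cloud -> Tape.
TIERS = ["flash", "disk", "cloud", "tape"]

# Hypothetical thresholds (assumption, not from the source): the maximum
# number of days since last access a file may have and still stay on a tier.
MAX_IDLE_DAYS = {"flash": 7, "disk": 90, "cloud": 365}


@dataclass
class FileInfo:
    path: str
    days_since_access: int


def choose_tier(f: FileInfo) -> str:
    """Return the fastest tier whose idle threshold the file still meets."""
    for tier in TIERS[:-1]:
        if f.days_since_access <= MAX_IDLE_DAYS[tier]:
            return tier
    return TIERS[-1]  # anything colder than every threshold goes to tape


def plan_placement(files):
    """Map each file path to its target tier."""
    return {f.path: choose_tier(f) for f in files}


# Illustrative sample data (hypothetical paths and ages).
SAMPLE = [
    FileInfo("/proj/new.dat", 1),
    FileInfo("/proj/last_quarter.dat", 60),
    FileInfo("/proj/last_year.dat", 300),
    FileInfo("/proj/old.dat", 2000),
]

if __name__ == "__main__":
    for path, tier in plan_placement(SAMPLE).items():
        print(f"{path} -> {tier}")
```

A real policy would typically also weigh file size, pool occupancy and explicit placement rules. Access age is used here only because it is the simplest criterion to show.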

IBM Spectrum Scale
Proven at over 4,000 customers worldwide
Most common use cases:
- High performance computing
- Big data workloads like Hadoop, Spark
- Enterprise analytics workloads like SAS grid, SAP HANA
- AI/ML/DL like genomics, autonomous driving
- High performance active archive stores
Customer examples:
- 4-time champion Infiniti Red Bull Racing does real-time race analytics
- Personalized cancer treatment for over 65,000 patients
- Climate and weather modeling with 16 PB online & 12 PB archive on tape
- R&D environment for natural language tools
- Semiconductor design: higher profits from shorter chip design cycles
- Shared storage for global banking: 100 times faster than incumbent solution

IBM Spectrum Scale Storage
…for the world’s most powerful supercomputers
Summit System (world’s most powerful supercomputer)
• 4608 nodes, each with:
- 2 IBM Power9 processors
- 6 Nvidia Tesla V100 GPUs
- 608 GB of fast memory
- 1.6 TB of NVMe memory
• 200 petaflops peak performance for modeling and simulation
• 3.3 ExaOps peak performance for data analytics and AI
• IBM Spectrum Scale
• IBM Elastic Storage Server
• 2.5 TB/sec throughput to storage architecture
• 250 PB HDD storage capacity
Sierra System (world #2 supercomputer)
• 4320 nodes, each with:
- 2 IBM Power9 processors
- 4 Nvidia V100 GPUs
- 320 GB of node memory
- 1.6 TB of NVMe memory
• IBM Spectrum Scale
• IBM Elastic Storage Server
• 125 petaflops peak performance
• 154 PB HDD storage capacity
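To put the per-node figures above in system-wide terms, the short sketch below multiplies them out. The per-node inputs are the ones quoted on the slide. The derived totals are plain arithmetic, not figures published in the deck, and the dictionary layout and function names are illustrative assumptions.

```python
# Per-node figures as quoted on the slide.
SYSTEMS = {
    "Summit": {"nodes": 4608, "gpus_per_node": 6, "nvme_tb_per_node": 1.6},
    "Sierra": {"nodes": 4320, "gpus_per_node": 4, "nvme_tb_per_node": 1.6},
}


def totals(spec):
    """Derive system-wide GPU count and node-local NVMe capacity (decimal PB)."""
    return {
        "gpus": spec["nodes"] * spec["gpus_per_node"],
        "nvme_pb": spec["nodes"] * spec["nvme_tb_per_node"] / 1000,
    }


def storage_gb_per_sec_per_node(total_tb_per_sec, nodes):
    """Average share of the shared-storage throughput per node, in GB/s."""
    return total_tb_per_sec * 1000 / nodes


if __name__ == "__main__":
    for name, spec in SYSTEMS.items():
        t = totals(spec)
        print(f"{name}: {t['gpus']} GPUs, {t['nvme_pb']:.2f} PB node-local NVMe")
    # Summit's quoted 2.5 TB/sec spread evenly across its 4608 nodes.
    share = storage_gb_per_sec_per_node(2.5, SYSTEMS["Summit"]["nodes"])
    print(f"Summit: {share:.2f} GB/s of storage throughput per node on average")
```

The even per-node split is only an average. Actual per-node throughput depends on the workload and network topology, which the slide does not describe.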

IBM Elastic Storage Server (ESS)
Integrated scale-out data management for file and object data
Optimal building block for high-performance, scalable,
reliable enterprise Spectrum Scale storage
• Faster data access with choice to scale-up or out
• Easy to deploy clusters with unified system GUI
• Simplified storage administration with IBM Spectrum Control integration
One solution for all your Spectrum Scale data needs
• Single repository of data with unified file and object support
• Anywhere access with multi-protocol support:
NFS 4.0, SMB, OpenStack Swift, Cinder, and Manila
• Ideal for Big Data Analytics with full Hadoop transparency
Ready for business critical data
• Disaster recovery with synchronous or asynchronous replication
• Ensure reliability and fast rebuild times using Spectrum Scale RAID’s
dispersed data and erasure code
• Five 9s (99.999%) of availability
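As a quick worked example of what "five nines" availability means in practice, the sketch below converts an availability fraction into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration. The slide states only the availability figure itself.

```python
def downtime_minutes_per_year(availability, days_per_year=365):
    """Minutes of downtime per year permitted by a given availability fraction."""
    minutes_per_year = days_per_year * 24 * 60
    return (1.0 - availability) * minutes_per_year


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.2f} minutes/year")
```

At 99.999% this works out to a little over five minutes of downtime per year.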

IBM Elastic Storage Server (ESS) Family
Capacity:
Model GL1S: 1 Enclosure, 9U, 82 NL-SAS, 2 SSD
Model GL2S: 2 Enclosures, 12U, 166 NL-SAS, 2 SSD
Model GL4S: 4 Enclosures, 20U, 334 NL-SAS, 2 SSD
Model GL6S: 6 Enclosures, 28U, 502 NL-SAS, 2 SSD
Speed:
Model GS1S: 24 SSD
Model GS2S: 48 SSD
Model GS4S: 96 SSD
Model GH14S: 1 2U24 Enclosure SSD, 4 5U84 Enclosure HDD, 334 NL-SAS, 24 SSD
Model GH24S: 2 2U24 Enclosure SSD, 4 5U84 Enclosure HDD, 334 NL-SAS, 48 SSD
[Figure: rack diagrams of each model. Throughput labels on the chart: 6 GB/s, 12 GB/s, 14 GB/s, 24 GB/s, 36 GB/s, 38 GB/s, 40 GB/s]

Consolidate capacity storage for a cognitive and AI enterprise
Access multiple distributed applications concurrently. One or more sites with geo-dispersed data.
Workload categories: Backup, Archive and File Services; Data Oceans and Repositories; Industry Specific Data; New Cloud Applications
Workloads shown: NAS Services; File sync & share; Archive Data; Backup & Cloud Backup; Cloud Repository/Service; IoT Repository; DVR & Video Repository; Image/Voice Repository; Analytics; File Archive; Documents; Financial Compliance; Healthcare (Cardiology, Radiology PACS); Research & Patient Data; Media Production / Archive / Distribution; Compliance & Retention; Mobile Apps; Cloud Native Apps
Capabilities: Fast data discovery; Efficient data analysis; Data tagging; Actions based on data

The Market reinforces IBM transformational story
Gartner Critical Capabilities for Object Storage: #1 Analytics, #1 Archiving, #1 Backup, #1 Cloud Storage
* Source: Gartner Critical Capabilities for Object Storage Published 30 January 2019 - ID G00352191
Gartner MQ and IDC MarketScape: IBM worldwide object-based leadership
- Gartner MQ: LEADER, Distributed File Systems and Object Storage, October 2018, 3 years in a row
- IDC MarketScape: LEADER, MarketScape for Object Storage, June 2018, 5 years in a row
- CRN Tech Innovator: WINNER, Storage – Software Defined Storage, December 2018, First Year
- Tech Target: FINALIST, Cloud Product of the Year, January 2019, First Year
January 2019

Transformational Insight for AI, Analytics, Governance, &
Optimization – Expedite time to discovery
• Automate cataloging of data by capturing metadata as it’s created
• Locate and identify the most relevant data regardless of its type or location
• Run simple SQL queries through the GUI or API scripts
• Enable comprehensive insight by combining system metadata with custom tags to increase storage admin
& data consumer productivity
• Create custom tags and policy-based workflows to orchestrate content inspection & activate data in AI,
ML, & analytics workflows
[Figure label: Scanning and Event Notifications]
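The cataloging bullets above describe combining system metadata with custom tags and then querying the result with SQL. A minimal sketch of that idea follows, using an in-memory SQLite database. It does not use any IBM product API: the table schema, column names, sample records and function names are all hypothetical, introduced only to show what a tag-based SQL lookup over a metadata catalog might look like.

```python
import sqlite3

# Hypothetical catalog records: (path, size_bytes, owner, source, tag).
SAMPLE = [
    ("/scale/proj/a.dcm", 1000, "alice", "file", "training"),
    ("/scale/proj/b.dcm", 2000, "bob", "file", "archive"),
    ("bucket/c.jpg", 500, "alice", "object", "training"),
]


def build_catalog(records):
    """Load system metadata plus one custom tag per record into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog ("
        "path TEXT PRIMARY KEY, size_bytes INTEGER, "
        "owner TEXT, source TEXT, tag TEXT)"
    )
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?, ?)", records)
    return conn


def find_by_tag(conn, tag):
    """Locate data by custom tag, regardless of whether it is file or object."""
    rows = conn.execute(
        "SELECT path FROM catalog WHERE tag = ? ORDER BY path", (tag,)
    )
    return [r[0] for r in rows]


def capacity_by_source(conn):
    """Total bytes per storage source, for a storage admin's view."""
    rows = conn.execute(
        "SELECT source, SUM(size_bytes) FROM catalog "
        "GROUP BY source ORDER BY source"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = build_catalog(SAMPLE)
    print(find_by_tag(conn, "training"))
    print(capacity_by_source(conn))
```

Keeping tags in the same table as system metadata is the simplest possible design. A production catalog would more likely store tags separately so that one object can carry several.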

Data Unification with IBM Spectrum Scale
and HDP

Why IBM Spectrum Scale for Analytics/AI workloads?
Unmatched Scalability and Performance with the most optimized storage footprint
Reduce datacenter footprint and get faster ingest with in-place analytics
• Access to the data using any of the industry standard protocols: NFS, SMB, POSIX, Object, HDFS API.
• No need to maintain separate copies for different applications.
Flexible storage architectures
• Flexibility in architectures with the support of hybrid architecture under common namespace.
• Support for running containerized workloads.
• Install SW directly on compute nodes (storage rich servers) OR use shared storage (ESS).
Extreme scalability with parallel file system architecture
• Scale to billions of files.
• No centralized metadata node bottleneck.
[Figure: four "Data + Metadata Node" boxes]
Full Data Life Cycle Management
• Data Migration between various storage pools with policy based Auto Tiering
[Figure labels: Spectrum Scale; Flash; Disk; Tape; IBM TSM/LTFS; Storage pool1, Storage pool2, Storage poolx, External Storage poolx]
Performance leadership in AI benchmarks
• 40GB/s and 300TB in 2U, Linear scaling of 120GB/s in 6U

IBM Spectrum Scale + Hortonworks HDP
• Spectrum Scale becomes the storage layer in your HDP environment.
• Spectrum Scale supports accessing data using HDFS API and hence is transparent to the applications using HDP.
• Enterprise class storage for your Hadoop/Spark environment (Encryption, Compression, Tiering, DR…)
[Figure labels: IBM Spectrum Scale; HDFS Transparency Connector]
Reference: IBM Redbook, Hortonworks HDP with IBM Spectrum Scale

IBM ESS Shared-Storage Model vs Classic HDFS Shared-Nothing Cluster
Standard Shared-Nothing model on storage-rich servers (HDP storage-rich worker nodes on 10 GigE / 40 GigE)
- Inefficient and inflexible
- Expensive and wasteful, with high OPEX to scale and manage compute and storage
- Lacks enterprise features
ESS Shared-Storage model
• Disaggregated “thin” worker nodes with fewer disks
• No application-data disks in servers
• Replaced with shared storage
• No need for storage-only nodes
• Avoids cluster sprawl, with high performance, flexibility, and enterprise features
• All with HDFS compatibility

Data Unification with IBM Spectrum Scale
Use Cases

EDW Optimization
Simplify data management using common storage between EDW and Hadoop
Archive Data away from EDW
- Move cold or rarely used data to Hadoop
as active archive
- Store more data for longer
Offload costly ETL process
- Free your EDW to perform high-value functions
like analytics & operations, not ETL
- Use Hadoop for advanced ETL
Optimize the value of your EDW
- Use Hadoop to refine new data sources, such as
web and machine data for new analytical context
Reduce migration effort & skillset gap
- Use existing investment in Oracle/DB2/Netezza
skills
- BigSQL allows you to migrate applications without
major code rewrites and additional SQL
development
Control cluster sprawl
- Grow storage independent of compute with ESS
- POWER servers deliver 1.7x throughput compared
to Hortonworks on x86
- Up to 60% less storage footprint
A financial services company in Europe is optimizing its DB2 warehouse using Hortonworks Hadoop and is using
ESS as the common storage behind DB2 and Hadoop.
[Figure labels: BI Software (Business Analytics, Visualization like SAS grid, SAP HANA etc); BigSQL SQL Interface; Enterprise Data Warehouse (DB2 / dashDB / Oracle / Netezza / Teradata …) with Hot Data; Hortonworks Hadoop with Cold Data, Archive Data, New Sources; Spectrum Scale; ESS for Speed; ESS for Data Lake; New Data Sources: Streaming / IoT data]
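The slide does not say how the "up to 60% less storage footprint" figure is derived. One plausible way to reach a number of that order, offered here purely as an assumption, is to compare the raw capacity needed under HDFS's default three-way replication with the raw capacity needed under an erasure code such as 8 data + 2 parity strips. The function names and the choice of codes below are illustrative, not taken from the deck.

```python
def raw_per_usable_replication(copies):
    """Raw bytes stored per usable byte when every block is kept `copies` times."""
    return float(copies)


def raw_per_usable_erasure(data_strips, parity_strips):
    """Raw bytes stored per usable byte for a data+parity erasure code."""
    return (data_strips + parity_strips) / data_strips


def footprint_reduction(baseline, alternative):
    """Fraction of raw capacity saved by `alternative` relative to `baseline`."""
    return 1.0 - alternative / baseline


if __name__ == "__main__":
    replicated = raw_per_usable_replication(3)  # assumed baseline: 3 copies
    for data, parity in [(8, 2), (8, 3)]:       # assumed erasure codes
        coded = raw_per_usable_erasure(data, parity)
        saving = footprint_reduction(replicated, coded)
        print(f"{data}+{parity}: {coded:.3f}x raw per usable, {saving:.1%} less than 3x replication")
```

Under these assumptions the saving is roughly 54% to 58%, which is in the neighbourhood of the slide's figure. Compression and other factors the deck may have included are not modelled here.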

Large banking group selects scalable data science platform to develop new smart banking services
through use of AI in real-time
Business problem
• Needed to improve client experience and create
new client services by identifying new patterns in its
data through use of data science and AI techniques
• Existing Hadoop infrastructure solution did not have
sufficient throughput and scalability
Solution
• POWER9 cluster with L922 servers (x96) and
AC922 servers (x3)
• IBM Elastic Storage Server (ESS) with Spectrum
Scale: GL1S (x2) and GL2S (x2)
• Hortonworks Data Platform (HDP) and IBM Watson
Studio (formerly DSX)
Benefits
• Open, virtualized infrastructure solution based on
IBM Power Servers running HDP and Watson
Studio
• Optimized, scalable and highly available Storage
Architecture with IBM Spectrum Scale based ESS
• Integrated security of DSX+HDP in conjunction with
higher throughput of POWER9 servers
outperformed Intel and reduced time to value
• End-to-end solution that addressed all requirements
around performance, security, costs, and ability to
scale
New AI-Driven Client Services in Banking
New Smart AI Services

Unified Analytics Workflows
Single data lake for Hadoop and non-Hadoop analytics
A bank in South Africa is implementing HDP and SAS grid software on a common ESS-based infrastructure.
[Figure labels: Fast Ingest through a POSIX Interface into ESS for Speed; Spectrum Scale; ESS for Data Lake; HDFS Interface serving Hadoop (Map-Reduce, Spark, ML/DL etc); POSIX Interface serving Other Analytics Platforms (SAS grid, SAP HANA/Vora, ML/DL, Conductor with Spark etc)]
All analytics workflows on common storage
- Improve data reliability and governance with single data
lake for Hadoop and non-Hadoop analytics setups
- Build ML/DL workflows that use multiple analytics
platforms
- Share data across analytics workflows as appropriate
Ingest fast and improve time to insight
- POSIX interface combined with ESS Flash storage gives
very fast ingest
- POWER servers deliver 1.7x throughput compared to
Hortonworks on x86

Large bank delivers personalized banking in real-time to millions of customers
by applying new analytics and data science.
Business problem
• Aggressively improve their analytics maturity by
delivering Predictive Analytics capability providing
a Data-driven Customer Experience
• Develop open platform that can ingest all relevant
data from various sources with the ability to extract
new insights
Solution
• POWER8 cluster with S822L servers (x24)
• IBM Elastic Storage Server (ESS) with Spectrum
Scale: GL2S (x2)
• Hortonworks Data Platform (HDP)
Benefits
• Open infrastructure solution based on IBM Power
Servers running Linux and HDP
• Optimized, scalable and highly available Storage
Architecture with IBM Spectrum Scale based ESS
• Better overall TCO: Superior performance with less
than half the number of compute nodes where
Power + ESS outperformed local storage on Intel
• Leverage ESS in-place analytics to host both HDP
and SAS workloads on single data layer reducing
data copies and improving data governance
Predictive Analytics
Data-Driven Customer Banking

Integrated HPC and Hadoop
Efficiently transform data into insights with single data lake for HPC & Hadoop
NASA and a healthcare company in the Middle East are using a common Spectrum Scale data lake to
efficiently get insights using traditional HPC and Hadoop analytics.
[Figure labels: Fast Ingest through a POSIX Interface into ESS for Speed; Spectrum Scale; ESS for Data Lake; POSIX Interface serving Traditional HPC (Open, Read, Write, MPI, C-code, Python etc); HDFS Interface serving Hadoop (Map-Reduce, Spark, ML/DL etc); NFS/SMB/Object Interface through a Spectrum Scale Protocol Node]
Extend HPC to add modern analytics
capabilities
- Efficient movement of data between modern and
traditional applications with common namespace
- Spectrum Scale in-place analytics capabilities
enable accessing the same data using
NFS/SMB/Object/POSIX/HDFS without requiring
any modifications to the data
- Improve data reliability and governance with single
data lake
Ingest fast and improve time to insight
- POSIX interface combined with ESS Flash storage
gives very fast ingest
- Common namespace enables running some edge
analytics at the ingest layer as well
- POWER servers deliver 1.7x throughput compared
to Hortonworks on x86

Solutions – IBM Spectrum Storage for AI
Available Solutions:
§ IBM Spectrum Storage for AI with Power Systems
§ IBM Spectrum Storage for AI with NVIDIA DGX (leading AI x86 based solution)
§ IBM Spectrum Storage for Hadoop/Spark workloads (Hortonworks/Cloudera)
§ IBM Spectrum Storage for AI in Autonomous Driving
IBM Spectrum Storage for AI supercharges your AI data pipeline with storage
solutions optimized for the unique demands of AI.
Integrating industry-leading servers, ISV / open source software and IBM
software-defined storage, IBM Spectrum Storage for AI delivers simplified
deployment, groundbreaking performance, and extended data management to
drive developer productivity with the fastest path to insights.
https://www.ibm.com/it-infrastructure/storage/ai-infrastructure

“IBM’s Spectrum Storage for AI is differentiated
from both the NetApp and Pure Storage
offerings. IBM Spectrum Storage for AI provides
a level of scalability that is nearly unmatched by
anyone in the industry. It’s both incredibly fast
at scale, and it scales linearly.
The ability for IBM Spectrum Storage for AI to
seamlessly integrate with the rest of the
Spectrum Storage suite should make IBM’s
solution an easy decision for enterprise buyers.”
§ Steve McDowell

Questions?

Thank You!
