IBM Storage and SDI
© Copyright IBM Corporation 2018
Unifying the Silos:
Optimize your data pipeline for Analytics and AI
Gary Tomchuk
IBM Global SW Defined Storage Sales
Benoit Granier
IBM File and Object Systems Technical Manager
for Europe
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without
notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it
should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal
obligation to deliver any material, code or functionality. Information about potential future products may not
be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products
remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. The actual throughput or performance that any user will experience will vary depending upon
many factors, including considerations such as the amount of multiprogramming in the user’s job stream,
the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can
be given that an individual user will achieve results similar to those stated here.
Please note
Notices and disclaimers
Think 2019 / © 2019 IBM Corporation
© 2018 International Business Machines Corporation. No part of this
document may be reproduced or transmitted in any form without
written permission from IBM.
U.S. Government Users Restricted Rights — use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to
products that have not yet been announced by IBM) has been reviewed
for accuracy as of the date of initial publication and could include
unintentional technical or typographical errors. IBM shall have no
responsibility to update this information. This document is distributed
“as is” without any warranty, either express or implied. In no event,
shall IBM be liable for any damage arising from the use of this
information, including but not limited to, loss of data, business
interruption, loss of profit or loss of opportunity. IBM products and
services are warranted per the terms and conditions of the agreements
under which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product
plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in
controlled, isolated environments. Customer examples are presented as
illustrations of how those customers have used IBM products and the
results they may have achieved. Actual performance, cost, savings or
other results in other operating environments may vary.
References in this document to IBM products, programs, or services
do not imply that IBM intends to make such products, programs or
services available in all countries in which IBM operates or does
business.
Workshops, sessions and associated materials may have been prepared
by independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for
informational purposes only, and are neither intended to, nor shall
constitute legal or other guidance or advice to any individual participant
or their specific situation.
It is the customer’s responsibility to ensure its own compliance
with legal requirements and to obtain advice of competent legal counsel
as to the identification and interpretation of any relevant laws and
regulatory requirements that may affect the customer’s business and
any actions the customer may need to take to comply with such
laws. IBM does not provide legal advice or represent or warrant that its
services or products will ensure that the customer follows any law.
Notices and disclaimers
continued
Information concerning non-IBM products was obtained from the
suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed
to the suppliers of those products. IBM does not warrant the quality of
any third-party products, or the ability of any such third-party products
to interoperate with IBM’s products. IBM expressly disclaims all
warranties, expressed or implied, including but not limited to, the
implied warranties of merchantability and fitness for a particular purpose.
The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM
products and services used in the presentation] are trademarks of
International Business Machines Corporation, registered in many
jurisdictions worldwide. Other product and service names might
be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at “Copyright and trademark
information” at: www.ibm.com/legal/copytrade.shtml.
Agenda
§ Data Management Challenges in Analytics and AI
§ AI Data Pipeline with IBM Spectrum Storage
§ IBM Spectrum Storage offering for Analytics and AI
§ IBM Spectrum Scale
§ IBM Spectrum Discover
§ IBM Cloud Object Storage
§ Data Unification using IBM Spectrum Scale with HDP
§ Data Unification Use Cases
§ IBM Spectrum Storage for AI - Solutions
Data Management Challenges in
Analytics and AI
Biggest Unstructured Data Challenges
Source: Forrester Analytics, Global Business Technographics Data And Analytics Survey, 2017,
Global Business Technographics Data And Analytics Survey, 2016 (Enterprises with 1000+ employees)
39% of firms see sourcing, gathering, managing & governing data as their biggest challenges when using systems of insight.
The number of enterprises with 1,000 TB+ unstructured data stores grew 3X from 2016 to 2017.
Data Management Challenges
§ Silos of infrastructure for various analytics use cases
§ Multiple copies of the same data without a single source of truth
§ Analytics on stale data
§ Time-consuming data ingest cycle
§ Unmanageable cluster sprawl with data growth
AI Data Pipeline with IBM Spectrum Storage
AI, Analytics and Data Pipelines
AI and Big Data pipelines need to support high-performance data analytics and AI / machine learning / deep learning, from early experimentation to shared data services on production clusters.
POWERAI
Shorten Time to Value with IBM Storage
INGEST -> CLASSIFY -> TRAINING -> INFERENCE
AI Data Workflow
Champion
Challenger
80% of Data
Science Time
Resource
Optimization
Provision
Time
NEW DATA
AI Workflow
Why IBM?
Business Value
Data Scientist Productivity
Reduce Time to Accuracy, Improve Provisioning Time,
Increase Cycles, Reduce Human Error
• Improve velocity by getting to your data faster using tools,
not trial & error
The most scalable, low latency storage platform
Minimize data movement
Increase performance, automate storage processes and
reduce cost
• Using the leading portfolio of Software-defined storage
Optimized Economics
• Balance performance and cost with system choices
Proven Reference Architecture
• Higher performance, more confidence, lower costs
Industry Standard Approach
• Deliver consistency and efficiencies
Uses Technology advances
• GPU, Open Source Frameworks
Headwinds Challenge time-to-value
Lower CAPEX
Improve Model Quality
Faster Time to Insight
Business Agility
Lower OPEX
Higher Client Experience
Automation Savings
Look for dynamically adaptable, simple, flexible,
secure, cost-efficient, and elastic infrastructure that can
support high capacity along with high throughput and low
latency for high performance training and inferencing
experience.
IDC
The Goal: Move Data from Ingest to Insights
EDGE -> INGEST -> CLASSIFY / TRANSFORM -> ANALYZE / TRAIN -> INSIGHTS
IBM AI Data Pipeline
EDGE -> INGEST -> CLASSIFY / TRANSFORM -> ANALYZE / TRAIN -> INSIGHTS
Data In -> Insights Out
Pipeline components:
• Transient Storage
• Global Ingest
• Fast Ingest / Real-time Analytics
• Archive
• Classification & Metadata Tagging
• ETL
• Hadoop / Spark Data Lakes
• ML / DL: Prep, Training, Inference
• Trained Model
• Inference
Storage tier characteristics:
• Throughput-oriented, software defined temporary landing zone
• Throughput-oriented, globally accessible capacity tier
• High throughput performance tier
• High scalability, large/sequential I/O capacity tier
• High volume, index & auto-tagging zone
• Throughput-oriented, performance & capacity tier
• High throughput, random I/O, performance & capacity tier
• High throughput, low latency, random I/O performance tier
Media: SSD/NVMe, SSD, SSD/Hybrid, Hybrid/HDD, HDD, SDS/Cloud, Cloud, Tape
IBM AI Data Pipeline with IBM Spectrum Storage
Improved data governance with storage offerings for end-to-end data pipeline
EDGE -> INGEST -> CLASSIFY / TRANSFORM -> ANALYZE / TRAIN -> INSIGHTS
[Diagram: the AI data pipeline from Data In to Insights Out, overlaid with the IBM offerings used along it: IBM Spectrum Scale, IBM Spectrum Discover, IBM Spectrum Archive, IBM Elastic Storage Server, and IBM Cloud Object Storage.]
IBM Spectrum Storage Offerings for
Analytics and AI
Delivers Data Management at scale for
enterprises that are swamped by data
IBM Spectrum Scale
Lets you grow and share the storage infrastructure
while automatically moving file and object data to the
optimal storage tier as quickly as possible.
IBM Spectrum Scale
Store Everywhere. Run Anywhere.
IBM Spectrum Scale – Data Management at Scale
Spectrum Scale
Encryption and
Compression
NFS, SMB, File, Object, HDFS
Distributed RAID
• Software defined file storage with high performance
and extreme scalability
• 50% of the systems delivering top SPEC SFS benchmark results
run IBM Spectrum Scale software.
• Supports file systems with sizes of tens of petabytes
that contain billions of files and can be accessed by
thousands of nodes in a cluster.
• Smart policy engine to optimize utilization with
multiple storage tiers
Flash->Disk->Cloud->Tape
• Enterprise class storage features like Disaster
recovery, Encryption, Compression, Erasure Coding
• Flexibility in storage architectures: shared-nothing,
shared-storage, or hybrid.
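The tiering idea behind the policy engine can be sketched in a few lines of Python. This is an illustrative model only, not Spectrum Scale's policy language: the tier order comes from the slide, while the age thresholds and the `choose_tier` and `plan_migrations` helpers are hypothetical.

```python
from dataclasses import dataclass

# Tier order taken from the slide: Flash -> Disk -> Cloud -> Tape.
TIERS = ["flash", "disk", "cloud", "tape"]

# Hypothetical thresholds (days since last access) at which data moves down a tier.
AGE_LIMITS_DAYS = [7, 90, 365]


@dataclass
class FileInfo:
    path: str
    days_since_access: int


def choose_tier(f: FileInfo) -> str:
    """Return the first tier whose age limit the file has not yet exceeded."""
    for tier, limit in zip(TIERS, AGE_LIMITS_DAYS):
        if f.days_since_access <= limit:
            return tier
    return TIERS[-1]  # older than every limit: coldest tier


def plan_migrations(files, current):
    """List (path, from_tier, to_tier) for files whose tier should change."""
    moves = []
    for f in files:
        target = choose_tier(f)
        if current.get(f.path) != target:
            moves.append((f.path, current.get(f.path), target))
    return moves
```

A real policy would typically also weigh pool occupancy and file size; the sketch keeps only access age to show how a rule set turns metadata into a placement decision.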
Storage tiers: SSD, Fast Disk, Slow Disk, Tape
IBM Spectrum Scale
Proven at over
4,000 customers
worldwide
Most common use cases:
- High performance computing
- Big data workloads like
Hadoop, Spark
- Enterprise analytics workloads
like SAS grid, SAP HANA
- AI/ML/DL like genomics,
autonomous driving
- High performance active
archive stores
Four-time champion Infiniti Red Bull Racing
does real-time race analytics
Personalized cancer treatment
for over 65,000 patients
Climate and weather modeling with
16 PB online & 12 PB archive on tape
R&D environment for
natural language tools
Semiconductor Design
Higher profits from
shorter chip design cycles
Shared storage for global banking
100 times faster than incumbent solution
IBM Spectrum Scale Storage
…for the world’s most powerful supercomputers
Summit System
• 4608 nodes, each with:
• 2 IBM Power9 processors
• 6 Nvidia Tesla V100 GPUs
• 608 GB of fast memory
• 1.6 TB of NVMe memory
• 200 petaflops peak
performance for modeling
and simulation
• 3.3 ExaOps peak
performance for data
analytics and AI
IBM Spectrum Scale
IBM Elastic Storage
Server
2.5 TB/sec throughput
to storage architecture
250 PB HDD storage
capacity
Sierra System
• 4320 nodes, each with
• 2 IBM Power9 processors
• 4 Nvidia V100 GPUs
• 320 GB of node memory
• 1.6 TB of NVMe memory
• IBM Spectrum Scale
• IBM Elastic Storage Server
125 petaflops peak performance
154 PB HDD storage capacity
Summit: world’s most powerful supercomputer
Sierra: world’s #2 supercomputer
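The per-node figures above can be rolled up into system totals with some simple arithmetic. The sketch below uses only numbers quoted on the slide; the function names are illustrative, and decimal units (1 PB = 1000 TB) are an assumption.

```python
# Figures quoted on the slide.
SUMMIT = {"nodes": 4608, "gpus_per_node": 6, "nvme_tb_per_node": 1.6,
          "storage_pb": 250, "throughput_tb_s": 2.5}
SIERRA = {"nodes": 4320, "gpus_per_node": 4, "nvme_tb_per_node": 1.6,
          "storage_pb": 154}


def total_gpus(system):
    """Number of GPUs across all nodes."""
    return system["nodes"] * system["gpus_per_node"]


def total_nvme_pb(system):
    """Aggregate node-local NVMe in PB (decimal units: 1 PB = 1000 TB)."""
    return system["nodes"] * system["nvme_tb_per_node"] / 1000


def hours_to_stream_all(system):
    """Hours to read the full HDD capacity once at the quoted peak throughput."""
    seconds = system["storage_pb"] * 1000 / system["throughput_tb_s"]
    return seconds / 3600
```

With the slide's figures this gives 27,648 GPUs for Summit and 17,280 for Sierra, about 7.4 PB of node-local NVMe on Summit, and roughly 28 hours to stream 250 PB once at 2.5 TB/sec.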
IBM Elastic Storage Server (ESS)
Integrated scale-out data management for file and object data
Optimal building block for high-performance, scalable,
reliable enterprise Spectrum Scale storage
• Faster data access with choice to scale-up or out
• Easy to deploy clusters with unified system GUI
• Simplified storage administration with IBM Spectrum Control integration
One solution for all your Spectrum Scale data needs
• Single repository of data with unified file and object support
• Anywhere access with multi-protocol support:
NFS 4.0, SMB, OpenStack Swift, Cinder, and Manila
• Ideal for Big Data Analytics with full Hadoop transparency
Ready for business critical data
• Disaster recovery with synchronous or asynchronous replication
• Ensure reliability and fast rebuild times using Spectrum Scale RAID’s
dispersed data and erasure code
• Five nines (99.999%) of availability
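To put the availability figure in concrete terms, the allowed downtime per year can be computed directly. This is a generic calculation, not an IBM-specific definition, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year that still meets the availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)
```

For 99.999% this works out to a little over five minutes of downtime per year.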
IBM Elastic Storage Server (ESS) Family
Capacity:
• Model GL1S: 1 enclosure, 9U, 82 NL-SAS, 2 SSD
• Model GL2S: 2 enclosures, 12U, 166 NL-SAS, 2 SSD
• Model GL4S: 4 enclosures, 20U, 334 NL-SAS, 2 SSD
• Model GL6S: 6 enclosures, 28U, 502 NL-SAS, 2 SSD
Speed:
• Model GS1S: 24 SSD
• Model GS2S: 48 SSD
• Model GS4S: 96 SSD
Models combining SSD and HDD enclosures:
• Model GH14S: one 2U24 SSD enclosure, four 5U84 HDD enclosures, 334 NL-SAS, 24 SSD
• Model GH24S: two 2U24 SSD enclosures, four 5U84 HDD enclosures, 334 NL-SAS, 48 SSD
Throughput figures shown on the chart: 6 GB/s, 12 GB/s, 14 GB/s, 24 GB/s, 36 GB/s, 38 GB/s, 40 GB/s
Consolidate capacity storage for a cognitive and AI enterprise
NAS
Services
File sync
& share
Archive
Data
Backup &
Cloud
Backup
Cloud
Repository/Service
IoT
Repository
Mobile Apps
Access multiple distributed applications concurrently
One or more sites with geo-dispersed data
DVR & Video
Repository
Image/Voice
Repository
Analytics
File Archive
Financial
Compliance
Healthcare
Cardiology,
Radiology PACS
Research &
Patient Data Cloud Native Apps
Media
Production/
Archive /
Distribution
Compliance &
Retention
Backup, Archive and File
Services
Data Oceans and
Repositories Industry Specific Data
New Cloud
Applications
Documents
Fast data discovery
Efficient data analysis
Actions based on data
Data tagging
The Market reinforces IBM transformational story
Gartner Critical Capabilities for Object Storage
#1 Analytics #1 Archiving #1 Backup #1 Cloud Storage
* Source: Gartner Critical Capabilities for Object Storage Published 30 January 2019 - ID G00352191
Gartner MQ and IDC MarketScape
IBM worldwide object-based leadership
• Gartner MQ: LEADER, Distributed File Systems and Object Storage (October 2018, 3 years in a row)
• IDC MarketScape: LEADER, MarketScape for Object Storage (June 2018, 5 years in a row)
• CRN Tech Innovator: WINNER, Storage – Software Defined Storage (December 2018, first year)
• Tech Target: FINALIST, Cloud Product of the Year (January 2019, first year)
January 2019
Transformational Insight for AI, Analytics, Governance, &
Optimization – Expedite time to discovery
• Automate cataloging of data by capturing metadata as
it’s created
• Locate and identify the most relevant data regardless
of its type or location
• Run simple SQL queries through the GUI
or API scripts
• Enable comprehensive insight by combining system
metadata with custom tags to increase storage admin
& data consumer productivity
• Create custom tags, and policy-based workflows to
orchestrate content inspection & activate data in AI,
ML, & analytics workflows
Scanning and Event Notifications
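As a rough illustration of querying a metadata catalog that combines system metadata with custom tags, the sketch below uses Python's built-in `sqlite3` module. The `catalog` table, its columns, the sample rows and the `find_training_candidates` helper are all hypothetical; they do not represent the actual IBM Spectrum Discover schema or API.

```python
import sqlite3

# Hypothetical catalog schema for illustration; not Spectrum Discover's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE catalog (
        path TEXT PRIMARY KEY,
        size_bytes INTEGER,      -- system metadata
        owner TEXT,              -- system metadata
        project TEXT,            -- custom tag
        contains_pii INTEGER     -- custom tag (0 or 1)
    )
""")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?, ?)",
    [
        ("/data/scans/a.dcm", 52_000_000, "alice", "radiology", 1),
        ("/data/logs/app.log", 1_200, "bob", "ops", 0),
        ("/data/scans/b.dcm", 48_000_000, "alice", "radiology", 0),
    ],
)


def find_training_candidates(conn, project, min_size):
    """Large files for one project that are not tagged as containing PII."""
    rows = conn.execute(
        "SELECT path FROM catalog "
        "WHERE project = ? AND size_bytes >= ? AND contains_pii = 0 "
        "ORDER BY path",
        (project, min_size),
    )
    return [r[0] for r in rows]
```

The query filters on a system attribute (`size_bytes`) and two custom tags (`project`, `contains_pii`) in a single statement, which is the kind of combined lookup the bullets above describe.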
Data Unification with IBM Spectrum Scale
and HDP
Reduce datacenter footprint and get
faster ingest with in-place analytics
Data
NFS
SMB POSIX Object
HDFS API
Access to the data using any of the industry standard protocols.
No need to maintain separate copies for different applications.
Flexible storage architectures
Flexibility in architectures with the support of hybrid architecture under
common namespace. Support for running containerized workloads.
Extreme scalability with
parallel file system architecture
Data + Metadata
Node
Data + Metadata
Node
Data + Metadata
Node
Data + Metadata
Node
Scale to billions of files.
No centralized metadata node bottleneck.
ESS
Why IBM Spectrum Scale for Analytics/AI workloads?
Unmatched Scalability and Performance with the most optimized storage footprint
Full Data Life Cycle Management
Flash Disk
Storage rich servers
Storage
pool1
Storage
pool2
Storage
poolx
External Storage
poolx
Tape
IBM TSM/LTFS
Spectrum Scale
Data Migration between various storage pools
with policy based Auto Tiering
Install software directly on compute nodes OR use shared storage
Performance leadership in AI benchmarks
40 GB/s and 300 TB in 2U; linear scaling to 120 GB/s in 6U
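The linear-scaling claim amounts to multiplying the per-unit figures by the number of 2U units. A minimal sketch, assuming perfectly linear scaling (the `scaled` helper is illustrative, and the aggregate capacity is derived rather than quoted on the slide):

```python
# Per-building-block figures from the slide: 40 GB/s and 300 TB in 2U.
BLOCK_THROUGHPUT_GB_S = 40
BLOCK_CAPACITY_TB = 300
BLOCK_RACK_UNITS = 2


def scaled(blocks: int) -> dict:
    """Aggregate figures for `blocks` units, assuming perfectly linear scaling."""
    return {
        "rack_units": blocks * BLOCK_RACK_UNITS,
        "throughput_gb_s": blocks * BLOCK_THROUGHPUT_GB_S,
        "capacity_tb": blocks * BLOCK_CAPACITY_TB,
    }
```

Three units give 6U and 120 GB/s, matching the slide.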
IBM Spectrum Scale + Hortonworks HDP
• Spectrum Scale becomes the storage layer in your HDP environment.
• Spectrum Scale supports accessing data using HDFS API and hence is transparent to the applications using HDP.
• Enterprise class storage for your Hadoop/Spark environment (Encryption, Compression, Tiering, DR…)
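One way to picture "same data, two access paths" is a mapping between a file's POSIX path and the URI a Hadoop application would use. The sketch below is a simplified illustration: the mount point, the endpoint, and the assumption that the HDFS namespace root corresponds to that mount directory are all hypothetical, and `hdfs_uri_for` is not part of any IBM or Hadoop API.

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration only.
POSIX_MOUNT = PurePosixPath("/gpfs/datalake")   # where the file system is mounted
HDFS_AUTHORITY = "namenode.example.com:8020"    # HDFS-style endpoint


def hdfs_uri_for(posix_path: str) -> str:
    """Map a POSIX path under the mount point to the matching hdfs:// URI."""
    relative = PurePosixPath(posix_path).relative_to(POSIX_MOUNT)
    return f"hdfs://{HDFS_AUTHORITY}/{relative}"
```

Under these assumptions a file written by a POSIX application at `/gpfs/datalake/sales/2018.parquet` would be addressed by a Spark or Hive job as `hdfs://namenode.example.com:8020/sales/2018.parquet`, with no copy in between.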
IBM Spectrum Scale
HDFS – Scale Transparency Connector
Hortonworks HDP with IBM Spectrum Scale (IBM Redbook)
IBM ESS Shared-Storage Model vs Classic HDFS Shared-Nothing Cluster
10 GigE / 40 GigE
HDP Storage-Rich
Worker Nodes
HDP HDP HDP
Standard Shared-Nothing model on storage-rich servers
- Inefficient, inflexible, and expensive
- Expensive, wasteful, and with high OPEX to
scale and manage compute and storage
- Lacks enterprise features
• Disaggregated “thin” worker nodes with fewer disks
• No application-data disks in servers
• Replaced with shared storage
• No need for storage-only nodes
• Avoids cluster sprawl while delivering high
performance, flexibility, and enterprise features
• All with HDFS compatibility
Data Unification with IBM Spectrum Scale
Use Cases
EDW Optimization
Simplify data management using common storage between EDW and Hadoop
Archive Data away from EDW
- Move cold or rarely used data to Hadoop
as active archive
- Store more data for longer
Offload costly ETL process
- Free your EDW to perform high-value functions
like analytics & operations, not ETL
- Use Hadoop for advanced ETL
Optimize the value of your EDW
- Use Hadoop to refine new data sources, such as
web and machine data for new analytical context
Reduce migration effort & skillset gap
- Use existing investment in Oracle/DB2/Netezza
skills
- BigSQL allows you to migrate applications without
major code rewrites and additional SQL
development
Control cluster sprawl
- Grow storage independent of compute with ESS
- POWER servers deliver 1.7x throughput compared
to Hortonworks on x86
- Up-to 60% less storage footprint
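The slide does not say how the "up to 60%" footprint figure is derived. One plausible reading, offered here purely as an assumption, is a comparison between three-way replication (the HDFS default) and an erasure-coded layout; the 8+2 code used below is an illustrative choice, not a figure from the deck.

```python
def raw_per_usable_replication(copies: int) -> float:
    """Raw capacity needed per unit of usable data with N-way replication."""
    return float(copies)


def raw_per_usable_erasure(data_strips: int, parity_strips: int) -> float:
    """Raw capacity needed per unit of usable data with k+m erasure coding."""
    return (data_strips + parity_strips) / data_strips


def footprint_reduction(before: float, after: float) -> float:
    """Fractional saving in raw capacity when moving from `before` to `after`."""
    return 1 - after / before
```

Three copies need 3.0 units of raw capacity per unit of data, while 8+2 needs 1.25, a reduction of about 58%, which is in the same range as the slide's claim.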
Enterprise Data
Warehouse
DB2 / Dashdb / Oracle /
Netezza / Teradata …
Hot Data
Hortonworks
Hadoop
Cold Data, Archive Data,
New Sources
BigSQL SQL Interface
BI Software
(Business Analytics, Visualization like SAS grid, SAP HANA etc)
ESS for
Speed
ESS for
Data Lake
Spectrum
Scale
A Financial Services company in Europe is optimizing its DB2 warehouse using Hortonworks Hadoop and is using
ESS as the common storage behind DB2 and Hadoop.
New Data Sources
Streaming / IOT data
Large banking
group selects
scalable data
science
platform to
develop new
smart banking
services
through use of
AI in real-time
Business problem
• Needed to improve client experience and create
new client services by identifying new patterns in its
data through use of data science and AI techniques
• Existing Hadoop infrastructure solution did not have
sufficient throughput and scalability
Solution
• POWER9 cluster with L922 servers (x96) and
AC922 servers (x3)
• IBM Elastic Storage Server (ESS) with Spectrum
Scale: GL1S (x2) and GL2S (x2)
• Hortonworks Data Platform (HDP) and IBM Watson
Studio (formerly DSX)
Benefits
• Open, virtualized infrastructure solution based on
IBM Power Servers running HDP and Watson
Studio
• Optimized, scalable and highly available Storage
Architecture with IBM Spectrum Scale based ESS
• Integrated security of DSX+HDP in conjunction with
higher throughput of POWER9 servers
outperformed Intel and reduced time to value
• End-to-end solution that addressed all requirements
around performance, security, costs, and ability to
scale
New Smart AI Services
New AI-Driven Client Services in Banking
IBM Spectrum Scale
Unified Analytics Workflows
Single data lake for Hadoop and non-Hadoop analytics
A bank in South Africa is implementing HDP and SAS grid software on a common ESS based infrastructure.
ESS for
Data Lake
POSIX
Interface
HDFS
Interface
Other
Analytics
Platforms
SAS grid, SAP
HANA/Vora, ML/DL,
Conductor with Spark etc
Hadoop
Map-Reduce,
Spark, ML/DL etc
ESS for
Speed
Fast Ingest
POSIX
Interface
Spectrum Scale
All analytics workflows on common storage
- Improve data reliability and governance with single data
lake for Hadoop and non-Hadoop analytics setups
- Build ML/DL workflows that use multiple analytics
platforms
- Share data across analytics workflows as appropriate
Ingest fast and improve time to insight
- POSIX interface combined with ESS Flash storage gives
super fast ingest ability
Control cluster sprawl
- Grow storage independent of compute with ESS
- Up-to 60% less storage footprint
- POWER servers deliver 1.7x throughput compared to
Hortonworks on x86
Large bank
delivers
personalized
banking in real-
time to millions
of customers
by applying
new analytics
and data
science.
Business problem
• Aggressively improve their analytics maturity by
delivering Predictive Analytics capability providing
a Data-driven Customer Experience
• Develop open platform that can ingest all relevant
data from various sources with the ability to extract
new insights
Solution
• POWER8 cluster with S822L servers (x24)
• IBM Elastic Storage Server (ESS) with Spectrum
Scale: GL2S (x2)
• Hortonworks Data Platform (HDP)
Benefits
• Open infrastructure solution based on IBM Power
Servers running Linux and HDP
• Optimized, scalable and highly available Storage
Architecture with IBM Spectrum Scale based ESS
• Better overall TCO: Superior performance with less
than half the number of compute nodes where
Power + ESS outperformed local storage on Intel
• Leverage ESS in-place analytics to host both HDP
and SAS workloads on a single data layer, reducing
data copies and improving data governance
Predictive Analytics
Data-Driven Customer Banking
IBM Spectrum Scale
Integrated HPC and Hadoop
Efficiently transform data into insights with single data lake for HPC & Hadoop
NASA and a healthcare company in the Middle East are using a common Spectrum Scale data lake to
efficiently gain insights using traditional HPC and Hadoop analytics.
ESS for
Data Lake
POSIX
Interface
HDFS
Interface
Traditional HPC
Open, Read, Write, MPI, C-code,
Python etc
Hadoop
Map-Reduce,
Spark, ML/DL etc
NFS/SMB/Object
Interface
Spectrum Scale
Protocol Node
ESS for
Speed
Fast Ingest
POSIX
Interface
Spectrum Scale
Extend HPC to add modern analytics
capabilities
- Efficient movement of data between modern and
traditional applications with common namespace
- Spectrum Scale in-place analytics capabilities
enable accessing the same data using
NFS/SMB/Object/POSIX/HDFS without requiring
any modifications to the data
- Improve data reliability and governance with single
data lake
Ingest fast and improve time to insight
- POSIX interface combined with ESS Flash storage
gives super fast ingest ability
- Common namespace enables running some edge
analytics at the ingest layer as well
Control cluster sprawl
- Grow storage independent of compute with ESS
- Up-to 60% less storage footprint
- POWER servers deliver 1.7x throughput compared
to Hortonworks on x86
Solutions – IBM Spectrum Storage for AI
Available Solutions:
§ IBM Spectrum Storage for AI with Power Systems
§ IBM Spectrum Storage for AI with NVIDIA DGX (leading AI x86 based solution)
§ IBM Spectrum Storage for Hadoop/Spark workloads (Hortonworks/Cloudera)
§ IBM Spectrum Storage for AI in Autonomous Driving
IBM Spectrum Storage for AI supercharges your AI data pipeline with storage
solutions optimized for the unique demands of AI.
Integrating industry-leading servers, ISV / open source software and IBM
software-defined storage, IBM Spectrum Storage for AI delivers simplified
deployment, groundbreaking performance, and extended data management to
drive developer productivity with the fastest path to insights.
https://www.ibm.com/it-infrastructure/storage/ai-infrastructure
“IBM’s Spectrum Storage for AI is differentiated
from both the NetApp and Pure Storage
offerings. IBM Spectrum Storage for AI provides
a level of scalability that is nearly unmatched by
anyone in the industry. It’s both incredibly fast
at scale, and it scales linearly.
The ability for IBM Spectrum Storage for AI to
seamlessly integrate with the rest of the
Spectrum Storage suite should make IBM’s
solution an easy decision for enterprise buyers.”
§ Steve McDowell
Questions?
Thank You!

Contenu connexe

Tendances

Cloud in Supply Chain
Cloud in Supply ChainCloud in Supply Chain
Cloud in Supply ChainAmal Dev
 
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou...DataWorks Summit
 
Change data capture
Change data captureChange data capture
Change data captureJames Deppen
 
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...DataWorks Summit
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enoughCloudera, Inc.
 
Driven by data - Why we need a Modern Enterprise Data Analytics Platform
Driven by data - Why we need a Modern Enterprise Data Analytics PlatformDriven by data - Why we need a Modern Enterprise Data Analytics Platform
Driven by data - Why we need a Modern Enterprise Data Analytics PlatformArne Roßmann
 
Postgres Vision 2018: How to Consume your Database Platform On-premises
Postgres Vision 2018: How to Consume your Database Platform On-premisesPostgres Vision 2018: How to Consume your Database Platform On-premises
Postgres Vision 2018: How to Consume your Database Platform On-premisesEDB
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleHortonworks
 
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsPowering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsHitachi Vantara
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceKaran Sachdeva
 
Using data lifecycle management
Using data lifecycle managementUsing data lifecycle management
Using data lifecycle managementInterfacing
 
Toyota Financial Services Digital Transformation - Think 2019
Toyota Financial Services Digital Transformation - Think 2019Toyota Financial Services Digital Transformation - Think 2019
Toyota Financial Services Digital Transformation - Think 2019Slobodan Sipcic
 
Benefits of Extending PowerCenter with Informatica Cloud
Benefits of Extending PowerCenter with Informatica CloudBenefits of Extending PowerCenter with Informatica Cloud
Benefits of Extending PowerCenter with Informatica CloudAshwin V.
 
Cloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicCloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicHitachi Vantara
 
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Kai Wähner
 
Journey to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, BenefitsJourney to Big Data: Main Issues, Solutions, Benefits
Journey to Big Data: Main Issues, Solutions, BenefitsDataWorks Summit
 

Unifying the Silos: Optimize your Data Pipeline for Analytics and AI

  • 1. Unifying the Silos: Optimize your data pipeline for Analytics and AI. Gary Tomchuk, IBM Global SW Defined Storage Sales; Benoit Granier, IBM File and Object Systems Technical Manager for Europe. IBM Storage and SDI. © Copyright IBM Corporation 2018
  • 2. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  • 3. Notices and disclaimers. © 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights: use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
  • 4. Notices and disclaimers continued. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at: www.ibm.com/legal/copytrade.shtml.
  • 5. Agenda: Data Management Challenges in Analytics and AI; AI Data Pipeline with IBM Spectrum Storage; IBM Spectrum Storage offerings for Analytics and AI (IBM Spectrum Scale, IBM Spectrum Discover, IBM Cloud Object Storage); Data Unification using IBM Spectrum Scale with HDP; Data Unification Use Cases; IBM Spectrum Storage for AI - Solutions.
  • 6. Data Management Challenges in Analytics and AI
  • 7. Biggest Unstructured Data Challenges: 39% of firms see sourcing, gathering, managing, and governing data as their biggest challenges when using systems of insight. The number of enterprises with 1,000 TB+ unstructured data stores grew 3X from 2016 to 2017. Source: Forrester Analytics, Global Business Technographics Data And Analytics Survey, 2017, and Global Business Technographics Data And Analytics Survey, 2016 (enterprises with 1000+ employees).
  • 8. Data Management Challenges: silos of infrastructure for various analytics use cases; multiple copies of the same data without a single source of truth; analytics on stale data; time-consuming data ingest cycle; unmanageable cluster sprawl with data growth.
  • 9. AI Data Pipeline for IBM Spectrum Storage
  • 10. AI, Analytics and Data Pipelines: AI and Big Data pipelines need to support high-performance data analytics and AI, machine learning, and deep learning, from early experimentation to shared data services on production clusters. Graphic label: POWERAI.
  • 11. Shorten Time to Value with IBM Storage. AI data workflow stages: New Data, Ingest, Classify, Training, Inference (Champion / Challenger). Workflow annotations: 80% of Data Science Time; Resource Optimization; Provision Time. Headwinds challenge time-to-value. Table columns: AI Workflow, Why IBM?, Business Value. Data Scientist Productivity: reduce time to accuracy, improve provisioning time, increase cycles, reduce human error; improve velocity by getting to your data faster using tools, not trial and error. The most scalable, low-latency storage platform: minimize data movement, increase performance, automate storage processes, and reduce cost, using the leading portfolio of software-defined storage. Optimized Economics: balance performance and cost with system choices. Proven Reference Architecture: higher performance, more confidence, lower costs. Industry Standard Approach: deliver consistency and efficiencies. Uses Technology Advances: GPU, open source frameworks. Business value: lower CAPEX; improved model quality; faster time to insight; business agility; lower OPEX; higher client experience; automation savings. IDC: "Look for dynamically adaptable, simple, flexible, secure, cost-efficient, and elastic infrastructure that can support high capacity along with high throughput and low latency for high performance training and inferencing experience."
  • 12. The Goal: Move Data from Ingest to Insights. Pipeline stages: EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS.
  • 13. IBM AI Data Pipeline (diagram, Data In to Insights Out). Stages: EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS. Components: Transient Storage; Global Ingest; Fast Ingest / Real-time Analytics; Classification & Metadata Tagging; ETL; Hadoop / Spark Data Lakes; ML / DL (Prep, Training, Inference); Trained Model; Inference; Archive. Media labels: SSD; SDS/Cloud; Cloud; SSD/Hybrid; Hybrid/HDD; SSD/NVMe; Tape; HDD. Tier descriptions (their pairing with components is not recoverable from the transcript): throughput-oriented, software-defined temporary landing zone; high-throughput performance tier; high-scalability, large/sequential I/O capacity tier; high-volume index and auto-tagging zone; throughput-oriented performance and capacity tier; throughput-oriented, globally accessible capacity tier; high-throughput, low-latency, random I/O performance tier; high-throughput, random I/O performance and capacity tier.
  • 14. IBM AI Data Pipeline with IBM Spectrum Storage: improved data governance with storage offerings for the end-to-end data pipeline. The diagram repeats the pipeline from the previous slide (Data In to Insights Out; stages EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS) and overlays IBM offerings on its components: Spectrum Scale, Elastic Storage Server, Cloud Object Storage, Spectrum Archive, and Spectrum Discover. Components: Transient Storage; Global Ingest; Fast Ingest / Real-time Analytics; Classification & Metadata Tagging; ETL; Hadoop / Spark Data Lakes; ML / DL (Prep, Training, Inference); Trained Model; Inference; Archive. Media labels: SSD; SDS/Cloud; Cloud; SSD/Hybrid; Hybrid/HDD; SSD/NVMe; Tape; HDD.
  • 15. IBM Spectrum Storage Offerings for Analytics and AI
  • 16. IBM Spectrum Scale: Store Everywhere. Run Anywhere. Delivers data management at scale for enterprises that are swamped by data. IBM Spectrum Scale lets you grow and share the storage infrastructure while automatically moving file and object data to the optimal storage tier as quickly as possible.
  • 17. IBM Spectrum Scale: Data Management at Scale. Software-defined file storage with high performance and extreme scalability; 50% of systems delivering top SPEC SFS benchmarks run IBM Spectrum Scale software; supports file systems with sizes of tens of petabytes that contain billions of files and can be accessed by thousands of nodes in a cluster; smart policy engine to optimize utilization across multiple storage tiers (Flash -> Disk -> Cloud -> Tape); enterprise-class storage features such as disaster recovery, encryption, compression, and erasure coding; flexibility in storage architectures: shared-nothing, shared-storage, or hybrid. Diagram labels: access protocols NFS, SMB, File, Object, HDFS; Encryption and Compression; Distributed RAID; tiers SSD, Fast Disk, Slow Disk, Tape.
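The tiering idea on slide 17 (Flash -> Disk -> Cloud -> Tape) can be sketched as a small decision function. This is an illustrative Python stand-in for the decision logic only, not Spectrum Scale's own policy language; the idle-time thresholds and the names `FileInfo`, `target_tier`, and `plan_migrations` are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass

# Tier order as listed on the slide: Flash -> Disk -> Cloud -> Tape.
TIERS = ["flash", "disk", "cloud", "tape"]

# Hypothetical thresholds (days since last access); not taken from the deck.
# Tape has no limit, so it catches everything older than the cloud threshold.
MAX_IDLE_DAYS = {"flash": 7, "disk": 90, "cloud": 365}


@dataclass
class FileInfo:
    path: str
    tier: str
    days_since_access: int


def target_tier(f: FileInfo) -> str:
    """Return the first tier whose idle-time limit the file still satisfies."""
    for tier in TIERS:
        limit = MAX_IDLE_DAYS.get(tier)
        if limit is None or f.days_since_access <= limit:
            return tier
    return TIERS[-1]


def plan_migrations(files):
    """List (path, current_tier, target_tier) for files that should move."""
    plan = []
    for f in files:
        dest = target_tier(f)
        if dest != f.tier:
            plan.append((f.path, f.tier, dest))
    return plan


if __name__ == "__main__":
    sample = [
        FileInfo("/data/new.parquet", "flash", 3),
        FileInfo("/data/last_month.parquet", "flash", 30),
        FileInfo("/data/old_run.tar", "disk", 1000),
    ]
    for move in plan_migrations(sample):
        print(move)
```

Because the rule depends only on idle time, the same function also promotes a recently touched file back to a faster tier; a real policy would typically add pool-occupancy thresholds as well.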
  • 18. IBM Spectrum Scale: proven at over 4,000 customers worldwide. Most common use cases: high performance computing; big data workloads like Hadoop and Spark; enterprise analytics workloads like SAS grid and SAP HANA; AI/ML/DL such as genomics and autonomous driving; high-performance active archive stores. Customer examples: 4-time champion Infiniti Red Bull Racing does real-time race analytics; personalized cancer treatment for over 65,000 patients; climate and weather modeling with 16 PB online and 12 PB archived on tape; R&D environment for natural language tools; semiconductor design, with higher profits from shorter chip design cycles; shared storage for global banking, 100 times faster than the incumbent solution.
  • 19. IBM Spectrum Scale Storage for the world’s most powerful supercomputers. Summit (world’s most powerful supercomputer): 4608 nodes, each with 2 IBM Power9 processors, 6 Nvidia Tesla V100 GPUs, 608 GB of fast memory, and 1.6 TB of NVMe memory; 200 petaflops peak performance for modeling and simulation; 3.3 ExaOps peak performance for data analytics and AI; IBM Spectrum Scale on IBM Elastic Storage Server with 2.5 TB/sec throughput to the storage architecture and 250 PB of HDD storage capacity. Sierra (world #2 supercomputer): 4320 nodes, each with 2 IBM Power9 processors, 4 Nvidia V100 GPUs, 320 GB of node memory, and 1.6 TB of NVMe memory; 125 petaflops peak performance; IBM Spectrum Scale on IBM Elastic Storage Server with 154 PB of HDD storage capacity.
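The per-node figures on slide 19 multiply out to system-wide totals, which gives a feel for the scale the storage layer has to feed. The short Python sketch below does only that arithmetic; the dictionary layout and function names are illustrative, and decimal units (1 PB = 1000 TB) are an assumption, since the slide does not say which convention it uses.

```python
# Per-node figures copied from slide 19; everything else is derived from them.
SYSTEMS = {
    "Summit": {"nodes": 4608, "cpus_per_node": 2, "gpus_per_node": 6,
               "nvme_tb_per_node": 1.6},
    "Sierra": {"nodes": 4320, "cpus_per_node": 2, "gpus_per_node": 4,
               "nvme_tb_per_node": 1.6},
}


def aggregate(spec):
    """Multiply per-node counts by the node count."""
    n = spec["nodes"]
    return {
        "cpus": n * spec["cpus_per_node"],
        "gpus": n * spec["gpus_per_node"],
        # Assumption: decimal units, 1 PB = 1000 TB.
        "nvme_pb": n * spec["nvme_tb_per_node"] / 1000,
    }


def full_scan_hours(capacity_pb, throughput_tb_per_s):
    """Hours needed to stream the whole capacity once at the quoted peak rate."""
    seconds = capacity_pb * 1000 / throughput_tb_per_s
    return seconds / 3600


if __name__ == "__main__":
    for name, spec in SYSTEMS.items():
        print(name, aggregate(spec))
    # Summit's 250 PB at the quoted 2.5 TB/s peak throughput.
    print(round(full_scan_hours(250, 2.5), 1), "hours")
```

Under these assumptions Summit has 27,648 GPUs and Sierra 17,280, and reading Summit's full 250 PB once at the quoted peak rate would take a little over a day.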
  • 20. IBM Elastic Storage Server (ESS): integrated scale-out data management for file and object data. Optimal building block for high-performance, scalable, reliable enterprise Spectrum Scale storage: faster data access with the choice to scale up or out; easy-to-deploy clusters with a unified system GUI; simplified storage administration with IBM Spectrum Control integration. One solution for all your Spectrum Scale data needs: single repository of data with unified file and object support; anywhere access with multi-protocol support (NFS 4.0, SMB, OpenStack Swift, Cinder, and Manila); ideal for Big Data analytics with full Hadoop transparency. Ready for business-critical data: disaster recovery with synchronous or asynchronous replication; reliability and fast rebuild times using Spectrum Scale RAID’s dispersed data and erasure coding; five 9s (99.999%) of availability.
  • 21. IBM Elastic Storage Server (ESS) Family. Capacity models: GL1S (1 enclosure, 9U; 82 NL-SAS, 2 SSD); GL2S (2 enclosures, 12U; 166 NL-SAS, 2 SSD); GL4S (4 enclosures, 20U; 334 NL-SAS, 2 SSD); GL6S (6 enclosures, 28U; 502 NL-SAS, 2 SSD). Speed models: GS1S (24 SSD); GS2S (48 SSD); GS4S (96 SSD). Models combining SSD and HDD enclosures: GH14S (one 2U24 SSD enclosure and four 5U84 HDD enclosures; 334 NL-SAS, 24 SSD); GH24S (two 2U24 SSD enclosures and four 5U84 HDD enclosures; 334 NL-SAS, 48 SSD). Throughput labels shown on the chart (their pairing with models is not recoverable from the transcript): 6, 12, 14, 24, 36, 38, and 40 GB/s.
  • 22. Consolidate capacity storage for a cognitive and AI enterprise. Access multiple distributed applications concurrently, across one or more sites with geo-dispersed data. Use-case categories: Backup, Archive and File Services; Data Oceans and Repositories; Industry Specific Data; New Cloud Applications. Use cases shown: NAS services; file sync and share; archive data; backup and cloud backup; cloud repository/service; IoT repository; DVR and video repository; image/voice repository; analytics; file archive; financial compliance; healthcare (cardiology, radiology PACS); research and patient data; media production, archive, and distribution; compliance and retention; mobile apps; cloud-native apps; documents. Capabilities: fast data discovery; efficient data analysis; data tagging; actions based on data.
  • 23. The market reinforces IBM’s transformational story. Gartner Critical Capabilities for Object Storage: #1 Analytics, #1 Archiving, #1 Backup, #1 Cloud Storage (source: Gartner Critical Capabilities for Object Storage, published 30 January 2019, ID G00352191). IBM worldwide object-based leadership: Gartner MQ, LEADER, Distributed File Systems and Object Storage, October 2018, 3 years in a row; IDC MarketScape, LEADER, MarketScape for Object Storage, June 2018, 5 years in a row; CRN Tech Innovator, WINNER, Storage – Cloud, December 2018, first year; Tech Target, FINALIST, Product of the Year, Software Defined Storage, January 2019, first year.
  • 24. Transformational Insight for AI, Analytics, Governance, and Optimization: expedite time to discovery. Automate cataloging of data by capturing metadata as it is created; locate and identify the most relevant data regardless of its type or location; run simple SQL queries through the GUI or API scripts; enable comprehensive insight by combining system metadata with custom tags to increase storage admin and data consumer productivity; create custom tags and policy-based workflows to orchestrate content inspection and activate data in AI, ML, and analytics workflows. Metadata sources: scanning and event notifications.
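Slide 24 describes querying a metadata catalog with SQL and combining system metadata with custom tags. The sketch below shows that general idea using Python's built-in `sqlite3` module. It is not the product's API or schema: the `metadata` table, its columns, and the helper names `build_catalog` and `find_by_tag` are hypothetical, invented only to make the query pattern concrete.

```python
import sqlite3


def build_catalog(records):
    """Create an in-memory catalog. The schema is hypothetical, for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        "path TEXT, size_bytes INTEGER, owner TEXT, project_tag TEXT)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def find_by_tag(conn, tag, min_size=0):
    """Return paths carrying a custom tag, filtered by a system attribute (size)."""
    rows = conn.execute(
        "SELECT path FROM metadata "
        "WHERE project_tag = ? AND size_bytes >= ? ORDER BY path",
        (tag, min_size),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    catalog = build_catalog([
        ("/lake/scan_001.dcm", 52_000_000, "alice", "radiology"),
        ("/lake/scan_002.dcm", 1_000, "alice", "radiology"),
        ("/lake/trades.csv", 9_000_000, "bob", "finance"),
    ])
    print(find_by_tag(catalog, "radiology", min_size=1_000_000))
```

The point of the pattern is that the search runs against the catalog rather than the file system itself, so locating candidate training data does not require walking billions of files.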
  • 25. Data Unification with IBM Spectrum Scale and HDP
  • 26. Why IBM Spectrum Scale for Analytics/AI workloads? Unmatched scalability and performance with the most optimized storage footprint. Reduce datacenter footprint and get faster ingest with in-place analytics: access the data using any of the industry-standard protocols (NFS, SMB, POSIX, Object, HDFS API), with no need to maintain separate copies for different applications. Flexible storage architectures: hybrid architectures are supported under a common namespace, with support for running containerized workloads; install the software directly on compute nodes (storage-rich servers) or use shared storage (ESS). Extreme scalability with a parallel file system architecture: every node serves data and metadata, scaling to billions of files with no centralized metadata node bottleneck. Full data life cycle management: data migration between storage pools (flash, disk, and external pools such as tape via IBM TSM/LTFS) with policy-based auto-tiering. Performance leadership in AI benchmarks: 40 GB/s and 300 TB in 2U, with linear scaling to 120 GB/s in 6U.
  • 27. IBM Spectrum Scale + Hortonworks HDP. Spectrum Scale becomes the storage layer in your HDP environment. Spectrum Scale supports accessing data through the HDFS API and is therefore transparent to the applications using HDP. Enterprise-class storage for your Hadoop/Spark environment (encryption, compression, tiering, DR…). Diagram labels: IBM Spectrum Scale; HDFS Transparency Connector. Reference: IBM Redbook, Hortonworks HDP with IBM Spectrum Scale.
  • 28. IBM Spectrum Scale / IBM ESS Shared-Storage Model vs Classic HDFS Shared-Nothing Cluster. Standard shared-nothing model on storage-rich HDP worker nodes (10 GigE / 40 GigE network): inefficient, inflexible, and expensive; wasteful, with high OPEX to scale and manage compute and storage together; lacks enterprise features. Shared-storage model: disaggregated “thin” worker nodes with fewer disks; no application-data disks in servers, which are replaced with shared storage; no need for storage-only nodes; avoids cluster sprawl while providing high performance, flexibility, and enterprise features; all with HDFS compatibility.
  • 29. Data Unification with IBM Spectrum Scale Use Cases
  • 30. EDW Optimization: simplify data management using common storage between EDW and Hadoop. Archive data away from the EDW: move cold or rarely used data to Hadoop as an active archive; store more data for longer. Offload costly ETL processes: free your EDW to perform high-value functions like analytics and operations, not ETL; use Hadoop for advanced ETL. Optimize the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context. Reduce migration effort and the skillset gap: use existing investment in Oracle/DB2/Netezza skills; BigSQL allows you to migrate applications without major code rewrites and additional SQL development. Control cluster sprawl: grow storage independently of compute with ESS; POWER servers deliver 1.7x throughput compared to Hortonworks on x86; up to 60% less storage footprint. Diagram: Enterprise Data Warehouse (DB2 / dashDB / Oracle / Netezza / Teradata …) holding hot data on ESS for Speed; Hortonworks Hadoop holding cold data, archive data, and new sources (including streaming / IoT data) on ESS for Data Lake; BigSQL as the SQL interface; BI software (business analytics and visualization, such as SAS grid and SAP HANA); Spectrum Scale underneath. Customer example: a financial services company in Europe is optimizing its DB2 warehouse using Hortonworks Hadoop and is using ESS as the common storage behind DB2 and Hadoop.
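The deck does not say how the "up to 60% less storage footprint" figure was derived. One comparison that yields a number of that size, offered here purely as an assumption, is 3-way replication (the HDFS default) against a parity-based erasure code such as an 8+2 layout on shared storage. The Python sketch below works through that arithmetic; the function names are illustrative.

```python
def raw_per_usable_replication(copies: int) -> float:
    """Raw capacity consumed per unit of usable data with N full copies."""
    return float(copies)


def raw_per_usable_erasure(data_strips: int, parity_strips: int) -> float:
    """Raw capacity consumed per unit of usable data with a data+parity code."""
    return (data_strips + parity_strips) / data_strips


def footprint_saving(baseline: float, alternative: float) -> float:
    """Fraction of raw capacity saved by the alternative relative to the baseline."""
    return 1 - alternative / baseline


if __name__ == "__main__":
    # Assumptions (not stated in the deck): 3 replicas vs. an 8+2 erasure code.
    replication = raw_per_usable_replication(3)   # 3.0x raw per usable
    erasure = raw_per_usable_erasure(8, 2)        # 1.25x raw per usable
    print(f"saving: {footprint_saving(replication, erasure):.1%}")
```

With these assumed parameters the raw capacity requirement drops from 3.0x to 1.25x of the usable data, a saving of roughly 58%, which is in the neighbourhood of the slide's headline figure; other replica counts or code widths would give different results.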
  • 31. New AI-Driven Client Services in Banking: a large banking group selects a scalable data science platform to develop new smart banking services through the use of AI in real time. Business problem: needed to improve client experience and create new client services by identifying new patterns in its data through data science and AI techniques; the existing Hadoop infrastructure did not have sufficient throughput and scalability. Solution: POWER9 cluster with L922 servers (x96) and AC922 servers (x3); IBM Elastic Storage Server (ESS) with Spectrum Scale: GL1S (x2) and GL2S (x2); Hortonworks Data Platform (HDP) and IBM Watson Studio (formerly DSX). Benefits: open, virtualized infrastructure based on IBM Power Servers running HDP and Watson Studio; optimized, scalable, and highly available storage architecture with IBM Spectrum Scale based ESS; integrated security of DSX+HDP, together with the higher throughput of POWER9 servers, outperformed Intel and reduced time to value; end-to-end solution that addressed all requirements around performance, security, costs, and ability to scale.
  • 32. Unified Analytics Workflows: a single data lake for Hadoop and non-Hadoop analytics. All analytics workflows on common storage: improve data reliability and governance with a single data lake for Hadoop and non-Hadoop analytics setups; build ML/DL workflows that use multiple analytics platforms; share data across analytics workflows as appropriate. Ingest fast and improve time to insight: the POSIX interface combined with ESS flash storage gives very fast ingest. Control cluster sprawl: grow storage independently of compute with ESS; up to 60% less storage footprint; POWER servers deliver 1.7x throughput compared to Hortonworks on x86. Diagram: Hadoop (MapReduce, Spark, ML/DL, etc.) through the HDFS interface and other analytics platforms (SAS grid, SAP HANA/Vora, ML/DL, Conductor with Spark, etc.) through the POSIX interface, both on ESS for Data Lake; fast ingest through the POSIX interface on ESS for Speed; Spectrum Scale underneath. Customer example: a bank in South Africa is implementing HDP and SAS grid software on a common ESS-based infrastructure.
  • 33. Predictive Analytics, Data-Driven Customer Banking: a large bank delivers personalized banking in real time to millions of customers by applying new analytics and data science. Business problem: aggressively improve analytics maturity by delivering a predictive analytics capability that provides a data-driven customer experience; develop an open platform that can ingest all relevant data from various sources and extract new insights. Solution: POWER8 cluster with S822L servers (x24); IBM Elastic Storage Server (ESS) with Spectrum Scale: GL2S (x2); Hortonworks Data Platform (HDP). Benefits: open infrastructure based on IBM Power Servers running Linux and HDP; optimized, scalable, and highly available storage architecture with IBM Spectrum Scale based ESS; better overall TCO, with superior performance from less than half the number of compute nodes, where Power + ESS outperformed local storage on Intel; ESS in-place analytics hosts both HDP and SAS workloads on a single data layer, reducing data copies and improving data governance.
  • 34. Integrated HPC and Hadoop: efficiently transform data into insights with a single data lake for HPC and Hadoop. Extend HPC to add modern analytics capabilities: efficient movement of data between modern and traditional applications with a common namespace; Spectrum Scale in-place analytics capabilities enable access to the same data through NFS/SMB/Object/POSIX/HDFS without requiring any modifications to the data; improve data reliability and governance with a single data lake. Ingest fast and improve time to insight: the POSIX interface combined with ESS flash storage gives very fast ingest; the common namespace also enables running some edge analytics at the ingest layer. Control cluster sprawl: grow storage independently of compute with ESS; up to 60% less storage footprint; POWER servers deliver 1.7x throughput compared to Hortonworks on x86. Diagram: traditional HPC (open, read, write, MPI, C code, Python, etc.) through the POSIX interface and Hadoop (MapReduce, Spark, ML/DL, etc.) through the HDFS interface, both on ESS for Data Lake; NFS/SMB/Object access through a Spectrum Scale protocol node; fast ingest through the POSIX interface on ESS for Speed; Spectrum Scale underneath. Customer examples: NASA and a healthcare company in the Middle East are using a common Spectrum Scale data lake to efficiently get insights using traditional HPC and Hadoop analytics.
  • 35. Solutions: IBM Spectrum Storage for AI. IBM Spectrum Storage for AI supercharges your AI data pipeline with storage solutions optimized for the unique demands of AI. Integrating industry-leading servers, ISV / open source software and IBM software-defined storage, IBM Spectrum Storage for AI delivers simplified deployment, groundbreaking performance, and extended data management to drive developer productivity with the fastest path to insights. Available solutions: IBM Spectrum Storage for AI with Power Systems; IBM Spectrum Storage for AI with NVIDIA DGX (leading AI x86-based solution); IBM Spectrum Storage for Hadoop/Spark workloads (Hortonworks/Cloudera); IBM Spectrum Storage for AI in Autonomous Driving. https://www.ibm.com/it-infrastructure/storage/ai-infrastructure
  • 36. “IBM’s Spectrum Storage for AI is differentiated from both the NetApp and Pure Storage offerings. IBM Spectrum Storage for AI provides a level of scalability that is nearly unmatched by anyone in the industry. It’s both incredibly fast at scale, and it scales linearly. The ability for IBM Spectrum Storage for AI to seamlessly integrate with the rest of the Spectrum Storage suite should make IBM’s solution an easy decision for enterprise buyers.” (Steve McDowell)
  • 37. Questions?
  • 38. Thank You!