This document discusses Dell's solutions for big data and analytics workloads. It describes Dell's portfolio for unstructured analytics, including storage, servers, and reference architectures. It also outlines Dell's vision for a unified streaming and batch analytics platform, Project Nautilus, which would integrate Isilon and ECS storage with real-time stream processing.
1.
Internal Use - Confidential
DataWorks Summit
Shawn Smith – Big Data Specialist
shawn.smith@dell.com
Accelerating Big Data Insights
2. Transforming The Business
We help organizations reinvent themselves and realize their digital future
Digital Transformation
Security Transformation
Workforce Transformation
IT Transformation
3. BUSINESS TRANSFORMATION
Ready for Whatever Comes Next: AI, Augmented Reality, Machine Learning...
Emerging Challenges
4. What is Unstructured Data?
• More than 80% of data created globally is unstructured data
• File data is growing VERY fast. Most customers see 30% to 50% year-over-year growth in unstructured data
• Dell EMC is #1 in Scale-Out File & Object storage according to IDC and Gartner, because of SIMPLICITY!
• Simple – Single Volume
• Efficient – Best Storage Utilization
• Scale-Out – Scale and grow without pain
• NO MIGRATIONS!
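The 30% to 50% year-over-year growth figure compounds quickly. As a rough illustration, assuming a constant annual growth rate and a hypothetical 100 TB starting point (neither comes from the slides), the sketch below computes projected capacity and the doubling time ln 2 / ln(1 + g): roughly 2.6 years at 30% growth and 1.7 years at 50%.

```python
import math


def projected_capacity_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth; annual_growth=0.30 means 30% per year."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years until data volume doubles at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100  # hypothetical starting capacity, for illustration only
    for growth in (0.30, 0.50):  # the 30% to 50% range quoted on the slide
        print(
            f"{growth:.0%} growth: doubles every {doubling_time_years(growth):.1f} years; "
            f"{start_tb} TB grows to {projected_capacity_tb(start_tb, growth, 3):.0f} TB in 3 years"
        )
```

Real growth is rarely constant, so treat the output as an order-of-magnitude planning aid rather than a forecast.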
5. Unstructured Data Requires
Unconstrained Scale
Optimized TCO/ROI
Longevity
Flash to Cloud Flexibility
Enterprise Features
Massive Performance
SIMPLICITY at Any Scale
6. Unstructured Analytics Use Cases
Enabling enterprises to improve operational efficiencies and monetize new revenue streams
Fraud Detection & Risk Analytics
Trading / Tick Data Analytics
IoT
Customer 360 Analytics
Data-Driven Business Transformation
7. Organizations need to deliver analytics on more than just their traditional structured data
Evolving spectrum of data analytics
Requires infrastructure that enables multiple applications and varied use cases
Predictive Analytics
Business Intelligence
Analytics of Things
Cybersecurity Analytics
Real-time Analytics
Machine Learning
8. Dell EMC Unstructured Analytics Portfolio
Enables analytics for ALL of your data
Performance-centric
Storage-centric
Archive-centric
Use cases: Predictive Analytics, Business Intelligence, Analytics of Things, Cybersecurity Analytics, Real-time Analytics, Machine Learning
9. Proven solutions for unstructured analytics
Dell EMC Unstructured Analytics Portfolio
Solution accelerators
Hadoop Ready Bundle
QuickStart for Hadoop
EDW Optimization Solutions
Hadoop Backup Solutions
SAS-Grid Solution with Isilon
Streaming Analytics Solutions
Splunk Ready System
10. Right Solution Configuration for the use case
ENTERPRISE REQUIREMENTS DRIVE CONFIGURATION
Performance-centric (Compute + Data on Direct Attached Storage):
• High performance
• 100% compliance with Hadoop features
• Ability to scale down at cost
Storage-centric (Compute and Data separated on Shared Storage):
• Storage scaling faster than compute
• Enterprise-grade file management
• Consolidation of IT workloads
• Aggregate capacity > 100 TB
Archive-centric:
• Geo-distributed single namespace
• 40% to 60% less than public cloud
11. THE BEDROCK OF THE MODERN DATA CENTER
PowerEdge R740xd: high-performance server
Performance and Scale: expanded GPU and storage capacity boost workload performance
Innovative Design: up to 24 NVMe drives or up to 18 x 3.5” drives
Integrated Security: cyber-resilient architecture; security is integrated into the full server lifecycle, from design to retirement
Intelligent Automation: the new OpenManage™ Enterprise console delivers crystal-clear reporting and full lifecycle automation
12. Market Leader in Hadoop Shared Storage
Customers running Analytics / Hadoop
PBs of Analytics / Hadoop
• World’s #1 Courier Company
• 3 of the largest telecommunications companies in the Americas
• One of the largest online retailers
• Multiple leading financial institutions
WHO IS USING ISILON FOR ANALYTICS?
385
Isilon Analytics Momentum
21 Industry Verticals
13. Hadoop Architecture - Traditional
Ecosystem: R (RHIPE), Mahout, Hive, HBase, Pig
Services: NameNode, Secondary NameNode, JobTracker, TaskTracker, DataNode
Cluster: six combined Data Node + Compute Node servers on Ethernet, managed by the NameNode
14. Hadoop Architecture with Isilon
Ecosystem: R (RHIPE), Pig, Mahout, Hive, HBase
Compute: six Compute Nodes running JobTracker and TaskTracker, connected over Ethernet
Storage: Isilon, where every node serves both the NameNode and DataNode roles
15. ISILON DATA LAKE
DATA PROTECTION
DATA SECURITY
PERFORMANCE MANAGEMENT
DATA MANAGEMENT
16. HOW IT LOOKS
Diagram: compute nodes run MapReduce (Map and Reduce tasks) and send HDFS requests to Isilon OneFS, where every node acts as both NameNode and DataNode; NFS and SMB clients access the same files. Protocols: SMB, NFS, HTTP, FTP, HDFS.
18. Phased Approach to Hadoop Tiered Storage with Isilon
Phase 0: Archival Cluster
• Hadoop cluster with DAS for interactive and batch queries
• Queryable “active archive” in Isilon / ECS, configured as a separate Hadoop cluster
• Archival policy implemented using scripts executed manually
Phase 1: Tiering with Location-Aware Queries
• Hot data in the Hadoop cluster with DAS
• Cold data in Isilon, configured as an HDFS target
• Hive, MapReduce, and Spark jobs can run across the two clusters
• URIs indicate whether data is in the DAS cluster or the Isilon cluster
• Tiering policy implemented using scripts executed manually
Phase 2: Tiering with Location-Transparent Queries
Same as Phase 1, with one additional capability:
• Data location is handled transparently for Hive, MapReduce, and Spark jobs: URIs don’t need to indicate whether data is in the DAS cluster or the Isilon cluster
Phase 3: Automated Tiering
Same as Phase 2, with one additional capability:
• Tiering policy implemented using automated data-movement mechanisms
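The manual tiering scripts of Phases 0 and 1 boil down to two steps: pick the files that have gone cold, then copy them from the DAS cluster to the Isilon cluster under a URI that names the new location. The sketch below shows one way such a script might plan that work. The namespace roots, the 90-day threshold, and the `FileRecord`, `plan_tiering`, and `distcp_command` names are all hypothetical; only `hadoop distcp <source> <target>` is a real Hadoop command, and gathering last-access times is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical namespace roots (assumptions, not values from the slides):
# replace with your DAS cluster's NameNode and your Isilon HDFS endpoint.
DAS_ROOT = "hdfs://das-namenode:8020"
ISILON_ROOT = "hdfs://isilon-hdfs:8020"


@dataclass
class FileRecord:
    path: str                # path relative to the namespace root, e.g. "/warehouse/sales/2016"
    last_accessed: datetime  # how this is collected is outside the scope of this sketch


def plan_tiering(files, now, cold_after_days=90):
    """Return (source_uri, target_uri) pairs for files not accessed in `cold_after_days`.

    Cold files keep the same relative path, so a Phase 1 style query only has
    to switch the URI prefix from the DAS cluster to the Isilon cluster.
    """
    cutoff = now - timedelta(days=cold_after_days)
    return [
        (DAS_ROOT + f.path, ISILON_ROOT + f.path)
        for f in files
        if f.last_accessed < cutoff
    ]


def distcp_command(source_uri, target_uri):
    """Build a `hadoop distcp` invocation for one move (returned, not executed)."""
    return ["hadoop", "distcp", source_uri, target_uri]


if __name__ == "__main__":
    now = datetime(2018, 6, 1)
    inventory = [
        FileRecord("/warehouse/sales/2016", datetime(2017, 3, 1)),
        FileRecord("/warehouse/sales/2018", datetime(2018, 5, 30)),
    ]
    for src, dst in plan_tiering(inventory, now):
        print(" ".join(distcp_command(src, dst)))
```

Keeping planning separate from execution lets an operator review the moves before running them, verify each copy, and only then delete the source; Phase 3 would replace that manual run with automated data movement.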
19. What is the Internet of Things?
It is an ecosystem where sensors, devices, and equipment are connected to a network and can transmit and receive data for tracking, analysis, and action.
Operational Technology (OT): industrial automation, fleet telematics, material handling
Information Technology (IT): assets, inventory, people
IoT: It’s not new, and it’s not new to Dell. It is the integration and extension of OT and IT technologies that have been around for decades.
20. It’s a great big IoT world out there
Smart Connected Business – from gateways to informed decisions
Things / Sensors: tens of billions of connected things
Transport / Connector: private and public networks
High-performance computing infrastructure
SAP HANA in-memory database layer
Application layer
Libraries
Industries: Manufacturing, Energy and Natural Resources, Transportation, Building & Industrial Automation
21. Multiple Partners and Blueprints for OT / IT
Partners: Kepware KEPServerEX®, Software AG Apama®, SAP HANA®
Dell Edge Gateway 5000 and Dell EMC Data Center
Data: Real-Time, Structured, Unstructured
Capabilities: Protocol Translation, Stream Analytics, Analytics, Machine Learning, Reporting, Visualizations
23. Project “Nautilus”
Streaming Storage + Analytics Engine
Turbocharge Isilon and ECS for Streaming
Streaming IoT data
Batch Storage tier
24. Today’s IoT Analytics “Accidental Architecture”
Producers: Sensors, Mobile Devices, App Logs
Batch and Real-Time paths, with MirrorMaker replicating data to a DR Site
Surface / Act: Real-time intelligence at the NOC; Interactive exploration by Data Scientists
25. Project Nautilus: A Unified Data Pipeline
Strongly Consistent Storage, Exactly Once Processing, Unified Analytics
Producers: Sensors, Mobile Devices, App Logs
Unified Storage: Pravega Streams with Isilon / ECS (Ingest Buffer, Pub/Sub, Search, Persistent Data Structures)
Unified Analytics: Real-Time, Batch, Interactive
Surface / Act: Real-time intelligence at the NOC; Interactive exploration by Data Scientists
26. Project Nautilus: A Unified Data Pipeline
Strongly Consistent Storage, Exactly Once Processing, Unified Analytics
Producers: Sensors, Mobile Devices, App Logs
Unified Storage: Pravega Streams with Isilon / ECS
Interfaces: Ingest, Pub/Sub, Search, S3, HDFS, NFS, SMB
Unified Analytics: Real-Time, Batch, Interactive
Surface / Act: Real-time intelligence at the NOC; Interactive exploration by Data Scientists
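Exactly Once Processing on top of Strongly Consistent Storage generally rests on two ideas: the store must not keep a duplicate when a producer retries a write, and a reader must save its read position together with the state it derived from the events. The sketch below illustrates both with an in-memory stand-in. The `Stream`, `Checkpoint`, and `process` names are hypothetical; this is not the Pravega API or Project Nautilus's implementation, only a minimal illustration of the general technique.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """In-memory, append-only event log (a stand-in for a durable stream)."""
    events: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # writer id -> highest sequence number stored

    def append(self, writer_id: str, seq: int, value: int) -> bool:
        # A producer that retries after a lost acknowledgement resends the same
        # (writer_id, seq) pair; storing it only once keeps the log duplicate-free.
        if seq <= self.last_seq.get(writer_id, -1):
            return False
        self.last_seq[writer_id] = seq
        self.events.append(value)
        return True


@dataclass(frozen=True)
class Checkpoint:
    offset: int  # index of the next event to read
    total: int   # running sum of every event before `offset`


def process(stream: Stream, checkpoint: Checkpoint, max_events=None) -> Checkpoint:
    """Sum events from checkpoint.offset onward and return the new checkpoint.

    Offset and total advance together, so restarting from any checkpoint this
    function returned neither skips an event nor counts one twice.
    """
    available = len(stream.events)
    end = available if max_events is None else min(available, checkpoint.offset + max_events)
    total = checkpoint.total + sum(stream.events[checkpoint.offset:end])
    return Checkpoint(end, total)


if __name__ == "__main__":
    s = Stream()
    s.append("sensor-1", 0, 5)
    s.append("sensor-1", 0, 5)  # retried write: ignored
    s.append("sensor-1", 1, 7)
    cp = process(s, Checkpoint(0, 0), max_events=1)  # reader stops after one event...
    cp = process(s, cp)                              # ...and resumes from its checkpoint
    print(cp.total)
```

Run as a script, this prints 12: the retried reading is stored once, and the interrupted reader resumes from its checkpoint, so each of the two distinct readings is counted exactly once.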