Every day we hear about electronic systems creating ever-increasing quantities of data. Systems in markets such as finance, media, healthcare, government and scientific research feature strongly in the Big Data processing conversation. Extracting business value from Big Data is forecast to bring customer benefits and competitive advantage. In this session, Vas Kapsalis, NetApp Big Data Business Development Manager, discusses his views on and experience of the wider world of Big Data.
3. Convergence of Technology Disrupters Creates Opportunity
NetApp Confidential - Internal Use Only
Cloud
Social
Mobile
Internet of Things
Big Data
4. Unstructured Data Growth Dominates
The shift in the traditional structured and replicated data mix is driven by:
− Efficiency (deduplication, compression, thin provisioning, SATA)
− Growth in a new category of storage consumers using cloud / content depots
Unstructured data (files and objects in traditional storage plus content depots / cloud) will be the largest storage category by 2014
− Content depots / cloud expected to be 95% unstructured data
Chart: Revenue Share by Segment (traditional structured, traditional replicated, traditional unstructured, content depots / public cloud)
5. Not Even to the “Peak”
40 Zettabytes: estimated size of the digital universe in 2020
5 Billion smartphones
30 Billion pieces of new content to Facebook per month
80% unstructured data
Chart: hype cycle (Visibility vs. Time): Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity
6. Big Data Is All Data From Everywhere
Transactional Data
Machine Data
Social Data
Enterprise Content
Fundamentally changes your business
The Jetway
The Call Center
7. Big Data Vendor Landscape
A Lot of Hype and Buzz – Everyone is Jumping In
The market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015
NoSQL: $2 billion per annum by 2015
Most firms are taking a pragmatic approach
Big data is in the very early stages of maturity
Best practices are not yet mature
IDC Big Data Survey
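The forecast above implies a compound annual growth rate. A minimal sketch of that arithmetic, using only the two figures on the slide ($3.2 billion in 2010, $16.9 billion in 2015) and the standard CAGR formula; the function name is illustrative.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Forecast figures from the slide, in billions of US dollars.
rate = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied CAGR 2010-2015: {rate:.1%}")
```

With these inputs the implied growth rate is roughly 39.5% per year. This is derived arithmetic, not a figure stated in the survey.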
Chart: Funding for Hadoop and NoSQL, Jan-08 to Nov-11, marking funding rounds for Cloudera (series B, C and D), MapR (series A and B), 10gen (series D), DataStax (series B), Neo Technology (series A), Opera Solutions (series A), Platfora (series A) and Couchbase (series C)
"The Big Data market is expanding rapidly …
For technology buyers, opportunities exist to
use Big Data technology to improve
operational efficiency and to drive innovation.
Use cases are already present across
industries and geographic regions."
Dan Vesset, Vice President, IDC
451 Research
8. Data Growth Impact on Business
Chart (2010 to 2020): Complexity, Volume, Speed, Business Velocity, Inflection Point
Information Becomes a Propellant to Business
Data Becomes a Burden to IT Infrastructure
“Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze.
9. Why Should You Care?
It’s the Value of Your Data
Top-line revenue
– Leverage your data assets into business advantage
Bottom-line savings
– Lower the cost of compliance
– Manage ever-growing data efficiently
Over 1 PB of data
Growth of 175% YOY
90 days of data within 24 hours of a failure
5 Billion Records
Anywhere, Anytime
Faster time to market
50% Increase in Revenue
11. Why NetApp?
Practical solutions that solve today’s problems
Get Control: NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably.
Break Through: Break through the limits. With NetApp, you can take on even the most massive and complex data projects.
Gain Insight: Turn insight to action. NetApp helps you get to clarity and insight faster and more reliably.
12. Experience Managing Data at Scale
Chart: number of customers at each scale (100, 50, 10 and 4 Customers; 100, 50, 20 and 10 PB), with NetApp’s Largest Customer highlighted
13. NetApp Big Data Strategy
Best-of-breed storage for Big Data applications
Create deep integration and value add
Build on open standards with best-in-class partnerships
Validate with ecosystem leaders
– Complete server, network and storage “Racks”
– Delivered via trusted high-value partners
Open, Best-of-Breed, Choice
14. Industry-Leading Storage Innovation
Flash Arrays for ultra-high performance
E-Series for price-performance at scale
StorageGRID for web-scale object storage
Clustered Data ONTAP for shared infrastructure
Corporate Data Centers / Cloud Data Centers
15. Big Data Building Blocks
Big Content: retain forever, multi-site distribution
Big Bandwidth: ingest, process, stream
Big Analytics: reduce, analyze, report
Cloud (private/public): retain, distribute
Diagram labels: Applications; Extract; Retain, Distribute; Store; Retrieve
17. Analytics Oriented Business Processing
RDBMS (General Purpose DB): data organized to align with schemas; fixed consistency model; complex queries supported; volume-based data management
Columnar DB (Analytics Oriented): data organized in column files; tabular interface without rigid schemas; fast column scans; multiple consistency models; transaction-granular data management
Document Store (Transaction Oriented): data organized in data structures in memory; schemaless transaction store for structured data; high transactional performance
K-V Store (Metadata Service Oriented): data organized in key-value pairs; suitable for metadata services with CMSs; associated with object services
Diagram: Transaction Processing, Realtime Analytics and Business Applications on a Federated Database Store (Build/Buy/Partner), with stages Memory Ingest, Commit, Persisted Commit, Disk/Flash Tier and Query-based Retrieval
– Transaction-granular data resilience, recoverability and protection at line speeds
– Data organization optimized by query interface
– Performance-optimized query service
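To illustrate why a columnar layout gives “fast column scans”, here is a minimal sketch contrasting row-oriented and column-oriented layouts of the same records in plain Python. The `Trade` record and its values are hypothetical, and real columnar databases add compression, vectorized execution and on-disk column files that this sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trade:
    # Hypothetical record type, used only for illustration.
    symbol: str
    price: float
    volume: int


def to_columns(rows: List[Trade]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    return {
        "symbol": [r.symbol for r in rows],
        "price": [r.price for r in rows],
        "volume": [r.volume for r in rows],
    }


def total_volume_rows(rows: List[Trade]) -> int:
    # Models a row store: the scan walks record by record, as a row store
    # reads whole rows to get at one field.
    return sum(r.volume for r in rows)


def total_volume_columns(columns: Dict[str, list]) -> int:
    # Models a column store: only the "volume" column is read.
    return sum(columns["volume"])


rows = [Trade("AAA", 31.5, 100), Trade("AAA", 31.7, 250), Trade("BBB", 20.1, 400)]
columns = to_columns(rows)
print(total_volume_rows(rows), total_volume_columns(columns))
```

Both functions return the same total; the difference in a real system is how much data has to be read from disk to compute it.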
18. Analytics Technologies to Look Out For!
Old World: Relational DBs (row-oriented RDBMSs)
New World: Columnar DBs (analytics oriented), Document Stores (transaction oriented), Key-Value Stores (content/object service), Graph DBs (niche)
Scale: Datacenter to Multi-Datacenter
• ACID constrained; complete query set; limited availability
• High consistency; rich query set; good availability
• Tuneable consistency; limited query set; highest/WAN availability
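“Tuneable consistency” in multi-datacenter key-value and document stores is commonly provided by quorum replication, as in Dynamo-style systems such as Cassandra. The slide does not say which mechanism it means, so treat this as an assumption: a minimal sketch of the usual rule that, with N replicas, a read quorum R and a write quorum W, every read overlaps the latest write when R + W > N. The function names are illustrative.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorum sizes must be between 1 and N")
    return r + w > n


def overlap_brute_force(n: int, r: int, w: int) -> bool:
    """Check the same property by enumerating every pair of replica subsets."""
    replicas = range(n)
    return all(
        set(read) & set(write)  # an empty intersection is falsy
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


# Three replicas: majority reads and writes (2 + 2 > 3) always overlap,
# while single-replica reads and writes (1 + 1 <= 3) trade consistency
# for latency and availability.
for r, w in [(2, 2), (1, 3), (1, 1)]:
    print(r, w, quorums_overlap(3, r, w), overlap_brute_force(3, r, w))
```

Lowering R or W is what makes the consistency “tuneable”: each request can choose how many replicas must answer.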
19. Analytics & Enterprise Apps Environment
Data sources: sensors, applications, logs, location/GPS, mobile devices, other
Applications: analytics applications; reporting/dashboard/visualization; ETL; OLAP; OLTP
Data management: storage file systems
Storage: shared storage infrastructure; content repositories; all other storage (i.e. internal DAS)
Protocols: NFS/sNFS/pNFS
NetApp Confidential – Limited Use
20. Some Problems Require an Enterprise Class Hadoop Solution
Enterprise Class Hadoop: packaged, ready-to-deploy modular Hadoop cluster
– The data has intrinsic value ($$$)
– Capacity and compute requirements expanding very fast
– Higher storage performance
– Real human consequences if the system fails (threats, treatments, financial losses)
– System has to allow for asymmetric growth
Commodity, Off-the-Shelf Hadoop
– Values associated with early adopters of Hadoop
– Social media space
– Contributors to Apache
– Strong bias to JBOD
– Skeptical of ALL vendors
Enterprise Class Hadoop: packaged, ready-to-deploy modular compute-intensive Hadoop cluster
– Compute-intensive applications
– Video, imaging analysis
– Extremely tight service-level expectations
– Severe financial consequences if the data analytic application or service runs late
Enterprise Class Hadoop: packaged, ready-to-deploy modular storage-intensive Hadoop cluster
– Storage-intensive applications
– Additional CPUs do not help run time
– Financial ticker data analysis
– Extremely tight service-level expectations
– Need deeper storage per DataNode
Chart axes: Compute Power vs. Storage Capacity
21. NetApp Open Solution for Hadoop
Easy to Deploy, Manage and Scale
Uses high-performance storage
– Resilient and compact
– RAID protection of data
– Less network congestion
Raw capacity and density
– 120TB or 180TB in 4U
– Fully serviceable storage system
Reliability
– Hardware RAID and hot swap prevent job restarts caused by a node going offline after a media failure
– Reliable metadata (NameNode)
Diagram: Enterprise Class Hadoop, showing MapReduce (JobTracker; DataNodes / TaskTrackers) over HDFS (NameNode, Secondary NameNode, DataNodes) on FAS2040 and E2660 storage, with 4 separate shared-nothing partitions
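The Map and Reduce stages in the diagram refer to Hadoop's MapReduce model: map tasks emit key-value pairs, the framework groups them by key (the shuffle), and reduce tasks aggregate each group. A minimal single-process sketch of that flow in plain Python, using word count as the example. It does not use Hadoop's APIs and leaves out HDFS, the JobTracker and TaskTrackers, and fault tolerance.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: aggregate all values that share a key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["big data big value", "data at scale"]))
```

In a real cluster the map and reduce functions run in parallel on many nodes, and the shuffle moves intermediate data across the network, which is why network design matters for Hadoop performance.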
22. NetApp Open Solution for Hadoop
Validated Benefits for the Enterprise
Improved cluster performance by 62%
Completed jobs 200% faster under drive failure
Delivered linear performance scalability as nodes and data grew
Per-server capacity increase of 1.5x
"The NetApp Open Solution for Hadoop improves capacity and performance efficiency and recoverability compared to a server-based DAS deployment."
- ESG, 2012
23. Optimizing Performance and Staying Healthy
Source: Garrett, Brian and Lockner, Julie, “NetApp Open Solution for Hadoop”, ESG Report, May 2012, http://bit.ly/LyYG0t
Chart: Network Overhead vs. Useful Work
Network factors: availability and resiliency; burst handling and queuing; oversubscription ratio; DataNode network speed; network latency
Source: Cisco: http://bit.ly/yL54Ts
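Oversubscription ratio is usually defined as the total bandwidth that attached hosts can offer to a switch divided by the switch's uplink bandwidth. A minimal sketch of that calculation; the port counts and speeds in the example are hypothetical and are not taken from the Cisco source.

```python
def oversubscription_ratio(host_ports: int, host_gbps: float,
                           uplink_ports: int, uplink_gbps: float) -> float:
    """Ratio of aggregate host-facing (downlink) bandwidth to uplink bandwidth."""
    downlink = host_ports * host_gbps
    uplink = uplink_ports * uplink_gbps
    if uplink <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink / uplink


# Hypothetical top-of-rack switch: 48 DataNodes at 10 GbE, 4 x 40 GbE uplinks.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio:.1f}:1")  # prints 3.0:1 (480 Gb/s down vs 160 Gb/s up)
```

A 1:1 ratio means every host can send off-rack at line rate simultaneously; higher ratios mean contention when many DataNodes burst at once, for example during a shuffle.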
25. Case Study: ASUP NetApp Analytics
Gateways
• 800K ASUPs (AutoSupport messages) every week
• 40% arrive over the weekend
ETL (extract, transform, load) into the data warehouse and data marts
• Data needs to be parsed and loaded in 15 minutes
Data Warehouse
• Only 5% of the data goes into the data warehouse; the rest is unstructured, yet it is growing 7-10 TB per month
• No easy way to access this unstructured content
Reporting
• Numerous mining requests are currently not satisfied
• Huge untapped potential for valuable insight
Finally, the incoming load doubles every 16 months!
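A quick sketch of the load arithmetic implied by the slide's figures: 800K ASUPs per week, 40% arriving over the weekend, a 15-minute parse-and-load window, and incoming load doubling every 16 months. It assumes a 48-hour weekend and a steady arrival rate within each period, neither of which the slide states.

```python
from typing import Tuple

WEEKLY_ASUPS = 800_000       # from the slide
WEEKEND_SHARE = 0.40         # from the slide
DOUBLING_MONTHS = 16         # from the slide
WEEKEND_HOURS = 48           # assumption: Saturday and Sunday
WEEKDAY_HOURS = 7 * 24 - WEEKEND_HOURS
WINDOWS_PER_HOUR = 4         # 15-minute parse-and-load windows


def hourly_rates(weekly: float = WEEKLY_ASUPS) -> Tuple[float, float]:
    """Average ASUPs per hour on weekdays and at the weekend, assuming a steady rate."""
    weekend = weekly * WEEKEND_SHARE / WEEKEND_HOURS
    weekday = weekly * (1 - WEEKEND_SHARE) / WEEKDAY_HOURS
    return weekday, weekend


def growth_factor(months: float) -> float:
    """Load multiplier after `months` if the load doubles every DOUBLING_MONTHS."""
    return 2 ** (months / DOUBLING_MONTHS)


weekday, weekend = hourly_rates()
print(f"weekday: ~{weekday:,.0f}/h, weekend: ~{weekend:,.0f}/h")
print(f"weekend ASUPs per 15-minute window: ~{weekend / WINDOWS_PER_HOUR:,.0f}")
for months in (16, 32, 48):
    print(f"after {months} months: {growth_factor(months):.0f}x, "
          f"~{WEEKLY_ASUPS * growth_factor(months):,.0f} ASUPs/week")
```

Under these assumptions the weekend rate is about 6,667 ASUPs per hour versus 4,000 on weekdays, roughly 1,667 per 15-minute window, and a 16-month doubling period means about 8x the load after four years.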
NetApp Proprietary - Limited Use Only
26. Case Study: NetApp Large-Scale Analytics
Challenge: 4 weeks to run a query on 24 billion unstructured records
NetApp solution: 10-node Hadoop cluster
Benefit: time reduced from 4 weeks to 10.5 hours
Challenge: impossible to run a query on 240 billion unstructured records
Benefit: previously impossible, now achievable in just 18 hours
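The case-study figures can be turned into a speedup and a rough throughput comparison. A minimal sketch, assuming “4 weeks” means 4 × 7 × 24 = 672 hours of elapsed time; the case study does not say how that time was measured.

```python
HOURS_PER_WEEK = 7 * 24


def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_hours / after_hours


def records_per_hour(records: float, hours: float) -> float:
    """Average throughput over the whole run."""
    return records / hours


before = 4 * HOURS_PER_WEEK  # assumption: "4 weeks" of elapsed time = 672 hours
print(f"speedup on 24 billion records: {speedup(before, 10.5):.0f}x")
print(f"throughput, 24 billion records in 10.5 h: {records_per_hour(24e9, 10.5):.3g}/h")
print(f"throughput, 240 billion records in 18 h: {records_per_hour(240e9, 18):.3g}/h")
```

This gives a speedup of about 64x, and the second query handled 10x the records in roughly 1.7x the time. These are derived figures; the case study does not report them.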
27. Big Data System Integrators Solutions Built on NetApp®
Integrated Big Data Solutions and Expertise
Planning and implementation expertise for Big Data
Turn-key solution stacks and Big Data services
28. Next Steps - Team with the Experts
Strategic Assessment
– Business goals
– Data growth needs
– Use case discovery (partner delivery)
Consult
– Solution architecture and design (NetApp delivery)
Deploy
– Installation and implementation (NetApp delivery)
– Solution implementation (partner delivery)
Support options: global support available from NetApp and partners