The Big Data Landscape
Entering a New Era of Scale
Convergence of Technology Disrupters Creates Opportunity
NetApp Confidential - Internal Use Only
Cloud, Social, Mobile, Internet of Things, Big Data
Unstructured Data Growth Dominates
• The shift in the mix of traditional structured and replicated data is driven by:
  – Efficiency (deduplication, compression, thin provisioning, SATA)
  – Growth in a new category of storage consumers using cloud / content depots
• Unstructured data (files and objects in traditional storage plus content depots / cloud) will be the largest storage category by 2014
  – Content depots / cloud are expected to be 95% unstructured data
[Chart: Revenue Share by Segment (traditional structured, traditional replicated, traditional unstructured, content depots / public cloud)]
Not Even to the “Peak”
• 40 zettabytes: estimated size of the digital universe in 2020
• 5 billion smartphones
• 30 billion pieces of new content posted to Facebook per month
• 80% of data is unstructured
[Figure: hype cycle plotting visibility against time through the Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity]
Big Data Is All Data From Everywhere
• Transactional data
• Machine data
• Social data
• Enterprise content
Big data fundamentally changes your business.
[Images: the jetway; the call center]
Big Data Vendor Landscape
A Lot of Hype and Buzz: Everyone Is Jumping In
• The market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015
• NoSQL: $2 billion per annum by 2015
• Most firms are taking a pragmatic approach
• Big data is in the very early stages of maturity
• Best practices are not yet mature
Source: IDC Big Data Survey
[Chart: Funding for Hadoop and NoSQL, Jan-08 to Nov-11 (y-axis 0 to 400), marking Cloudera series B, C and D; MapR series A and B; 10gen series D; DataStax series B; Neo Technology series A; Opera Solutions series A; Platfora series A; Couchbase series C]
"The Big Data market is expanding rapidly … For technology buyers, opportunities exist to use Big Data technology to improve operational efficiency and to drive innovation. Use cases are already present across industries and geographic regions."
Dan Vesset, Vice President, IDC
451 Research
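The market forecast above implies an annual growth rate that is easy to check. A minimal Python sketch, assuming the standard compound-annual-growth-rate definition and treating 2010 to 2015 as five compounding periods:

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Forecast quoted on the slide: $3.2B in 2010 growing to $16.9B in 2015.
implied = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied CAGR: {implied:.1%}")  # Implied CAGR: 39.5%
```

The result is only an arithmetic consequence of the two quoted figures, not an additional forecast.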
Data Growth Impact on Business
[Figure labels: Complexity, Volume, Speed, Business Velocity, 2010 to 2020. At the inflection point, information becomes a propellant to business and data becomes a burden to IT infrastructure.]
“Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze.
Why Should You Care?
It’s the Value of Your Data
• Top-line revenue
  – Leverage data assets into business advantage
• Bottom-line savings
  – Lower the cost of compliance
  – Manage ever-growing data efficiently
• Over 1 PB of data
• Growth of 175% year over year
• 90 days of data within 24 hours of a failure
• 5 billion records
• Anywhere, anytime
• Faster time to market
• 50% increase in revenue
NetApp Big Data
Why NetApp?
Practical solutions that solve today’s problems
• Get Control: NetApp helps you turn your exploding data from a threat into an opportunity. Manage your data effectively and affordably.
• Break Through: Break through the limits. With NetApp, you can take on even the most massive and complex data projects.
• Gain Insight: Turn insight into action. NetApp helps you reach clarity and insight faster and more reliably.
Experience Managing Data at Scale
[Chart: customer counts (100, 50, 10, 4) plotted against capacity tiers (10 PB, 20 PB, 50 PB, 100 PB), highlighting NetApp’s largest customer]
NetApp Big Data Strategy
• Best-of-breed storage for big data applications
• Create deep integration and added value
• Build on open standards with best-in-class partnerships
• Validate with ecosystem leaders
  – Complete server, network and storage “racks”
  – Delivered via trusted, high-value partners
Open, Best-of-Breed, Choice
Industry-Leading Storage Innovation
• Flash arrays for ultra-high performance
• E-Series for price/performance at scale
• StorageGRID for web-scale object storage
• Clustered Data ONTAP for shared infrastructure
Spanning corporate data centers and cloud data centers
Big Data Building Blocks
• Big Content: retain forever, multi-site distribution
• Big Bandwidth: ingest, process, stream
• Big Analytics: reduce, analyze, report
• Cloud (private/public): retain, distribute
[Diagram labels: Applications; Extract; Store; Retrieve; Retain, Distribute]
Analytics-Oriented Business Processing
RDBMS (general-purpose DB)
• Data organized to align with schemas
• Fixed consistency model
• Complex queries supported
• Volume-based data management
Columnar DB (analytics-oriented)
• Data organized in column files
• Tabular interface without rigid schemas
• Fast column scans
• Multiple consistency models
• Transaction-granular data management
Document store (transaction-oriented)
• Data organized in data structures in memory
• Schemaless transaction store for structured data
• High transactional performance
K-V store (metadata-service-oriented)
• Data organized in key-value pairs
• Suitable for metadata services with content management systems (CMSs)
• Associated with object services
[Diagram labels: Transaction Processing; Realtime Analytics; Business Applications; Memory Ingest; Commit; Disk/Flash Tier; Persisted Commit; Query-based Retrieval; Federated Database Store (Build/Buy/Partner)]
• Transaction-granular data resilience, recoverability and protection at line speeds
• Data organization optimized by query interface
• Performance-optimized query service
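To make the row-versus-column distinction above concrete, here is a minimal Python sketch of the same small table held row-wise, as a general-purpose RDBMS would organize it, and column-wise, as a columnar DB does. It is an illustration only, not how any particular database lays out its files; the records and field names are hypothetical. An aggregate over one field reads every record in the row layout but only one list in the column layout.

```python
from typing import Dict, List

# Hypothetical sample records (not taken from the slides).
rows: List[dict] = [
    {"id": 1, "region": "EMEA", "revenue": 120.0},
    {"id": 2, "region": "APAC", "revenue": 75.5},
    {"id": 3, "region": "AMER", "revenue": 210.25},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records (assumed to share the same keys) into one list per column."""
    if not records:
        return {}
    columns: Dict[str, list] = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


def sum_row_store(records: List[dict], field: str) -> float:
    # Row layout: every whole record is visited to pick out one field.
    return sum(record[field] for record in records)


def sum_column_store(columns: Dict[str, list], field: str) -> float:
    # Column layout: only the requested column is scanned.
    return sum(columns[field])


columns = to_columns(rows)
print(sum_row_store(rows, "revenue"), sum_column_store(columns, "revenue"))
```

Both functions return the same total; the difference is how much data each has to touch, which is what makes column scans fast on wide analytic tables.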
Analytics Technologies to Look Out For
Old world (single datacenter)
• Relational DBs: row-oriented RDBMSs
New world (multi-datacenter)
• Columnar DBs (analytics-oriented)
• Document stores (transaction-oriented)
• Key-value stores (content/object service)
• Graph DBs (niche)
Characteristics, moving from the old world to the new world:
• ACID-constrained, complete query set, limited availability
• High consistency, rich query set, good availability
• Tunable consistency, limited query set, highest/WAN availability
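"Tunable consistency" in many multi-datacenter stores is commonly expressed with Dynamo-style quorum parameters: each item has N replicas, a write is acknowledged by W of them and a read consults R of them. When R + W > N, every read quorum overlaps every write quorum. The Python sketch below encodes only that general rule; the class and its names are hypothetical and do not describe any product on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas per item
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    def reads_see_latest_write(self) -> bool:
        # Read and write quorums must share at least one replica.
        return self.r + self.w > self.n

    def tolerated_failures(self) -> tuple[int, int]:
        # Replicas that may be down while reads / writes still succeed.
        return self.n - self.r, self.n - self.w


for cfg in (QuorumConfig(3, 2, 2), QuorumConfig(3, 1, 1), QuorumConfig(3, 1, 3)):
    print(cfg, cfg.reads_see_latest_write(), cfg.tolerated_failures())
```

Lowering R or W raises availability at the cost of the overlap guarantee, which is the trade-off the "tunable consistency, highest availability" row describes. Quorum overlap is only the basic condition; real systems add mechanisms such as conflict resolution before offering stronger guarantees.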
Analytics & Enterprise Apps Environment
[Diagram: data sources (sensors, applications, logs, location/GPS, mobile devices, other data sources) feed ETL, OLTP and OLAP layers, analytics applications and reporting/dashboard/visualization. Underneath sit data management, storage file systems, content repositories and shared storage infrastructure accessed over NFS/sNFS/pNFS, alongside all other storage (i.e., internal DAS).]
NetApp Confidential – Limited Use
Some Problems Require an Enterprise-Class Hadoop Solution
Enterprise-Class Hadoop
Packaged, ready-to-deploy modular Hadoop cluster
• The data has intrinsic value ($$$)
• Capacity and compute requirements are expanding very fast
• Higher storage performance
• Real human consequences if the system fails (threats, treatments, financial losses)
• The system has to allow for asymmetric growth
Commodity, Off-the-Shelf Hadoop
Values associated with early adopters of Hadoop
• Social media space
• Contributors to Apache
• Strong bias toward JBOD
• Skeptical of ALL vendors
Enterprise-Class Hadoop (compute-intensive)
Packaged, ready-to-deploy modular compute-intensive Hadoop cluster
• Compute-intensive applications
• Video and imaging analysis
• Extremely tight service-level expectations
• Severe financial consequences if the data analytics application or service runs late
Enterprise-Class Hadoop (storage-intensive)
Packaged, ready-to-deploy modular storage-intensive Hadoop cluster
• Storage-intensive applications
• Additional CPUs do not improve run time
• Financial ticker data analysis
• Extremely tight service-level expectations
• Need deeper storage per DataNode
[Chart axes: compute power vs. storage capacity]
NetApp Open Solution for Hadoop
• Easy to deploy, manage and scale
• Uses high-performance storage
  – Resilient and compact
  – RAID protection of data
  – Less network congestion
• Raw capacity and density
  – 120 TB or 180 TB in 4U
  – Fully serviceable storage system
• Reliability
  – Hardware RAID and hot swap prevent job restarts caused by a node going offline after a media failure
  – Reliable metadata (NameNode)
[Diagram: Enterprise-class Hadoop with MapReduce (JobTracker; DataNode/TaskTracker servers) and HDFS (NameNode, Secondary NameNode), on NetApp FAS2040 and E2660 storage with 4 separate shared-nothing partitions]
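The Map and Reduce stages in the diagram follow the MapReduce model: map tasks emit key/value pairs, the framework groups them by key (the shuffle), and reduce tasks aggregate each group. The single-process Python sketch below mimics that flow for a word count. It does not use Hadoop's APIs; in a real cluster the JobTracker would schedule equivalent map and reduce tasks on TaskTracker nodes reading HDFS blocks, and each input line here merely stands in for an input split.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group every emitted value by its key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Aggregate all values for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(word_count(["big data big storage", "data nodes store data"]))
```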
NetApp Open Solution for Hadoop
Validated Benefits for the Enterprise
• Improved cluster performance by 62%
• Completed jobs 200% faster under drive failure
• Delivered linear performance scalability as nodes and data grew
• Per-server capacity increase of 1.5x
"The NetApp Open Solution for Hadoop improves capacity and performance efficiency and recoverability compared to a server-based DAS deployment."
- ESG, 2012
Source: Garrett, Brian and Lockner, Julie, “NetApp Open Solution for Hadoop”, ESG Report, May 2012, http://bit.ly/LyYG0t
Optimizing Performance and Staying Healthy
[Figure: network overhead vs. useful work]
• Availability and resiliency
• Burst handling and queuing
• Oversubscription ratio
• DataNode network speed
• Network latency
Source: Cisco, http://bit.ly/yL54Ts
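Oversubscription ratio, one of the network factors listed above, is commonly defined as the aggregate server-facing bandwidth of a switch divided by its aggregate uplink bandwidth. A minimal Python sketch; the port counts and speeds are hypothetical (48 x 10 GbE to DataNodes, 4 x 40 GbE uplinks) and are not figures from the Cisco source.

```python
def oversubscription_ratio(down_ports: int, down_gbps: float,
                           up_ports: int, up_gbps: float) -> float:
    """Aggregate server-facing bandwidth divided by aggregate uplink bandwidth."""
    downstream = down_ports * down_gbps
    upstream = up_ports * up_gbps
    if upstream <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downstream / upstream


# Hypothetical top-of-rack switch: 48 x 10 GbE server ports, 4 x 40 GbE uplinks.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio:g}:1")  # 3:1
```

A 1:1 ratio means the uplinks can carry all server traffic at line rate at once; higher ratios trade cost for contention when many DataNodes send across racks at the same time.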
DAS vs. NetApp Footprint
DAS option
• Per server: 2RU; CPU: 2x8 cores; RAM: 48 GB; disk: 24 TB
• 1 rack (42RU): 20 servers (320 cores, 960 GB RAM, 480 TB)
• 6 racks: 120 servers (1,920 cores, 5.7 TB RAM, 2.8 PB storage)
NetApp option
• Per server: 1RU; CPU: 2x8 cores; RAM: 48 GB; disk: 2 TB (8 TB max; optional diskless PXE boot)
• 1 rack (42RU):
  – CPU and memory: 24 servers (6:1 server-to-E2660 ratio), 384 cores, 1.152 TB RAM
  – Storage: 4 E2660, 720 TB
• 4 racks: 96 servers (1,536 cores, 4.6 TB RAM, 2.8 PB storage)
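The footprint totals above follow from the per-server and per-rack figures. A minimal Python sketch that reproduces them, assuming decimal units (1 TB = 1,000 GB) as the slide's 960 GB and 1.152 TB figures suggest, and ignoring the NetApp servers' local boot disks; the helper name is hypothetical.

```python
def cluster_totals(racks: int, servers_per_rack: int, cores_per_server: int,
                   ram_gb_per_server: int, storage_tb_per_rack: float) -> dict:
    """Scale per-server and per-rack figures up to a whole cluster (decimal units)."""
    servers = racks * servers_per_rack
    return {
        "racks": racks,
        "servers": servers,
        "cores": servers * cores_per_server,
        "ram_tb": servers * ram_gb_per_server / 1000,
        "storage_pb": racks * storage_tb_per_rack / 1000,
    }


# DAS option: 20 servers per rack, each with 24 TB of internal disk.
das = cluster_totals(racks=6, servers_per_rack=20, cores_per_server=2 * 8,
                     ram_gb_per_server=48, storage_tb_per_rack=20 * 24)
# NetApp option: 24 servers per rack plus 4 x E2660 (720 TB) per rack;
# the servers' local boot disks are ignored here.
netapp = cluster_totals(racks=4, servers_per_rack=24, cores_per_server=2 * 8,
                        ram_gb_per_server=48, storage_tb_per_rack=720)

for name, totals in (("DAS", das), ("NetApp", netapp)):
    print(name, totals)
```

Both options reach 2.88 PB (shown as 2.8 PB on the slide); the NetApp option does so with two fewer racks and 24 fewer servers. The DAS RAM total works out to 5.76 TB, which the slide truncates to 5.7 TB.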
Case Study: ASUP (AutoSupport) NetApp Analytics
Gateways
• 800K ASUPs every week
• 40% arrive over the weekend
ETL (extract, transform, load)
• Data must be parsed and loaded within 15 minutes
Data warehouse
• Only 5% of the data goes into the data warehouse; the rest is unstructured, yet it grows 7-10 TB per month
• No easy way to access this unstructured content
Reporting
• Numerous data-mining requests currently go unsatisfied
• Huge untapped potential for valuable insight
[Diagram: gateways feed the ETL stage, which loads the data warehouse and data marts used for reporting]
Finally, the incoming load doubles every 16 months!
NetApp Proprietary - Limited Use Only
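The ingest figures in this case study translate into concrete rates. A minimal Python sketch, assuming a 48-hour weekend, uniform arrivals within the weekday and weekend periods, and that the stated 16-month doubling continues unchanged; the function names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24
WEEKEND_HOURS = 48  # assumption: Saturday and Sunday, 48 hours in total


def hourly_arrival_rates(weekly_total: float, weekend_share: float) -> tuple[float, float]:
    """Average arrivals per hour on weekdays and on the weekend (uniform arrivals assumed)."""
    weekday_hours = HOURS_PER_WEEK - WEEKEND_HOURS
    weekend_rate = weekly_total * weekend_share / WEEKEND_HOURS
    weekday_rate = weekly_total * (1.0 - weekend_share) / weekday_hours
    return weekday_rate, weekend_rate


def projected_load(current: float, months_ahead: float, doubling_months: float = 16.0) -> float:
    """Load after `months_ahead` months if it keeps doubling every `doubling_months` months."""
    return current * 2.0 ** (months_ahead / doubling_months)


weekday, weekend = hourly_arrival_rates(800_000, 0.40)
per_window = weekend / 4  # ASUPs per 15-minute ETL window at the weekend average
print(f"weekday ~{weekday:,.0f}/h, weekend ~{weekend:,.0f}/h, ~{per_window:,.0f} per 15 min")
print(f"weekly volume in 48 months: ~{projected_load(800_000, 48):,.0f}")
```

Under these assumptions the weekend hourly rate is about two-thirds higher than the weekday rate, and three doubling periods (48 months) multiply the weekly volume by eight, which is why a fixed 15-minute load window becomes harder to meet over time.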
Case Study: NetApp Large-Scale Analytics
• Challenge: 4 weeks to run a query on 24 billion unstructured records. NetApp solution: 10-node Hadoop cluster. Benefit: time reduced from 4 weeks to 10.5 hours.
• Challenge: impossible to run a query on 240 billion unstructured records. NetApp solution: 10-node Hadoop cluster. Benefit: previously impossible, now achievable in just 18 hours.
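The case-study numbers can be turned into a speedup and a throughput figure. A minimal Python sketch, assuming "4 weeks" means 4 x 168 wall-clock hours:

```python
HOURS_PER_WEEK = 7 * 24


def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_hours / after_hours


def records_per_hour(records: float, hours: float) -> float:
    """Average processing throughput over the whole run."""
    return records / hours


before = 4 * HOURS_PER_WEEK  # 672 hours
print(f"speedup: {speedup(before, 10.5):.0f}x")  # speedup: 64x
print(f"24B records in 10.5 h: {records_per_hour(24e9, 10.5) / 1e9:.2f}B records/h")
print(f"240B records in 18 h: {records_per_hour(240e9, 18) / 1e9:.2f}B records/h")
```

The second run's throughput is several times the first's, so the slide's two runs likely differed in more than data size; the slide does not state whether the query, cluster configuration or data layout were the same, so the throughput figures should not be compared directly.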
Big Data System Integrator Solutions Built on NetApp®
Integrated Big Data Solutions and Expertise
• Planning and implementation expertise for big data
• Turn-key solution stacks and big data services
Next Steps: Team with the Experts
• Strategic assessment
  – Business goals
  – Data growth needs
  – Use case discovery (partner delivery)
• Consult
  – Solution architecture and design (NetApp delivery)
• Deploy
  – Installation and implementation (NetApp delivery)
  – Solution implementation (partner delivery)
Support options: global support available from NetApp and partners
NetApp Confidential - Internal Use Only

More Related Content

What's hot

What's hot (20)

Downsizing Data Centers by NetApp IT
Downsizing Data Centers by NetApp ITDownsizing Data Centers by NetApp IT
Downsizing Data Centers by NetApp IT
 
The Benefits of Data Fabric
The Benefits of Data FabricThe Benefits of Data Fabric
The Benefits of Data Fabric
 
Private Cloud Infrastructure
Private Cloud InfrastructurePrivate Cloud Infrastructure
Private Cloud Infrastructure
 
Balancing Performance, Capacity and Economics with Flash
Balancing Performance, Capacity and Economics with FlashBalancing Performance, Capacity and Economics with Flash
Balancing Performance, Capacity and Economics with Flash
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)
 
Hybrid Cloud: The Cloud on Your Terms
Hybrid Cloud: The Cloud on Your TermsHybrid Cloud: The Cloud on Your Terms
Hybrid Cloud: The Cloud on Your Terms
 
How Data Saves Time
How Data Saves TimeHow Data Saves Time
How Data Saves Time
 
Mastering Information Technology During Business Transformation
Mastering Information Technology During Business TransformationMastering Information Technology During Business Transformation
Mastering Information Technology During Business Transformation
 
2014 Predictions: Jay Kidd
2014 Predictions: Jay Kidd2014 Predictions: Jay Kidd
2014 Predictions: Jay Kidd
 
Starting the Journey to Managed Infrastructure Services
Starting the Journey to Managed Infrastructure ServicesStarting the Journey to Managed Infrastructure Services
Starting the Journey to Managed Infrastructure Services
 
Role of Unified AI and ML in Cloud Technologies. Which Cloud Service Provider...
Role of Unified AI and ML in Cloud Technologies. Which Cloud Service Provider...Role of Unified AI and ML in Cloud Technologies. Which Cloud Service Provider...
Role of Unified AI and ML in Cloud Technologies. Which Cloud Service Provider...
 
How to accelerate Splunk analytics
How to accelerate Splunk analyticsHow to accelerate Splunk analytics
How to accelerate Splunk analytics
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinar
 
A Journey to the Cloud with Data Virtualization
A Journey to the Cloud with Data VirtualizationA Journey to the Cloud with Data Virtualization
A Journey to the Cloud with Data Virtualization
 
Postgres Vision 2018: Making Modern an Old Legacy System
Postgres Vision 2018: Making Modern an Old Legacy SystemPostgres Vision 2018: Making Modern an Old Legacy System
Postgres Vision 2018: Making Modern an Old Legacy System
 
KEYNOTE: Edge optimized architecture for fabric defect detection in real-time
KEYNOTE: Edge optimized architecture for fabric defect detection in real-timeKEYNOTE: Edge optimized architecture for fabric defect detection in real-time
KEYNOTE: Edge optimized architecture for fabric defect detection in real-time
 
Cloud for the Hybrid Data Center
Cloud for the Hybrid Data CenterCloud for the Hybrid Data Center
Cloud for the Hybrid Data Center
 
[Webinar] When It Comes To Cloud, Great Power Brings Great Responsibility
[Webinar] When It Comes To Cloud, Great Power Brings Great Responsibility[Webinar] When It Comes To Cloud, Great Power Brings Great Responsibility
[Webinar] When It Comes To Cloud, Great Power Brings Great Responsibility
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data Strategy
 

Similar to Exploring the Wider World of Big Data

Data warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-clouderaData warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-cloudera
Jyrki Määttä
 
Monetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service ProvidersMonetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service Providers
DataWorks Summit
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
DataWorks Summit
 

Similar to Exploring the Wider World of Big Data (20)

Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis Kapsalis
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Big Data - A Real Life Revolution
Big Data - A Real Life RevolutionBig Data - A Real Life Revolution
Big Data - A Real Life Revolution
 
Data warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-clouderaData warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-cloudera
 
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End UsersFrom Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Introduction to Harnessing Big Data
Introduction to Harnessing Big DataIntroduction to Harnessing Big Data
Introduction to Harnessing Big Data
 
Hadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaHadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - Informatica
 
Monetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service ProvidersMonetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service Providers
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdfth1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 

More from NetApp

More from NetApp (20)

DevOps the NetApp Way: 10 Rules for Forming a DevOps Team
DevOps the NetApp Way: 10 Rules for Forming a DevOps TeamDevOps the NetApp Way: 10 Rules for Forming a DevOps Team
DevOps the NetApp Way: 10 Rules for Forming a DevOps Team
 
10 Reasons to Choose NetApp for EUC/VDI
10 Reasons to Choose NetApp for EUC/VDI10 Reasons to Choose NetApp for EUC/VDI
10 Reasons to Choose NetApp for EUC/VDI
 
Spot Lets NetApp Get the Most Out of the Cloud
Spot Lets NetApp Get the Most Out of the CloudSpot Lets NetApp Get the Most Out of the Cloud
Spot Lets NetApp Get the Most Out of the Cloud
 
NetApp #WFH: COVID-19 Impact Report
NetApp #WFH: COVID-19 Impact ReportNetApp #WFH: COVID-19 Impact Report
NetApp #WFH: COVID-19 Impact Report
 
4 Ways FlexPod Forms the Foundation for Cisco and NetApp Success
4 Ways FlexPod Forms the Foundation for Cisco and NetApp Success4 Ways FlexPod Forms the Foundation for Cisco and NetApp Success
4 Ways FlexPod Forms the Foundation for Cisco and NetApp Success
 
NetApp 2020 Predictions
NetApp 2020 Predictions NetApp 2020 Predictions
NetApp 2020 Predictions
 
NetApp 2020 Predictions
NetApp 2020 Predictions NetApp 2020 Predictions
NetApp 2020 Predictions
 
NetApp 2020 Predictions in Tech
NetApp 2020 Predictions in TechNetApp 2020 Predictions in Tech
NetApp 2020 Predictions in Tech
 
Corporate IT at NetApp
Corporate IT at NetAppCorporate IT at NetApp
Corporate IT at NetApp
 
Modernize small and mid-sized enterprise data management with the AFF C190
Modernize small and mid-sized enterprise data management with the AFF C190Modernize small and mid-sized enterprise data management with the AFF C190
Modernize small and mid-sized enterprise data management with the AFF C190
 
Achieving Target State Architecture in NetApp IT
Achieving Target State Architecture in NetApp ITAchieving Target State Architecture in NetApp IT
Achieving Target State Architecture in NetApp IT
 
10 Reasons Why Your SAP Applications Belong on NetApp
10 Reasons Why Your SAP Applications Belong on NetApp10 Reasons Why Your SAP Applications Belong on NetApp
10 Reasons Why Your SAP Applications Belong on NetApp
 
Turbocharge Your Data with Intel Optane Technology and MAX Data
Turbocharge Your Data with Intel Optane Technology and MAX DataTurbocharge Your Data with Intel Optane Technology and MAX Data
Turbocharge Your Data with Intel Optane Technology and MAX Data
 
Redefining HCI: How to Go from Hyper Converged to Hybrid Cloud Infrastructure
Redefining HCI: How to Go from Hyper Converged to Hybrid Cloud InfrastructureRedefining HCI: How to Go from Hyper Converged to Hybrid Cloud Infrastructure
Redefining HCI: How to Go from Hyper Converged to Hybrid Cloud Infrastructure
 
Webinar: NetApp SaaS Backup
Webinar: NetApp SaaS BackupWebinar: NetApp SaaS Backup
Webinar: NetApp SaaS Backup
 
NetApp 2019 Perspectives
NetApp 2019 PerspectivesNetApp 2019 Perspectives
NetApp 2019 Perspectives
 
Künstliche Intelligenz ist in deutschen Unter- nehmen Chefsache
Künstliche Intelligenz ist in deutschen Unter- nehmen ChefsacheKünstliche Intelligenz ist in deutschen Unter- nehmen Chefsache
Künstliche Intelligenz ist in deutschen Unter- nehmen Chefsache
 
Iperconvergenza come migliora gli economics del tuo IT
Iperconvergenza come migliora gli economics del tuo ITIperconvergenza come migliora gli economics del tuo IT
Iperconvergenza come migliora gli economics del tuo IT
 
10 Good Reasons: NetApp for Artificial Intelligence / Deep Learning
10 Good Reasons: NetApp for Artificial Intelligence / Deep Learning10 Good Reasons: NetApp for Artificial Intelligence / Deep Learning
10 Good Reasons: NetApp for Artificial Intelligence / Deep Learning
 
NetApp IT’s Tiered Archive Approach for Active IQ
NetApp IT’s Tiered Archive Approach for Active IQNetApp IT’s Tiered Archive Approach for Active IQ
NetApp IT’s Tiered Archive Approach for Active IQ
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Exploring the Wider World of Big Data

  • 2. Entering a New Era of Scale 2
  • 3. Convergence of Technology Disrupters Create Opportunity NetApp Confidential - Internal Use Only Cloud SocialMobile Internet of Things Big Data
  • 4.  Traditional Structured and Replicated Data mix shift is driven by: − Efficiency (Dedup, Compr, Thin Prov, SATA) − Growth in new category of storage consumers using cloud / content depots  Unstructured Data (files and objects) in traditional storage + Content depots / Cloud) will be the largest storage category by 2014 − Content depots / Cloud expected to be 95% unstructured data Revenue Share by Segment Traditional structured Traditional replicated Traditional unstructured Content depots / public cloud Unstructured Data Growth Dominates
  • 5. Not Even to The “Peak” Estimated size of the digital universe in 2020 40 Zettabytes 5 Billion Smart phones 30 Billion Pieces of new content to Facebook per month 5 Technology Trigger Peak of Inflated Expectations Trough of Disillusionment Slope of Enlightenment Plateau of Productivity VISIBILITY TIME 80% Unstructured data
  • 6. Big Data Is All Data From Everywhere  Transactional Data  Machine Data  Social Data  Enterprise Content Fundamentally changes your business The Jet way The Call Center
  • 7. Big Data Vendor Landscape A Lot of Hype and Buzz – Everyone is Jumping In  Market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015  NoSQL $2Bn PA by 2015  Most firms are taking a pragmatic approach  Big data is in the very early stages of maturity  Best practices are not mature IDC Big Data Survey 7 Nov-11 400 350 300 250 200 150 100 50 0 Jan-08 Cloudera series B MapR series A Cloudera series C 10gen series D MapR series B DataStax series B Neo Technology series A Opera Solutions series A Platfora series A Couchbase series C Cloudera series D Funding for Hadoop and NoSQL "The Big Data market is expanding rapidly … For technology buyers, opportunities exist to use Big Data technology to improve operational efficiency and to drive innovation. Use cases are already present across industries and geographic regions." Dan Vesset, Vice President, IDC 451 Research
  • 8. Data Growth Impact on Business 8 Complexity VolumeSpeed BusinessVelocity Inflection Point Information Becomes a Propellant to Business Data Becomes a Burden to IT Infrastructure 2010 2020 “Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze
  • 9. Why Should You Care? It’s the Value of Your Data  Top line revenue – Leverage their data assets into business advantage  Bottom Line savings – Lower the cost of compliance – Manage ever growing data efficiently  Over 1PB of data  Growth of 175% YOY  90 days of data within  24 hours of a failure  5 Billion Records  Anywhere, Anytime  Faster time to market  50% Increase in Revenue 9
  • 11. Why NetApp? Practical solutions that solve today’s problems Get Control NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably. Break Through Break through the limits. With NetApp, you can take on even the most massive and complex data projects. Gain Insight Turn insight to action. NetApp helps you get to clarity and insight faster and more reliably. 11
  • 12. Experience Managing Data at Scale
    [Chart: number of NetApp customers at each data scale, with labels for 100, 50, 10 and 4 customers and for 10 PB, 20 PB, 50 PB and 100 PB, plus NetApp's largest customer.]
  • 13. NetApp Big Data Strategy: Open, Best-of-Breed, Choice
    - Best-of-breed storage for Big Data applications
    - Create deep integration and value add
    - Build on open standards with best-in-class partnerships
    - Validate with ecosystem leaders:
      - Complete server, network and storage "racks"
      - Delivered via trusted high-value partners
  • 14. Industry-Leading Storage Innovation
    - Flash arrays for ultra-high performance
    - E-Series for price-performance at scale
    - StorageGRID for web-scale object storage
    - Clustered Data ONTAP for shared infrastructure
    Spanning corporate data centers and cloud data centers.
  • 15. Big Data Building Blocks
    - Big Content: retain forever, multi-site distribution
    - Big Bandwidth: ingest, process, stream
    - Big Analytics: reduce, analyze, report
    - Cloud (private/public): retain, distribute
    [Diagram labels: Applications; Extract; Retain, Distribute; Store; Retrieve.]
  • 17. Analytics Oriented Business Processing
    - RDBMS (general-purpose DB): data organized to align with schemas; fixed consistency model; complex queries supported; volume-based data management.
    - Columnar DB (analytics oriented): data organized in column files; tabular interface without rigid schemas; fast column scans; multiple consistency models; transaction-granular data management.
    - Document store (transaction oriented): data organized in data structures in memory; schemaless transaction store for structured data; high transactional performance.
    - K-V store (metadata-service oriented): data organized in key-value pairs; suitable for metadata services with content management systems (CMS); associated with object services.
    [Diagram labels: transaction processing, real-time analytics, business applications; memory, ingest, disk/flash tier, query-based retrieval, commit, persisted commit; federated database store (build/buy/partner). Annotations: transaction-granular data resilience, recoverability and protection at line speeds; data organization optimized by query interface; performance-optimized query service.]
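To make the row-versus-column contrast concrete, here is a minimal toy sketch. It is not how any particular database is implemented, and the field names and values are hypothetical. The same records are held once as whole rows, the way a row-oriented RDBMS organizes data, and once as one list per attribute, loosely mimicking the "column files" of a columnar DB. An aggregate over a single attribute then only has to scan one column.

```python
# Hypothetical records; field names and values are illustrative only.
rows = [
    {"id": 1, "region": "EMEA", "bytes": 120},
    {"id": 2, "region": "APAC", "bytes": 340},
    {"id": 3, "region": "EMEA", "bytes": 75},
]

# Columnar layout: one list per attribute, standing in for one column file each.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def total_bytes_row_store(records):
    # Row layout: the "bytes" field is interleaved with every other field,
    # so a scan walks through whole records to pick out one value each.
    return sum(record["bytes"] for record in records)


def total_bytes_column_store(cols):
    # Column layout: only the "bytes" column is read; "id" and "region"
    # are never touched by this scan.
    return sum(cols["bytes"])


print(total_bytes_row_store(rows))         # 535
print(total_bytes_column_store(columns))   # 535
```

Both layouts return the same answer. The difference that matters for analytics is how much data a scan has to touch, and on disk that difference grows with the number of columns per record.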
  • 18. Analytics Technologies to Look Out For
    - Columnar DBs (analytics oriented)
    - Document stores (transaction oriented)
    - Key-value stores (content/object service)
    - Graph DBs (niche)
    - Relational DBs (row-oriented RDBMSs)
    The slide arranges these on a spectrum from the "old world" (datacenter) to the "new world" (multi-datacenter):
    - Old world: ACID constrained, complete query set, limited availability
    - Middle: high consistency, rich query set, good availability
    - New world: tunable consistency, limited query set, highest/WAN availability
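"Tunable consistency" usually means a client picks, per operation, how many replicas must answer a read (R) and acknowledge a write (W) out of N copies. The slide does not name a product. The sketch below shows the general quorum-overlap rule used by Dynamo-style stores rather than any specific vendor's API. When R + W > N, every read quorum shares at least one replica with the most recent acknowledged write quorum, assuming strict quorums. Smaller R or W lowers latency and raises availability at the cost of possibly stale reads.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum of size r shares at least one
    replica with every write quorum of size w, out of n replicas
    (the R + W > N rule)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Hypothetical settings for data stored with 3 replicas.
for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w}: read/write quorums overlap = {quorums_overlap(3, r, w)}")
```

With N = 3, R = W = 1 favors availability and latency but may read stale data. R = W = 2 is the common "quorum" middle ground where reads and writes always overlap.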
  • 19. Analytics & Enterprise Apps Environment
    [Diagram labels: data sources (sensors, applications, logs, location/GPS, mobile devices, content repositories, other data sources); analytics applications; reporting/dashboard/visualization; ETL, OLAP, OLTP; data management; file systems (NFS/sNFS/pNFS); storage (shared storage infrastructure, and all other storage, i.e. internal DAS).]
    NetApp Confidential – Limited Use
  • 20. Some Problems Require an Enterprise Class Hadoop Solution
    [Matrix axes: compute power vs. storage capacity]
    - Commodity, off-the-shelf Hadoop: values associated with early adopters of Hadoop
      - Social media space
      - Contributors to Apache
      - Strong bias to JBOD
      - Skeptical of all vendors
    - Enterprise Class Hadoop: packaged, ready-to-deploy modular Hadoop cluster
      - The data has intrinsic value ($$$)
      - Capacity and compute requirements expanding very fast
      - Higher storage performance
      - Real human consequences if the system fails (threats, treatments, financial losses)
      - System has to allow for asymmetric growth
    - Enterprise Class Hadoop, compute intensive: packaged, ready-to-deploy modular compute-intensive Hadoop cluster
      - Compute-intensive applications
      - Video and imaging analysis
      - Extremely tight service-level expectations
      - Severe financial consequences if the data analytic application or service runs late
    - Enterprise Class Hadoop, storage intensive: packaged, ready-to-deploy modular storage-intensive Hadoop cluster
      - Storage-intensive applications
      - Additional CPUs do not help run time
      - Financial ticker data analysis
      - Extremely tight service-level expectations
      - Need deeper storage per DataNode
  • 21. NetApp Open Solution for Hadoop
    - Easy to deploy, manage and scale
    - Uses high-performance storage:
      - Resilient and compact
      - RAID protection of data
      - Less network congestion
    - Raw capacity and density:
      - 120 TB or 180 TB in 4U
      - Fully serviceable storage system
    - Reliability:
      - Hardware RAID and hot swap prevent job restarts caused by a node going offline after a media failure
      - Reliable metadata (NameNode)
    [Diagram labels: Enterprise Class Hadoop; MapReduce JobTracker; NameNode; Secondary NameNode; DataNodes/TaskTrackers; HDFS; 4 separate shared-nothing partitions; FAS2040; E2660.]
  • 22. NetApp Open Solution for Hadoop: Validated Benefits for the Enterprise
    - Improved cluster performance by 62%
    - Completed jobs 200% faster under drive failure
    - Delivered linear performance scalability as nodes and data grew
    - Per-server capacity increase of 1.5x
    "The NetApp Open Solution for Hadoop improves capacity and performance efficiency and recoverability compared to a server-based DAS deployment." (ESG, 2012)
  • 23. Optimizing Performance and Staying Healthy
    [Figure labels: network overhead vs. useful work; availability and resiliency; burst handling and queuing; oversubscription ratio; DataNode network speed; network latency.]
    Sources: Garrett, Brian and Lockner, Julie, "NetApp Open Solution for Hadoop", ESG Report, May 2012, http://bit.ly/LyYG0t; Cisco: http://bit.ly/yL54Ts
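One factor on this slide, the oversubscription ratio, is simple arithmetic. It is commonly defined as the aggregate bandwidth a switch offers to its attached servers divided by its uplink bandwidth, so 1:1 is non-blocking and higher ratios mean more contention during shuffle-heavy Hadoop phases. The rack below (20 DataNodes at 10 Gb/s, four 40 Gb/s uplinks) is a hypothetical example and is not taken from the slide or the Cisco source.

```python
def oversubscription_ratio(servers: int, server_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Aggregate server-facing bandwidth divided by aggregate uplink bandwidth."""
    return (servers * server_gbps) / (uplinks * uplink_gbps)


# Hypothetical top-of-rack switch: 20 DataNodes at 10 Gb/s, 4 x 40 Gb/s uplinks.
ratio = oversubscription_ratio(20, 10, 4, 40)
print(f"{ratio:.2f}:1")  # 1.25:1
```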
  • 24. DAS vs. NetApp Footprint
    - DAS option:
      - Per server: 2RU; CPU: 2x8 cores; RAM: 48 GB; disk: 24 TB
      - 1 rack (42RU): 20 servers (320 cores, 960 GB RAM, 480 TB storage)
      - 6 racks: 120 servers, 1,920 cores, 5.76 TB RAM, 2.88 PB storage
    - NetApp option:
      - Per server: 1RU; CPU: 2x8 cores; RAM: 48 GB; disk: 2 TB (8 TB max; optional PXE boot for diskless servers)
      - 1 rack (42RU): 24 servers (6:1 servers to E2660) with 384 cores and 1.152 TB RAM, plus storage of 4 E2660 arrays with 720 TB
      - 4 racks: 96 servers, 1,536 cores, 4.61 TB RAM, 2.88 PB storage
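The footprint comparison is straightforward multiplication, so it can be checked with a short sketch. The sketch uses the slide's per-server and per-rack figures and decimal units (1 TB = 1000 GB), which matches the slide's 1.152 TB for 24 servers with 48 GB each. As an assumption, storage for the NetApp option counts only the E2660 capacity (4 arrays, 720 TB per rack) and leaves out the servers' small local boot disks. The `Rack` class and `totals` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    servers: int            # servers per rack
    cores_per_server: int
    ram_gb_per_server: int
    storage_tb: float       # storage counted per rack, as quoted on the slide


def totals(rack: Rack, racks: int) -> dict:
    """Scale one rack's figures up to a given number of racks (decimal units)."""
    servers = rack.servers * racks
    return {
        "servers": servers,
        "cores": servers * rack.cores_per_server,
        "ram_tb": servers * rack.ram_gb_per_server / 1000,
        "storage_pb": rack.storage_tb * racks / 1000,
    }


# DAS option: 20 x 2RU servers per rack, 2x8 cores, 48 GB RAM, 24 TB disk each.
das = Rack(servers=20, cores_per_server=16, ram_gb_per_server=48, storage_tb=20 * 24)
# NetApp option: 24 x 1RU servers per rack plus 4 E2660 arrays (720 TB);
# the servers' local boot disks are deliberately not counted here.
netapp = Rack(servers=24, cores_per_server=16, ram_gb_per_server=48, storage_tb=720)

print(totals(das, 6))
print(totals(netapp, 4))
```

Both options reach the same 2.88 PB of storage, the NetApp option in four racks instead of six and with fewer servers.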
  • 25. Case Study: ASUP (AutoSupport) NetApp Analytics
    Pipeline stages: gateways; ETL (extract, transform, load); data warehouse; data marts; reporting.
    - Gateways: 800K ASUPs every week, 40% of them arriving over the weekend
    - ETL: data needs to be parsed and loaded within 15 minutes
    - Data warehouse: only 5% of the data goes into the data warehouse; the rest is unstructured, yet it is growing 7-10 TB per month, and there is no easy way to access this unstructured content
    - Reporting: numerous mining requests are currently not satisfied; huge untapped potential for valuable insight
    Finally, the incoming load doubles every 16 months!
    NetApp Proprietary - Limited Use Only
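A load that doubles every 16 months grows by a factor of 2^(t/16) after t months. Over one year that factor is 2^(12/16), roughly 1.68, or about 68% growth per year. The sketch below assumes smooth exponential growth and applies it to the 800K weekly ASUP figure. Treating "incoming load" as the weekly ASUP count is an assumption, since the slide does not say which measure doubles.

```python
DOUBLING_MONTHS = 16          # from the slide: incoming load doubles every 16 months
WEEKLY_ASUPS_NOW = 800_000    # from the slide: 800K ASUPs every week


def growth_factor(months: float) -> float:
    """Multiplicative growth after `months`, assuming steady exponential growth."""
    return 2 ** (months / DOUBLING_MONTHS)


for years in (1, 2, 4):
    factor = growth_factor(12 * years)
    projected = WEEKLY_ASUPS_NOW * factor
    print(f"after {years} yr: x{factor:.2f}, about {projected:,.0f} ASUPs/week")
```

After four years (three doubling periods) the factor is exactly 8, which would put the weekly volume at about 6.4 million ASUPs under these assumptions.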
  • 26. Case Study: NetApp Large-Scale Analytics
    - Challenge: 4 weeks to run a query on 24 billion unstructured records. NetApp solution: 10-node Hadoop cluster. Benefit: time reduced from 4 weeks to 10.5 hours.
    - Challenge: impossible to run a query on 240 billion unstructured records. NetApp solution: 10-node Hadoop cluster. Benefit: previously impossible, now achievable in just 18 hours.
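The headline result can be restated as a speedup and an average throughput. Four weeks is 672 hours, so finishing in 10.5 hours is 64 times faster. The throughput figures assume records are processed at a uniform rate over each run, which is a simplification. `speedup` and `records_per_hour` are hypothetical helper names.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new run time is than the old one."""
    return before_hours / after_hours


def records_per_hour(records: float, hours: float) -> float:
    """Average throughput, assuming a uniform processing rate over the run."""
    return records / hours


before = 4 * HOURS_PER_WEEK  # 4 weeks, from the slide
after = 10.5                 # hours on the 10-node Hadoop cluster, from the slide
print(f"speedup: {speedup(before, after):.0f}x")  # speedup: 64x
print(f"24B-record query:  {records_per_hour(24e9, 10.5):,.0f} records/hour")
print(f"240B-record query: {records_per_hour(240e9, 18):,.0f} records/hour")
```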
  • 27. Big Data System Integrators: Solutions Built on NetApp®
    Integrated Big Data solutions and expertise:
    - Planning and implementation expertise for Big Data
    - Turn-key solution stacks and Big Data services
  • 28. Next Steps: Team with the Experts
    - Strategic assessment: business goals, data growth needs, use case discovery (partner delivery)
    - Consult: solution architecture and design (NetApp delivery)
    - Deploy: installation and implementation (NetApp delivery); solution implementation (partner delivery)
    Support options: global support available from NetApp and partners.