EMC Isilon Database Converged deck
1. 1
Data & Analytics Convergence
Keith Manthey, CTO Analytics
Property of EMC. Not for further distribution
2. 2
Source: EMC Digital Universe with Research and Analysis by IDC, The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, April 2014.
Digital Universe 2014
2013: 4.4 ZETTABYTES
2020: 44 ZETTABYTES (10x MORE)
ZETTABYTE = 1,000,000,000,000,000,000,000 bytes
34.4 Billion 32GB Smartphones =1 ZETTABYTE
34.4 Billion Samsung S5s laid end-to-end would circle the Earth 121.8 times
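The figures on this slide can be checked with a little arithmetic. The sketch below is a rough check, not the source's own calculation: the 34.4 billion figure appears to match binary units (2^70 bytes per zettabyte, 32 GiB per phone), while the decimal definition quoted above gives about 31.25 billion. The phone length (142 mm) and the Earth's equatorial circumference (40,075 km) are assumptions that do not appear on the slide.

```python
# Rough check of the smartphone comparison. Constants marked "assumption"
# are not stated in the deck.
ZB_DECIMAL = 10**21          # zettabyte as defined on the slide
ZB_BINARY = 2**70            # binary interpretation (zebibyte)
GB_DECIMAL = 10**9
GIB = 2**30

# Number of 32 GB phones needed to hold one zettabyte, in both unit systems.
phones_decimal = ZB_DECIMAL // (32 * GB_DECIMAL)
phones_binary = ZB_BINARY // (32 * GIB)

S5_LENGTH_M = 0.142                  # assumption: phone length in metres
EARTH_CIRCUMFERENCE_M = 40_075_000   # assumption: equatorial circumference

# How many times the binary-unit phone count would circle the Earth.
laps = phones_binary * S5_LENGTH_M / EARTH_CIRCUMFERENCE_M

print(f"decimal units: {phones_decimal / 1e9:.2f} billion phones")
print(f"binary units:  {phones_binary / 1e9:.2f} billion phones")
print(f"laps around the Earth: {laps:.1f}")
```

Under these assumptions the lap count lands close to the slide's 121.8; the exact value depends on which phone length and circumference are used.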
5. 5
PRECISION FARMING
DRESS THAT DISPLAYS HOW WE FEEL
CONTACT LENS THAT CONTROLS BLOOD SUGAR
THERMOSTAT THAT KNOWS YOU’RE AWAY
FITNESS BAND THAT MEASURES ACTIVITY LEVEL
GLASSES THAT DIRECT US WHERE TO GO
DRONES THAT DELIVER OUR GROCERIES
DIGITIZATION IS ALREADY BEGINNING
10. 10
Philosophical - Database
Diagram: Instance (Cache, Logs, System Processes); Data Storage (including Logical: Catalog + Physical Structures: Reader/Writers)
Traditional DB
Assumes:
• Query < 5% of Data
• Schema on Write (Structured)
• All data conforms to Schema (changes to versioned data if schema changes)
• Limited in compute methods (SQL, UDF, and R soon*)
11. 11
Philosophical - Hadoop
Diagram: Spark, MapReduce; YARN; HDFS Data Storage (including Logical: Name Node + Physical Structures: Data Node)
Hadoop
Built for:
• Query 100% of Data each time
• Schema on Read (including multiple versions over time)
• Unlimited in compute methods (SQL, Programmatic, Tools (Spark, Storm, R…))
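The contrast between Schema on Write and Schema on Read can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not how either a production RDBMS or Hadoop implements it; the sample records, field names, and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw events; the second record has a different shape.
RAW_EVENTS = [
    '{"device": "thermostat", "temp_c": 21.5}',
    '{"device": "fitness_band", "steps": 4200, "firmware": "2.1"}',
]


def schema_on_write(lines):
    """Enforce a fixed schema at load time; non-conforming rows are rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (device TEXT NOT NULL, temp_c REAL NOT NULL)"
    )
    rejected = []
    for line in lines:
        rec = json.loads(line)
        try:
            conn.execute(
                "INSERT INTO readings (device, temp_c) VALUES (?, ?)",
                (rec["device"], rec["temp_c"]),
            )
        except (KeyError, sqlite3.IntegrityError):
            rejected.append(rec)
    stored = conn.execute("SELECT device, temp_c FROM readings").fetchall()
    conn.close()
    return stored, rejected


def schema_on_read(lines, fields):
    """Keep raw lines as-is; apply a schema only when the data is queried."""
    return [tuple(json.loads(line).get(f) for f in fields) for line in lines]


stored, rejected = schema_on_write(RAW_EVENTS)
steps_view = schema_on_read(RAW_EVENTS, ["device", "steps"])
print(stored)      # rows that matched the fixed schema
print(rejected)    # rows the fixed schema could not accept
print(steps_view)  # a projection chosen at query time over all raw rows
```

The schema-on-read path scans every raw record on each query, which mirrors the "Query 100% of Data each time" point above, while the schema-on-write path pays the validation cost once at load.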
12. 12
Comparison
Hadoop diagram: Spark, MapReduce; YARN; HDFS Data Storage (including Logical: Name Node + Physical Structures: Data Node)
Traditional DB diagram: Instance (Cache, Logs, System Processes); Data Storage (including Logical: Catalog + Physical Structures: Reader/Writers)
SCALE UP – More CPUs/Memory
vs.
SCALE OUT – More Nodes
14. 14
What is the DB Convergence Play?
Per Microsoft, “PolyBase is a T-SQL front end that allows customers to query data stored in HDFS”
Microsoft PolyBase (link to original)
IBM's Big SQL Product Overview
15. 15
But… Hadoop is about DAS
16. 16
Data Locality – Per Eric Brewer…
MSFT Research link
UC Berkeley original link / paper
17. 17
Who is Eric Brewer?
• Eric Brewer is a UC Berkeley professor who is currently on sabbatical working with Google (VP of Infrastructure).
• He proposed the CAP Theorem in 2000
• Google records 40K hits on “Brewer’s Theorem Proofs”
18. 18
It’s all Hadoop?
• Per Mike Olson at Strataconf 2015, Hadoop is really disappearing, and the important discussion is now about the applications on top of the platform
• It’s about outcomes and use cases. As a result, Machine Learning & Spark are gaining all the glory
– “How Old” presentation from Strataconf
– IBM commits 3.5K associates to Apache Spark
– Microsoft buys Revolution Analytics to bring Machine Learning to databases
19. 19
What has transpired with Hadoop?
• Cloudera has cracked into the Operational Data Store and Data Warehouse Gartner Magic Quadrants, which have long been held by traditional RDBMS entrants.
• Increased investment from Hadoop vendors around items like Kudu and LLAP targeting OLTP workloads.
• Creation of a converged ACID-compliant RDBMS on Hadoop
20. 20
Keith’s Predictions
• More Enterprise Patterns for Hadoop:
– Companies are running out of data center and network space. The push for denser footprints is emerging
– Operations drives better reference architectures that match their support model
– More focus on interactive queries and real-time processing
– More converged pushes from other parties like Splice
– More use cases driving more adoption, but less about Hadoop
• More Unstructured Data Support / Analytics for Databases & ACID compliance upon Hadoop.
– To quote Willie Sutton: “It’s where the money is…”
21. 21
Why does EMC Care?
• Enterprise Standard Storage Technology supporting the World’s Databases
• Largest Enterprise Storage Vendor for Hadoop Platforms (Isilon)
– Certified with Hortonworks and Cloudera, along with Pivotal and IBM BigInsights
• Brings ease of use to a difficult platform and ease of convergence on products like PolyBase.
24. 24
Traditional Hadoop POD: 18 racks (compute with staging storage)
= ~5 PB Usable Hadoop Storage
Extended Time-to-Results
• Requires Additional “Data Staging” Storage
• Iterative Testing is Time Consuming
• Requires Copying of Data Several Times
Rigid Architecture
• Inefficient Floor Space
• Must Purchase Compute & Storage Together
• Storage Efficiency < 25%
Lacks Enterprise Features
• No Disaster Recovery, Snapshots
• Single Protocol (HDFS Only)
• Lacks Full Security Features
Isilon vHadoop: 8 racks (no staging needed)
Faster Time-to-Results
• Data Stays on the Isilon Cluster
• Allows for Rapid Iterative Testing Process
• Simplifies Hosting Workflow
Flexible Architecture
• Efficient Floor Space, Power & Cooling
• Leverage VMs for Flexible Deployments
• Storage Efficiency > 78%
Enterprise Capabilities
• Disaster Recovery, Snapshots
• SyncIQ Data Replication Offsite
• Highly Secure Hosting Environment
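The storage efficiency figures on this slide translate directly into raw capacity. The sketch below is a back-of-the-envelope calculation, assuming the ~5 PB usable figure applies to both configurations and treating the slide's bounds (< 25% and > 78%) as exact values; the function name is hypothetical.

```python
def raw_capacity_pb(usable_pb: float, efficiency: float) -> float:
    """Raw capacity needed to deliver `usable_pb` at a given storage efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return usable_pb / efficiency


USABLE_PB = 5.0  # from the slide: ~5 PB usable Hadoop storage

# Slide bounds used as point values: < 25% means at least this much raw
# capacity, > 78% means at most this much.
traditional_raw = raw_capacity_pb(USABLE_PB, 0.25)
isilon_raw = raw_capacity_pb(USABLE_PB, 0.78)

print(f"Traditional POD: at least {traditional_raw:.1f} PB raw")
print(f"Isilon vHadoop:  at most  {isilon_raw:.2f} PB raw")
```

For context, plain three-way replication alone gives an efficiency of about 33%, so a figure below 25% presumably also counts the staging copies listed on the slide.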
Just to put the exponential growth of the digital universe into context…
Like the physical universe, the digital universe is large, doubling in size every two years. By 2020 the digital universe (the data we create and copy annually) will reach 44 zettabytes, or 44 trillion gigabytes, containing nearly as many digital bits as there are stars in the universe.
If the Digital Universe were represented by the memory in a stack of tablets, in 2013 it would have stretched two-thirds of the way to the Moon. By 2020, there would be 6.6 stacks from the Earth to the Moon.
With this much data floating around, we need structure to sort it, make sense of it, and tell the story. That’s where data visualization comes in.
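The growth figures quoted above (4.4 ZB in 2013, 44 ZB by 2020, doubling every two years) can be cross-checked against each other. This is a small sketch assuming smooth exponential growth, which the source does not state explicitly.

```python
import math

ZB_2013 = 4.4   # from the slide
ZB_2020 = 44.0  # from the slide
YEARS = 2020 - 2013

# Doubling time implied by 10x growth over seven years.
growth_factor = ZB_2020 / ZB_2013
implied_doubling_years = YEARS * math.log(2) / math.log(growth_factor)

# Size in 2020 if the universe doubled exactly every two years.
projected_2020_zb = ZB_2013 * 2 ** (YEARS / 2)

print(f"implied doubling time: {implied_doubling_years:.2f} years")
print(f"2020 size at a strict 2-year doubling: {projected_2020_zb:.1f} ZB")
```

The two statements agree to within rounding: a tenfold increase over seven years corresponds to doubling roughly every 2.1 years.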
We live in an amazing time, but looking forward to 2020…
Estimated 30B – 200B devices
7 billion people
1 million new businesses from where we are today
These people using these devices within these businesses are constantly connected
This gives rise to new ways of doing business: new disruptive technology, new disruptive business models
<CLICK>
We’re already starting to see this today
Consider Nest: a thermostat that knows when you are in or out of your house and can regulate the temperature in your home far more efficiently than was possible in the past
Wearables such as Fitbits, Jawbones and the like
Sports clothing companies tell us that within 10 years they expect to be more of a software company, with clothing that contains embedded telemetry devices that communicate not just who the wearer is and where they are, but what time they get up, when they eat, and when they sweat. Sports companies will know almost everything about you, whereas in the past they have known almost nothing about you.
Another example: contact lenses that regulate blood sugar
And another: intelligent machines. Let’s drill in on that for a bit…
<CLICK>
Many industries are facing massive change. The thing that is driving the change is software and new applications – mobile and web applications – that create new possibilities.
These are just a few examples:
Nest is a software-defined thermostat. A thermostat’s entire job is to measure temperature against a range and send a current to turn the furnace or AC on and off when the temperature is out of range. But Nest built a thermostat with a web application to control it from anywhere, plus intelligence in the thermostat to recognize patterns and even know when you are home, so it can automatically adjust the temperature for you. That innovation is why Google bought Nest for $3.2B in cash in 2014.
Tesla is a software-defined car. A mobile app allows you to control the car from anywhere, turning on the AC/heat before you arrive, opening and closing doors. They can also improve the car’s capabilities and efficiency by upgrading the car’s software instead of forcing you to get a new car.
Uber allows you to call a towncar from any location. You call the car, the car shows up, you get in, tell them where you are going and get out. Your credit card is automatically charged, then you rate the driver and they rate you. This has turned the taxi industry on its ear.
The entertainment industry is another big change. In the 80s, we all went to Blockbuster and hoped our new release was available. In the 2000s, Redbox arrived, which really hurt Blockbuster. Now you simply sit at home and everyone can watch the new release on the same day it comes out, streaming into the home. Blockbuster is gone. Redboxes are fading. It’s all online streaming to your TV, your phone, your tablet…
What all of these products have figured out is that it’s about outcomes. A client won’t install Hadoop just to buy some servers. They are hoping (realistically or not) that they can improve something in their business.
Why would Hadoop want to move towards Enterprise Storage Reference Architectures? Or why would databases with Enterprise Storage Reference Architectures move towards Hadoop vendors?
Adoption of Hadoop is based upon available skills and operational support.
For SQL databases, the data growth is mainly unstructured.
For Hadoop, it’s hard for companies to get started due to operations and a lack of skills in their incumbent talent pool.