Big Data and Big Analytics:
Big Opportunities with Hadoop Solutions from EMC

Featuring EMC Isilon Scale-Out NAS Storage and EMC Greenplum HD

Paul S. Levine
Senior Systems Engineer
April 9, 2012


© Copyright 2011 EMC Corporation. All rights reserved.                                                1
Today's Agenda
 • The Big Data Opportunity
 • Big Data Analytics with Hadoop
 • Technology Challenges of Hadoop
 • EMC's Hadoop Solutions for the Enterprise
 • EMC Greenplum's Unified Analytics Platform (UAP) for Big Data
 • Q+A




The Big Data Opportunity




"Big Data Is Less About Size, And More About Freedom"
    - TechCrunch
"Findings: 'Big Data' Is More Extreme Than Volume"
    - Gartner
"Big Data! It's Real, It's Real-time, and It's Already Changing Your World"
    - IDC
"Total data: 'bigger' than big data"
    - 451 Group
THE ERA OF BIG DATA IS HERE
BIG DATA IS TRANSFORMING BUSINESS


Big Data in Action
• Healthcare
        – Leverage historical data to discover better treatments
• Financial Services
        – Data-driven banking stress tests & risk analysis
• Utilities
        – Machine learning to predict service outages & prevent energy theft



Hadoop & Big Data




The Promise of Big Data Analytics
 • Leverage data assets to identify key trends and new business opportunities
 • Analyze new sources of information to gain competitive advantages
 • Take an agile approach to analytics that can adapt at the speed of business
 • Scale your storage and analysis platform to handle Big Data's volume, velocity and variety




The Emergence of Hadoop
• Created 5-6 years ago by Doug Cutting, a former Yahoo! engineer
• Software platform designed to analyze massive amounts of unstructured data
• Two core components:
         – Hadoop Distributed File System (HDFS) (storage)
         – MapReduce (compute)
• Now a top-level Apache project backed by a large open source development community
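
The MapReduce (compute) component can be illustrated with a small word count. This is a minimal single-process sketch of the programming model only: the function names are illustrative, and a real Hadoop job would distribute the map and reduce steps across the cluster and read its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big analytics", "big opportunities"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle run in parallel on many machines, which is what makes the model suit very large data sets.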


Why Hadoop is Important
 • Pragmatic approach to analytics on a very large scale
         – Opens up new ways of gaining insights and identifying opportunities for businesses
 • Designed to address the rise of unstructured data
         – Enterprise data to grow by 650% over next 5 years
         – More than 80% of this growth will be unstructured data
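
As a rough reading of the growth figure, the sketch below converts "grow by 650% over 5 years" into a total multiplier and an implied yearly rate. It assumes the 650% is an increase on top of today's volume (so 7.5x in total) and that growth compounds evenly each year; neither assumption is stated on the slide.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier (650% -> 7.5x)."""
    return 1.0 + percent_increase / 100.0


def implied_annual_rate(percent_increase: float, years: int) -> float:
    """Constant yearly growth rate that compounds to the total increase."""
    return growth_factor(percent_increase) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    factor = growth_factor(650)
    rate = implied_annual_rate(650, 5)
    print(f"total multiplier: {factor:.1f}x")
    print(f"implied annual growth: {rate:.1%}")
```

Under these assumptions, data volume grows by roughly half again every year.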




Evolution of the Hadoop Market
[Figure: technology adoption curve (Innovators/Early Adopters, Early Majority, Late Majority, Laggards), with the Hadoop Early Adopters and Hadoop Early Majority segments marked]
Evolution of the Hadoop Market

HADOOP PROFILE (TO DATE): Hadoop Early Adopters
    • Pioneers and academics
    • Application Architect
    • Visionary

    • Open source / community driven
    • Build-your-own server, application & storage infrastructure
    • Commodity components

    • Web 2.0
    • Universities
    • Life Sciences

HADOOP PROFILE (EMERGING): Hadoop Early Majority
    • IT Manager & CIO
    • Data Scientist
    • Line-of-business

    • Commercial distribution
    • Turnkey solution
    • End-to-End Data protection

    • Fortune 1000
    • Financial Services
    • Retail
Technology Challenges of Hadoop




Technology Challenges of Hadoop (Hadoop DAS Environment)
 1. Dedicated Storage Infrastructure
         – One-off for Hadoop only
 2. Single Point of Failure
         – Namenode
 3. Lacking Enterprise Data Protection
         – No snapshots, replication, or backup
 4. Poor Storage Efficiency
         – 3X mirroring
 5. Fixed Scalability
         – Rigid compute to storage ratio
 6. Manual Import/Export
         – No protocol support




EMC Addresses the Hadoop Challenge

     Challenge                                     EMC Isilon
 1   Dedicated Storage Infrastructure              Scale-Out Storage Platform
       – One-off for Hadoop only                     – Multiple applications & workflows
 2   Single Point of Failure                       No Single Point of Failure
       – Namenode                                    – Distributed Namenode
 3   Lacking Enterprise Data Protection            End-to-End Data Protection
       – No snapshots, replication, or backup        – SnapshotIQ, SyncIQ, NDMP Backup
 4   Poor Storage Efficiency                       Industry-Leading Storage Efficiency
       – 3X mirroring                                – >80% Storage Utilization
 5   Fixed Scalability                             Independent Scalability
       – Rigid compute to storage ratio              – Add compute & storage separately
 6   Manual Import/Export                          Multi-Protocol
       – No protocol support                         – Industry standard protocols
                                                     – NFS, CIFS, FTP, HTTP, HDFS
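
The storage-efficiency row can be made concrete with a little arithmetic. The sketch below compares the raw capacity consumed by 3X mirroring with the raw capacity consumed at 80% utilization; the 100 TB data set is a hypothetical figure chosen for illustration, and the functions are not part of any EMC tool.

```python
def raw_needed_mirroring(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity consumed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_needed_at_utilization(usable_tb: float, utilization: float = 0.80) -> float:
    """Raw capacity consumed when `utilization` of raw space holds user data."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return usable_tb / utilization


if __name__ == "__main__":
    usable = 100.0  # hypothetical data set size in TB
    print(f"3X mirroring:    {raw_needed_mirroring(usable):.0f} TB raw")
    print(f"80% utilization: {raw_needed_at_utilization(usable):.0f} TB raw")
```

In this model, 3X mirroring corresponds to about 33% utilization, so the same data set needs 300 TB of raw disk instead of 125 TB.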




The EMC Isilon Advantage for Hadoop
 1. Scale-Out Storage Platform
         – Multiple applications & workflows
 2. No Single Point of Failure
         – Distributed Namenode
 3. End-to-End Data Protection
         – SnapshotIQ, SyncIQ, NDMP Backup
 4. Industry-Leading Storage Efficiency
         – >80% Storage Utilization
 5. Independent Scalability
         – Add compute & storage separately
 6. Multi-Protocol
         – Industry standard protocols
         – NFS, CIFS, FTP, HTTP, HDFS
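
To illustrate why independent scalability matters, the sketch below sizes a cluster two ways: with a rigid compute-to-storage ratio per node, and with separately sized compute and storage tiers. All node sizes and workload numbers are hypothetical, chosen only to show the effect.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Requirement:
    storage_tb: float  # usable storage the workload needs
    cores: int         # compute cores the workload needs


def coupled_nodes(req: Requirement, tb_per_node: float, cores_per_node: int) -> int:
    """Rigid ratio: every node adds a fixed amount of both resources."""
    by_storage = math.ceil(req.storage_tb / tb_per_node)
    by_compute = math.ceil(req.cores / cores_per_node)
    return max(by_storage, by_compute)


def decoupled_nodes(req: Requirement, tb_per_storage_node: float,
                    cores_per_compute_node: int) -> Tuple[int, int]:
    """Separate tiers: size storage and compute independently."""
    storage = math.ceil(req.storage_tb / tb_per_storage_node)
    compute = math.ceil(req.cores / cores_per_compute_node)
    return storage, compute


if __name__ == "__main__":
    req = Requirement(storage_tb=480, cores=64)   # hypothetical workload
    n = coupled_nodes(req, tb_per_node=12, cores_per_node=8)
    print(f"coupled: {n} nodes, {n * 8} cores purchased for {req.cores} needed")
    print("decoupled (storage, compute):", decoupled_nodes(req, 36, 8))
```

With a storage-heavy workload, the coupled design buys compute it does not need, because the node count is set by whichever resource runs out first.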




Industry’s First and Only Scale-Out Storage Solution with Native Hadoop Integration
    • Accelerating the Benefits of Hadoop for the Enterprise
    • Reducing Risk
    • End-to-End Data Protection
    • Organizational Knowledge/Experience
Core Innovation…Value to Customers
Isilon’s OneFS Scale-Out Operating System
    • Creates one giant network drive
    • Single file system, single volume
    • Guaranteed 80% raw storage utilization
    • Highest performance, fully symmetric cluster
    • Easy to manage and grow
    • Auto Balanced, Self Healing
    • Global Namespace
    • Multi-tier single file system/single cluster
NO More Management of LUNs, Volumes or RAID
Isilon's Clustered Storage Solution
ENTERPRISE CLASS HARDWARE & SOFTWARE
    • FreeBSD operating system software
    • File system
    • [Figure: a 3-node Isilon IQ cluster connected by 40 Gigabit InfiniBand]
    • Expandable to 15.5 PB in a single file system (144 nodes)
    • Sequential I/O performance: >85 GB/sec
    • SPECsfs2008 I/O operations/sec: 1.6 million IOPS
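
A quick sanity check on the scaling figure: dividing 15.5 PB by 144 nodes gives the average capacity each node must contribute. The sketch assumes decimal units (1 PB = 1000 TB), which the slide does not specify.

```python
def capacity_per_node_tb(total_pb: float, nodes: int) -> float:
    """Average capacity per node, using decimal units (1 PB = 1000 TB)."""
    return total_pb * 1000.0 / nodes


if __name__ == "__main__":
    per_node = capacity_per_node_tb(15.5, 144)
    print(f"{per_node:.1f} TB per node")
    # The node table later in the deck lists 108 TB as the largest
    # per-node capacity; 144 such nodes give this many TB in total.
    print(144 * 108, "TB")
```

The result, a little under 108 TB per node, is consistent with the largest per-node capacity listed in the node table of this deck.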

Isilon IQ Network Architecture
[Figure: four layers, left to right]
    • Client/Application Layer: Windows, UNIX/Linux and Mac clients
    • Standard Gigabit Ethernet Layer: 1 or 10 GigE carrying NFS, CIFS, iSCSI, FTP, HTTP (optional switches for additional subnets)
    • Isilon IQ Clustered Storage Layer
    • Intracluster Communication InfiniBand Layer: 40 Gb InfiniBand (optional 2nd switch for high availability)
Industry standard protocols
    NFS v3, v4, SMB, SMB2 (Native), iSCSI, HTTP, HDFS (Hadoop),
    NDMP, SNMP, ADS, LDAP and NIS for security


The Most Reliable Storage System
    Built-in high availability clustered architecture
         – Traditional storage requires costly, redundant heads and software
    • With N+2:1 protection, data is 100% available even if a single drive or node fails
    • With N+2, N+3, and N+4 protection, data is 100% available if multiple drives or nodes fail
    • And… Isilon IQ offers the industry's fastest drive rebuild times: in less than an hour!
    • Protection can be set at the cluster, directory, or file level
[Figure: cluster nodes on 40 Gb InfiniBand (optional 2nd switch for high availability); two nodes marked FAILED while data remains 100% available]
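
The N+M notation can be read as N data units protected by M extra units, so that any M simultaneous failures are survivable. The sketch below models only that generic trade-off between protection level and space efficiency; the stripe widths are hypothetical and this is not a description of how OneFS lays out data internally.

```python
def efficiency(data_units: int, protection_units: int) -> float:
    """Fraction of raw space holding user data in an N+M protection group."""
    return data_units / (data_units + protection_units)


def survives(protection_units: int, failures: int) -> bool:
    """An N+M group can be rebuilt as long as failures do not exceed M."""
    return failures <= protection_units


if __name__ == "__main__":
    # Hypothetical stripe widths, for illustration only.
    for n, m in [(8, 2), (12, 3), (16, 4)]:
        print(f"N={n}, M={m}: efficiency {efficiency(n, m):.0%}, "
              f"survives {m} failures: {survives(m, m)}")
    # 3X mirroring behaves like one data unit plus two extra copies.
    print(f"3X mirroring: efficiency {efficiency(1, 2):.0%}")
```

In this simplified model, higher protection levels keep the same efficiency only if the stripe grows wider, whereas mirroring pays a fixed two-thirds overhead regardless of cluster size.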
Largest and Most Scalable Storage System
   OneFS™ can scale from 18 TB to over 15,000 TB in a single file system




Linear, Predictable Performance = SLAs
    AutoBalance: Automated data balancing across nodes
         – Reduces costs, complexity and risks for scaling storage
    • AutoBalance migrates content to new storage nodes while system is online and in production
    • Requires NO
         – Manual intervention
         – Reconfiguration
         – Server or client mount point or application changes
    • Under 15 seconds to scale with no downtime
    • World’s fastest performance and capacity scaling
[Figure: nodes shown as FULL and EMPTY becoming BALANCED as AutoBalance runs]
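
The idea behind automated balancing can be sketched as follows: when an empty node joins, work out how much data each node must give up or receive so that all nodes end up equally full. This is an illustrative model only, assuming equal-size nodes; it is not Isilon's AutoBalance algorithm, and the function name is hypothetical.

```python
from typing import List


def migration_plan(used_tb: List[float]) -> List[float]:
    """Per-node change needed to reach balance (negative = data moves off)."""
    if not used_tb:
        return []
    target = sum(used_tb) / len(used_tb)
    return [target - used for used in used_tb]


if __name__ == "__main__":
    # Three nodes holding 30 TB each, plus one newly added empty node.
    plan = migration_plan([30.0, 30.0, 30.0, 0.0])
    print(plan)
```

The changes always sum to zero, since balancing only moves existing data between nodes.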


Solutions

Nodes                Proc                   Memory        Disk                 Capacity          Size
S200                 2x 4 core              24 - 96 GB    SSD/SAS, 24 slots    7-14 TB           2U
X200                 1x 4 core              6 - 48 GB     SSD/SATA, 12 slots   6-36 TB           2U
X400                 Westmere (2x 6 core)   24 - 192 GB   SSD/SATA, 36 slots   36, 72, 108 TB    4U
NL400                Westmere (2x 4 core)   12 - 48 GB    SATA, 36 slots       36, 72, 108 TB    4U
Backup Accelerator   2x 4 core              32 GB         Diskless             4 Fiber Ports     1U
Full Suite of Enterprise Software Options
    • Combine multiple storage tiers into a single file system
    • Simple, scalable and flexible data protection
    • Policy-based client load balancing with NFS failover
    • Quota management and thin provisioning
    • Fast and flexible file-based asynchronous replication
    • Analytics platform to maximize performance and resource utilization
    • WORM functionality enforces file-level retention




EMC's Enterprise Hadoop Solution
EMC Greenplum HD and EMC Isilon Scale-Out Storage
    • Apache Hadoop certified by Greenplum
    • Simple platform management and control
    • Parallel analytics access with Greenplum Database
[Figure: Compute layer stacked on Storage layer]




Flexible Packaging
    Hadoop Software + Storage
        • Package Greenplum HD software on commodity x86 hardware
        • Isilon scale-out NAS
    Hadoop Appliance + Storage
        • Greenplum HD Data Computing Appliance
        • Isilon scale-out NAS




Greenplum HD Data Computing Appliance
Software Architecture with Isilon
    Greenplum (top to bottom):
        Greenplum Chorus
        Greenplum Command Center
        Hadoop Tools (Pig, Hive, HBase, Mahout, etc…)
        MapReduce Layer
        Pluggable Storage Layer (HDFS API)
    HDFS Protocol
    Isilon:
        Isilon OneFS
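
Because the compute layer reaches storage through the HDFS protocol, pointing Hadoop at a different storage back end is largely a matter of configuration. The sketch below renders a minimal `core-site.xml` that sets `fs.default.name`, the default file system property used by Hadoop releases of this era (later releases name it `fs.defaultFS`). The host name and port are hypothetical, and a real deployment would need further settings not shown here.

```python
import xml.etree.ElementTree as ET


def core_site_xml(namenode_uri: str) -> str:
    """Render a minimal Hadoop core-site.xml that sets the default file system."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.default.name"
    ET.SubElement(prop, "value").text = namenode_uri
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host name and port; substitute the cluster's real address.
    print(core_site_xml("hdfs://isilon.example.com:8020"))
```

MapReduce jobs and the Hadoop tools above that layer would then resolve paths against the configured URI without changes to the jobs themselves.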




Innovative Companies Using Greenplum




Powerful Partner Ecosystem




                                                         Discovix




Greenplum: Not Just About Technology
• Data Science teams will become the driving force for success with big data analytics
• Greenplum is committed to the future of data science
        – University data science program collaboration with Stanford and UC Berkeley
        – Community investment including the Greenplum Analytic Workbench, Community edition software, and Data Science Summits
• Greenplum built its own Data Science practice
        – Leading PhDs with analytic tools expertise



Questions?




© Copyright 2011 EMC Corporation. All rights reserved.                35

More Related Content

What's hot

The Diverse and Exploding Digital Universe
The Diverse and Exploding Digital UniverseThe Diverse and Exploding Digital Universe
The Diverse and Exploding Digital Universearms8586
 
The Digital Universe in 2020
The Digital Universe in 2020The Digital Universe in 2020
The Digital Universe in 2020arms8586
 
PCs for People october 2012 broadband taskforce presentation
PCs for People october 2012 broadband taskforce presentationPCs for People october 2012 broadband taskforce presentation
PCs for People october 2012 broadband taskforce presentationAnn Treacy
 
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big Data
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big DataDr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big Data
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big DataGlobal Business Events
 
Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMKonceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMIBM Danmark
 
Networked life...Network Enterprise
Networked life...Network EnterpriseNetworked life...Network Enterprise
Networked life...Network EnterpriseFondazione CUOA
 
Building a Globally Competitive Position for Digital Media in Canada
Building a Globally Competitive Position for Digital Media in CanadaBuilding a Globally Competitive Position for Digital Media in Canada
Building a Globally Competitive Position for Digital Media in CanadaTechAlliance of Southwestern Ontario
 
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...
Early Lessons Learned in Applying Big Data to TV Advertising presentation Pre...IABmembership
 
Big data and the challenge of extreme information
Big data and the challenge of extreme informationBig data and the challenge of extreme information
Big data and the challenge of extreme informationJohn Mancini
 
Cloud computing jason lannen_4-28-10
Cloud computing jason lannen_4-28-10Cloud computing jason lannen_4-28-10
Cloud computing jason lannen_4-28-10Ngy Ea
 
Experimental Input and Output
Experimental Input and OutputExperimental Input and Output
Experimental Input and OutputDavid Lamas
 
Information Management on Mobile Steroids
Information Management on Mobile SteroidsInformation Management on Mobile Steroids
Information Management on Mobile SteroidsJohn Mancini
 
A next generation introduction to data science and its potential to change bu...
A next generation introduction to data science and its potential to change bu...A next generation introduction to data science and its potential to change bu...
A next generation introduction to data science and its potential to change bu...InnoTech
 

Unraveling Multimodality with Large Language Models.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Emerging Big Data & Analytics Trends with Hadoop

  • 1. Big Data and Big Analytics: Big Opportunities with Hadoop Solutions from EMC. Featuring EMC Isilon Scale-Out NAS Storage and EMC Greenplum HD. Paul S. Levine, Senior Systems Engineer. April 9, 2012. © Copyright 2011 EMC Corporation. All rights reserved.
  • 2. Today’s Agenda: The Big Data Opportunity; Big Data Analytics with Hadoop; Technology Challenges of Hadoop; EMC’s Hadoop Solutions for the Enterprise; EMC Greenplum’s Unified Analytics Platform (UAP) for Big Data; Q+A
  • 3. The Big Data Opportunity
  • 4. “Big Data Is Less About Size, And More About Freedom” (TechCrunch); “Findings: ‘Big Data’ Is More Extreme Than Volume” (Gartner); “Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World” (IDC); “Total data: ‘bigger’ than big data” (451 Group)
  • 5. THE ERA OF BIG DATA IS HERE (shown over the same press quotes as slide 4)
  • 6. BIG DATA IS TRANSFORMING BUSINESS
  • 7. Big Data in Action: Healthcare (leverage historical data to discover better treatments); Financial Services (data-driven banking stress tests & risk analysis); Utilities (machine learning to predict service outages & prevent energy theft)
  • 8. Hadoop & Big Data
  • 9. The Promise of Big Data Analytics: Leverage data assets to identify key trends and new business opportunities. Analyze new sources of information to gain competitive advantages. Take an agile approach to analytics that can adapt at the speed of business. Scale your storage and analysis platform to handle Big Data’s volume, velocity and variety.
  • 10. The Emergence of Hadoop: Created 5-6 years ago by former Yahoo! engineer Doug Cutting. Software platform designed to analyze massive amounts of unstructured data. Two core components: Hadoop Distributed File System (HDFS) for storage and MapReduce for compute. Now a top-level Apache project backed by a large open source development community.
  • 11. Why Hadoop is Important: Pragmatic approach to analytics on a very large scale (opens up new ways of gaining insights and identifying opportunities for businesses). Designed to address the rise of unstructured data (enterprise data to grow by 650% over the next 5 years; more than 80% of this growth will be unstructured data).
  • 12. Evolution of the Hadoop Market. Adoption curve: Innovators/Early Adopters, Early Majority, Late Majority, Laggards. Markers on the curve: Hadoop Early Adopters, Hadoop Early Majority.
  • 13. Evolution of the Hadoop Market. HADOOP PROFILE (TO DATE): Pioneers and academics; Application Architect; Visionary; Open source / community driven; Build-your-own server, application & storage infrastructure; Commodity components; Web 2.0; Universities; Life Sciences.
  • 14. Evolution of the Hadoop Market. HADOOP PROFILE (TO DATE): Pioneers and academics; Application Architect; Visionary; Open source / community driven; Build-your-own server, application & storage infrastructure; Commodity components; Web 2.0; Universities; Life Sciences. HADOOP PROFILE (EMERGING): IT Manager & CIO; Data Scientist; Line-of-business; Commercial distribution; Turnkey solution; End-to-End Data protection; Fortune 1000; Financial Services; Retail.
  • 15. Technology Challenges of Hadoop
  • 16. Technology Challenges of Hadoop (Hadoop DAS Environment): 1. Dedicated Storage Infrastructure (one-off for Hadoop only); 2. Single Point of Failure (Namenode); 3. Lacking Enterprise Data Protection (no snapshots, replication, backup); 4. Poor Storage Efficiency (3X mirroring); 5. Fixed Scalability (rigid compute to storage ratio); 6. Manual Import/Export (no protocol support).
  • 17. Technology Challenges of Hadoop (Hadoop DAS Environment): same six challenges as slide 16. [Diagram: a Namenode and data nodes holding 1x, 2x and 3x copies of the data.]
  • 18. EMC Addresses the Hadoop Challenge: 1. Dedicated Storage Infrastructure (one-off for Hadoop only) becomes Scale-Out Storage Platform (multiple applications & workflows); 2. Single Point of Failure (Namenode) becomes No Single Point of Failure (distributed Namenode); 3. Lacking Enterprise Data Protection (no snapshots, replication, backup) becomes End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP Backup); 4. Poor Storage Efficiency (3X mirroring) becomes Industry-Leading Storage Efficiency (>80% storage utilization); 5. Fixed Scalability (rigid compute to storage ratio) becomes Independent Scalability (add compute & storage separately); 6. Manual Import/Export (no protocol support) becomes Multi-Protocol (industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS).
  • 19. The EMC Isilon Advantage for Hadoop: 1. Scale-Out Storage Platform (multiple applications & workflows); 2. No Single Point of Failure (distributed Namenode); 3. End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP Backup); 4. Industry-Leading Storage Efficiency (>80% storage utilization); 5. Independent Scalability (add compute & storage separately); 6. Multi-Protocol (industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS).
  • 20. Industry’s First and Only Scale-Out Storage Solution with Native Hadoop Integration: Accelerating the Benefits of Hadoop for the Enterprise; Reducing Risk; End-to-End Data Protection; Organizational Knowledge/Experience
  • 21. Core Innovation…Value to Customers. Isilon’s OneFS Scale-Out Operating System: Creates one giant network drive; Single file system, single volume; Guaranteed 80% raw storage utilization; Highest performance, fully symmetric cluster; Easy to manage and grow; Auto Balanced, Self Healing; Global Namespace; Multi-tier single file system/single cluster. NO more management of LUNs, volumes or RAID.
  • 22. Isilon’s Clustered Storage Solution: Enterprise-class hardware & software; FreeBSD operating system software; file system. A 3-node, 40 Gigabit InfiniBand Isilon IQ cluster. Expandable to 15.5 PB in a single file system (144 nodes). Sequential I/O performance: >85 GB/sec. SPECsfs2008 I/O operations/sec: 1.6 million IOPS.
  • 23. Isilon IQ Network Architecture. Client/Application Layer: Windows, UNIX/Linux, Mac. Standard Gigabit Ethernet Layer: NFS, CIFS, iSCSI, FTP, HTTP (optional switches for additional subnets; optional 2nd 1 or 10 GigE switch for high availability). Isilon IQ Clustered Storage Layer. Intracluster Communication Layer: 40 Gb InfiniBand. Industry standard protocols: NFS v3, v4, SMB, SMB2 (native), iSCSI, HTTP, HDFS (Hadoop), NDMP, SNMP, ADS, LDAP and NIS for security.
  • 24. The Most Reliable Storage System. Built-in high availability clustered architecture. Traditional storage requires costly, redundant heads and software. With N+2:1, N+2, N+3, and N+4 protection, data is 100% available even if a single drive or node fails, or if multiple drives or nodes fail. And Isilon IQ offers the industry’s fastest drive rebuild times: in less than an hour! Protection can be set at the cluster, directory, or file level. [Diagram: nodes on a 40 Gb InfiniBand back end with an optional 2nd switch for high availability; two nodes are marked FAILED while the rest show 100%.]
  • 25. Largest and Most Scalable Storage System: OneFS™ can scale from 18TB to over 15,000 TB in a single file system.
  • 26. Linear, Predictable Performance = SLAs. AutoBalance: automated data balancing across nodes. Reduces costs, complexity and risks for scaling storage. AutoBalance migrates content to new storage nodes while the system is online and in production. Requires NO manual intervention, reconfiguration, server or client mount point changes, or application changes. Under 15 seconds to scale with no downtime. World’s fastest performance and capacity scaling. [Diagram: nodes shown as FULL and EMPTY become BALANCED.]
  • 27. Solutions (columns: Node; Proc; Memory; Disk; Capacity; Size). S200: 2x 4-core; 24-96GB; SSD/SAS, 24 slots; 7-14 TB; 2U. X200: 1x 4-core; 6-48GB; SSD/SATA, 12 slots; 6-36 TB; 2U. X400: Westmere, 2x 6-core; 24-192GB; SSD/SATA, 36 slots; 36, 72, 108 TB; 4U. NL400: Westmere, 2x 4-core; 12-48GB; SATA, 36 slots; 36, 72, 108 TB; 4U. Backup Accelerator: 2x 4-core; 32 GB; diskless; 4 Fiber ports; 1U.
  • 28. Full Suite of Enterprise Software Options: Combine multiple storage tiers into a single file system; Simple, scalable and flexible data protection; Policy-based client load balancing with NFS failover; Quota management and thin provisioning; Fast and flexible file-based asynchronous replication; Analytics platform to maximize performance and resource utilization; WORM functionality enforces file-level retention.
  • 29. EMC’s Enterprise Hadoop Solution: EMC Greenplum HD and EMC Isilon Scale-Out Storage. Apache Hadoop certified by Greenplum; Simple platform management and control; Parallel analytics access with Greenplum Database. [Diagram labels: Compute (Greenplum HD), Storage (Isilon).]
  • 30. Flexible Packaging. Hadoop Software + Storage Package: Greenplum HD software on commodity x86 hardware, plus Isilon scale-out NAS. Hadoop Appliance + Storage: Greenplum HD Data Computing Appliance, plus Isilon scale-out NAS.
  • 31. Greenplum HD Data Computing Appliance Software Architecture with Isilon (top to bottom): Greenplum Chorus and Greenplum Command Center; Greenplum Hadoop Tools (Pig, Hive, HBase, Mahout, etc.); MapReduce Layer; Pluggable Storage Layer (HDFS API); HDFS Protocol; Isilon OneFS.
  • 32. Innovative Companies Using Greenplum
  • 33. Powerful Partner Ecosystem Discovix
  • 34. Greenplum: Not Just About Technology. Data Science teams will become the driving force for success with big data analytics. Greenplum is committed to the future of data science: university data science program collaboration with Stanford and UC Berkeley; community investment including the Greenplum Analytic Workbench, Community edition software, and Data Science Summits. Greenplum built its own Data Science practice: leading PhDs with analytic tools expertise.
  • 35. Questions?
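
Slide 10 names MapReduce as Hadoop's compute component. As a rough illustration of that programming model only, the sketch below counts words in plain Python with explicit map, shuffle and reduce steps. It does not use Hadoop or HDFS, and the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for this example, not part of the original deck.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its input from HDFS.
    print(word_count(["big data big analytics", "big opportunities"]))
```

In a real cluster the map and reduce steps run in parallel on many machines and the framework performs the shuffle; here everything runs in one process to keep the idea visible.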

Editor's Notes

  1. Here’s what we’re going to cover in today’s session. <Walk through the agenda.>
  2. To start things off today, let’s look at “The Big Data Opportunity”
  3. <This slide gives you the opportunity to tell the audience that we are in the era of big data.> I’m sure you’ve seen some of the articles in the press about “Big Data”. It seems as if everyone is talking about it. Some of you are probably living it today. There’s lots of interest in it, but many aren’t exactly sure what they should be doing about it. Big Data has been recognized the world over for the potential impact it can have. Gartner has said that enterprises that embrace Big Data will outperform their peers financially by 20%. <click>
  4. Make no mistake about it: the era of Big Data is here.
  5. Over the next decade, the explosion of data will introduce not only massive challenges for IT, but massive opportunities for business. In fact, we’ve seen a number of our customers use Big Data to transform their business. Let’s look at just a few examples.
  6. Healthcare: Hospitals are implementing electronic medical records (EMR) and enabling access to larger volumes of historical patient data. Big Data analytics infrastructures enable doctors and hospitals to leverage this EMR data to find patterns in the success of various treatments for patients with a variety of characteristics. By storing and analyzing massive volumes of patient data, doctors are discovering more effective treatment options targeted at the specific characteristics of their patients. Financial services: Banks and investment institutions have always focused on the use of data in all of their operations. Now “Big Data” brings the ability to run predictive analytics, enabling these organizations to determine how their balance sheets can be affected by a variety of market forces. For example, if the Euro drops 20%, how will that affect the bank’s balance and its ability to borrow or lend money? Utilities: The implementation of Advanced Metering Infrastructure is generating massive amounts of data on the distribution and consumption of energy by commercial institutions and businesses. Utility companies can leverage these new forms of data to predict service failures and detect energy theft more quickly.
  7. Now let’s look at “Hadoop” and its role in Big Data analytics.
  8. To harness the full power of Big Data assets, “Big Data Analytics” are increasingly important. With “Big Data Analytics”, organizations can leverage their “Big Data” assets to uncover new, emerging trends and identify potential business opportunities. With these powerful tools, businesses can tap into their Big Data assets and potentially discover new ways to gain competitive advantages. In sum, these technologies help organizations become more agile, identify opportunities and respond faster. Recent technology trends, including the growth of the Internet, have generated an immense and growing wave of “Big Data” that will require your “Big Data” storage and analytics platforms to scale significantly to handle the volume, velocity and variety of this data. To underscore this, IDC recently projected that the amount of data managed by enterprises today will increase by 50x by 2020. In addition, 80% or more of this data will be “unstructured”, file-based data. With this as the backdrop, let’s look at the emergence of Hadoop.
  9. Hadoop was developed 5-6 years ago specifically to address the need for “Big Data Analytics”. At the time, development of Hadoop was being driven by the big Internet companies like Yahoo! and Google, who were amassing a huge amount of unstructured data and needed a new way to analyze it because traditional approaches couldn’t handle this new “Big Data” challenge. The development of Hadoop was pioneered by Doug Cutting, a former Yahoo! engineer. Hadoop consists of 2 key elements: the “Hadoop Distributed File System” (HDFS), which handles the storage component of the system, and MapReduce, which handles the “compute” function. Today, Hadoop is an open-source initiative, very similar to Linux, and backed by a large open source development community who collaborate on “Apache Hadoop”. As with Linux, there are a number of approved or authorized Apache Hadoop distributions, including EMC Greenplum’s “Greenplum HD”. <You may also want to note that “Hadoop” got its name from Doug Cutting’s son’s toy elephant. This also explains the “elephant” that is often depicted on materials relating to Apache Hadoop.> Now let’s look at why Hadoop is so important.
  10. One reason Hadoop has emerged as an important technology is that it is an innovative Big Data analytics engine designed specifically for massively large data volumes. With it, organizations can greatly reduce the time required to derive valuable insight from an enterprise’s dataset. By adopting Hadoop to store and analyze massive data volumes, enterprises are gaining an agile new platform to deliver new insights and identify new opportunities to accelerate their business. Hadoop has also been designed to tackle analytics for unstructured data. This is significant because this is the dominant area of data growth projected for the foreseeable future. Now let’s look at how the adoption of Hadoop is evolving.
  11. <This slide will automatically build to the next slide>
  12. The initial, early adopters of Hadoop were largely the big Internet companies as well as a number of universities and research organizations. These early adopters were very “techy” and research-oriented. Typically, Hadoop was deployed in a “lab” environment, outside the domain of any traditional enterprise IT department. Often, these early deployments were very much a “do-it-yourself” effort involving the assembly of systems using commodity components. It wasn’t unusual, especially in academic environments, for a small army of research assistants to be used to keep the system running. <advance to next slide>
  13. Now, flash forward 5-6 years and we are seeing Hadoop beginning to go mainstream in enterprise environments across a wide range of industries. Increasingly, IT executives and line-of-business managers are looking to leverage the “Big Data” assets within their organization to identify new opportunities and accelerate their business. Related to this, we are seeing the emergence of a new role in organizations: Data Scientists. These organizations are also keenly interested in integrating Hadoop and its infrastructure into their overall IT environment so that they can protect the data and manage it with their standard IT processes. They are also more interested in acquiring and deploying “proven” Hadoop solutions rather than building a “do-it-yourself” project. While Hadoop offers great potential value to organizations, it is not without certain challenges that need to be addressed. Let’s look at these.
  14. In this section, we’re going to identify and describe the key technology challenges of Hadoop, especially when deployed using direct-attached storage (DAS).
  15. One challenge associated with traditional deployments of Hadoop is that it has largely been deployed on a dedicated infrastructure and not integrated with or connected to any other applications. In effect, it is a siloed environment, often outside the realm of the IT team. This poses a number of inefficiencies and risks. <click> A well-recognized issue with traditional Hadoop deployments is the “single-point-of-failure” problem with the Hadoop Namenode. In a Hadoop environment, a single namenode manages the Hadoop file system. If it goes down, the Hadoop environment will immediately go offline. <Click to next build slide>
  16. Another issue with traditional Hadoop environments is the lack of enterprise-level data protection. Typical Hadoop deployments do not have rigorous backup and recovery capabilities such as snapshots, or data replication capabilities for disaster recovery (DR) purposes. <click> Traditional Hadoop deployments on direct-attached storage (DAS) are also extremely inefficient. It’s not unusual for a DAS environment to operate with a 30-35% storage utilization rate (or less). Compounding this inefficiency is the fact that data is often mirrored (the default is 3 times). In addition to storage inefficiency, this type of infrastructure is very management-intensive. <click> Another issue with Hadoop running on direct-attached storage is that “server” and “storage” resources must be increased together in lock-step. For example, if more storage resources are required, a new server must be deployed (and vice versa). This rigidity adds further inefficiency. Another issue is the manual import/export of data that is required in a traditional Hadoop environment. In addition to consuming time and resources (bandwidth), the Hadoop data in typical environments cannot be accessed or shared with other enterprise applications due to the lack of industry-standard protocol support. To address these challenges and to enable enterprises to begin realizing the benefits of Hadoop quickly and easily, EMC has recently introduced an exciting new Hadoop solution. <click to advance to next slide>
  17. With the new EMC solution, which incorporates EMC Isilon scale-out NAS storage, organizations can deploy Hadoop on a highly scalable platform that easily supports other enterprise applications and workflows. <click>
  18. The new EMC solution also eliminates the “single-point-of-failure” issue. We do this by enabling all nodes in an EMC Isilon storage cluster to become, in effect, namenodes. This greatly improves the resiliency of your Hadoop environment. The EMC solution for Hadoop also provides reliable, end-to-end data protection for Hadoop data, including snapshotting for backup and recovery and data replication (with SyncIQ) for disaster recovery. Our new Hadoop solution also takes advantage of the outstanding efficiency of EMC Isilon storage systems. With our solutions, customers can achieve 80% or more storage utilization. EMC Hadoop solutions can also scale easily and independently. This means that if you need to add more storage capacity, you don’t need to add another server (and vice versa). With EMC Isilon, you also get the added benefit of linear increases in performance as the scale increases. EMC also recently announced that we are the first vendor to integrate HDFS (the Hadoop Distributed File System) into our storage solutions. This means that with EMC Isilon storage, you can readily use your Hadoop data with other enterprise applications and workloads while eliminating the need to manually move data around as you would with direct-attached storage.
  19. EMC is the industry’s first and only storage vendor to provide native Hadoop integration with scale-out storage. Our solution is designed to deliver a number of key benefits. Our end-to-end approach helps enterprises deploy a proven Hadoop solution quickly so that you can begin benefiting from this powerful technology quickly. Our solution eliminates risk and increases data protection. Another advantage of EMC’s Hadoop solution is that we have a significant amount of knowledge and expertise about big data analytics that you can leverage (we’ll cover this in more detail later in the presentation). Now let’s take a closer look at the EMC solution to see how we’re able to deliver on these benefits.
  20. Scale-out software architectures make commodity hardware work. You don’t want to be in the hardware business. This is being commoditized. <Graphics show the accelerating growth of the hardware alongside downward price pressure.>
  21. Main points: With a shared, node-based architecture, any node can go down and any other node can take over for it (N-way resiliency). Isilon stripes vertically across all nodes. If a drive were to fail, we rebuild the data across the available free space of the cluster. Isilon can offer protection levels unprecedented in the storage industry: N+1 through N+4 (quadruple parity protection). It can sustain up to four simultaneous failures (4 drives or 4 nodes). Since each node in the cluster participates in rebuilding a small piece of the data in parallel, we can rebuild lost drives faster than anyone in the industry, easily rebuilding a 250GB drive in minutes rather than hours.
      Example narration: Let’s talk a little about reliability. First, as a shared, node-based architecture, any node can go down and any other node can take over for it. We call this N-way resiliency. Second, we do data protection very uniquely. Take this oil and gas file. A user hits “save” (“Click”) and the file is sent to the cluster and striped vertically across all nodes. Each node takes a small part of the file. It is “distributed” across the entire cluster. We also do this with parity or ECC. If a drive were to fail, we rebuild the data across the available free space of the cluster rather than on some dedicated parity drive or within some RAID group of drives. (“Click”) Moreover, we can offer protection levels unprecedented in the storage industry: N+2, akin to RAID 6 or RAID DP, all the way through N+4, or quadruple parity protection. So we can sustain up to four simultaneous failures in our solution and be protected. These are industry-leading data protection levels, not previously available in storage but achieved with Isilon. Finally, since each node in the cluster participates in rebuilding a small piece of the data in parallel, we can rebuild lost drives faster than anyone in the industry, easily rebuilding a 250GB drive in minutes rather than hours. This minimizes your window of risk when you have failed components.
  22. EMC’s enterprise Hadoop solution combines the power of EMC Greenplum HD, EMC’s Apache Hadoop distribution, with EMC Isilon scale-out NAS storage. The Greenplum HD software, depicted here at the top of the diagram, provides the “compute” function, while the Isilon storage (depicted at the bottom of the diagram) provides the “storage” function in the EMC Hadoop solution. Note that the Hadoop Distributed File System (HDFS) is integrated into the OneFS operating system used by the EMC Isilon storage systems. Together, these provide a comprehensive Hadoop solution that is easy to implement and manage. It is also highly efficient, reliable and scalable. Our Hadoop solution can also be easily augmented with additional EMC Greenplum technologies to expand your data analytics capabilities (these will be discussed later in the presentation). Now let’s look at how the EMC Hadoop solution is packaged.
  23. EMC’s Hadoop solution is available in 2 basic configurations: (1) EMC Greenplum Hadoop software + EMC Isilon storage, and (2) an EMC Hadoop “data computing appliance” + EMC Isilon storage. In the first configuration, the customer provides their own x86 server hardware, which is then loaded with Greenplum HD, packaged as software only. The server is then connected to the EMC Isilon scale-out NAS. In the second configuration, an EMC Greenplum “Data Computing Appliance” (an x86 server appliance pre-loaded with Greenplum HD software) connects to the Isilon scale-out NAS storage platform. With either offering, enterprises can deploy and implement a comprehensive Hadoop solution quickly and easily. Now let’s look at the underlying software architecture of the solution with our “Data Computing Appliance”.
  24. This slide illustrates the architecture of EMC’s enterprise Hadoop solution based on our Greenplum “Data Computing Appliance (DCA)”. Starting at the bottom, you’ll note that the solution incorporates EMC Isilon storage, which connects to our DCA with the HDFS protocol. Within the DCA, you’ll note: the Pluggable Storage Layer; the MapReduce Layer of Hadoop (which provides the “compute” function); standard Hadoop tools such as “Pig” and “Hive”; and advanced tools through Greenplum Chorus (which will be described in more detail in a few minutes). This solution provides a number of advantages over traditional Hadoop deployments. Easier and more reliable: EMC’s end-to-end approach removes the pain associated with building out a Hadoop cluster from scratch, which is required with other distributions. A purpose-built Hadoop infrastructure: enterprises can deploy a Hadoop cluster quickly while eliminating the risk associated with the typical hardware and software configuration process. A key component of a unified analytics platform: Greenplum HD is a core component of Greenplum’s Unified Analytics Platform, which is designed to answer the Big Data analytics needs of the agile enterprise by delivering business value through analytical insights. As a packaged and supported solution from EMC, you can also take advantage of EMC’s extensive support and services. Enterprise Hadoop support: rely on EMC to provide 24x7 worldwide support with the industry’s largest Hadoop support infrastructure. Proven at scale: certified by EMC to remove the guesswork associated with Hadoop deployments. Now, let me introduce my colleague from EMC’s Greenplum team to describe additional ways we can help you address your Big Data analytics needs.
  25. Greenplum is working with an amazing group of customers to help them pursue business value from analytics and participate in this era of Big Data. These industry leaders and innovative thinkers are doing extraordinary things with our platform. As you can see, we are working with companies in many industries and verticals, from finance to retail to telecom to internet. Regardless of the sector, companies using Greenplum are innovating in new ways.
  26. Our expansive partner network ensures you protect your existing investments while having the opportunity to leverage the best available technology. Greenplum has deep partnerships with industry-leading organizations such as SAS Institute, MicroStrategy, and Informatica. We are also working with emerging partners, including Karmasphere, Datameer, and Predixion, who are doing new and interesting things with Hadoop and big data. Finally, we are fortunate to work with a number of leading application providers, like Silver Spring Networks and ClickFox, who leverage Greenplum as a powerful back-end technology. Greenplum is proud to work with this extraordinary partner ecosystem.
  27. You have heard us say that Greenplum is not just a database. Greenplum is also not just about technology. Data science teams are an emerging practice, and they are making amazing things happen with big data on behalf of their organizations. Greenplum is committed to the future of data science. We are working with leading universities on developing data science curricula and programs. And we are investing in the community. We recently announced, with the help of several partners, a 1,000-node Hadoop cluster called the Greenplum Analytic Workbench for Hadoop, the only one of its kind in the industry. We will always have community editions of our software available for free. And we continue to invest in the practice by creating and publicizing events like the Data Science Summits. We also have our own data science practice, staffed by PhDs with expertise in leading analytic tools. This team works every day with our customers, advancing their projects and enabling new things from data.