HADOOP OVERVIEW


Milind Bhandarkar
Chief Architect, CTO Office, Greenplum

Will Davis
Senior Manager, Product Marketing, Greenplum




© Copyright 2012 EMC Corporation. All rights reserved.                                        1
Agenda
• Hadoop – what’s the big deal?
• Evolution of Hadoop from Web 2.0 to enterprise adoption
• Deployment considerations for enterprises
  – Enterprise storage
  – Integration into architecture and analytics workflow
  – Training/support resources
• How Greenplum HD delivers Hadoop built for the enterprise

Background of Hadoop




What is Hadoop?
• Framework that allows for distributed processing of large data sets across clusters of commodity servers
  – Stores large amounts of data
  – Processes the large amounts of data stored
• Inspired by Google’s MapReduce and Google File System (GFS) papers
• Apache open source project
  – Initial work done at Yahoo!
  – Very active open source community


The Hadoop Opportunity
• Internet age + exploding data growth
• Enterprises increasingly interested in leveraging new data sources quickly:
  – Spot emerging trends
  – Identify new opportunities, etc.
• Traditional database tools not able to cope
  – Weren’t built for big data use cases
  – Lack scale, aren’t cost-effective, impose rigid data structures
• Need for a new approach → Hadoop


Why Is Hadoop Important?
• Handles large amounts of data
• Stores data in its native format
• Delivers linear scalability at low cost
• Resilient to infrastructure failures
• Transparent application scalability





Enterprises can gain a competitive advantage through the adoption of big data analytics


What is Hadoop?
Two Core Components
• HDFS: scalable storage in the Hadoop Distributed File System
• MapReduce: compute via the MapReduce distributed processing platform



• Storage and compute in one framework
• Open source project of the Apache Software Foundation
• Java-intensive programming required
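Since MapReduce is the compute half of Hadoop, a small sketch of the programming model may help. The plain-Java program below (a hypothetical class `WordCountSketch` with no Hadoop dependency) runs the three phases of the classic word-count job in a single process: map emits `(word, 1)` pairs, shuffle groups the pairs by key, and reduce sums each group. A real Hadoop job would instead extend the framework's `Mapper` and `Reducer` classes and let the cluster run many map and reduce tasks in parallel near the HDFS blocks they read; this is only an illustration of the data flow, not Hadoop's API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input record (a line).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key. A TreeMap keeps keys sorted,
    // loosely mirroring how Hadoop sorts map output by key before reducing.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the counts collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Drive the three phases sequentially; Hadoop would distribute them across nodes.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Hadoop stores data", "Hadoop processes data");
        System.out.println(run(input));
    }
}
```

The point of the split is that `map` and `reduce` only ever see one record or one key at a time, which is what lets the framework scale them out transparently.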


Hadoop Architecture
1. Data is ingested into the Hadoop Distributed File System (HDFS)
2. Computation occurs inside Hadoop (MapReduce)
3. Results are exported from HDFS for use




[Diagram: Hadoop Data Nodes connected over Ethernet]
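To make step 1 concrete, here is a minimal plain-Java sketch of how HDFS stores an ingested file: the file is cut into fixed-size blocks and each block is copied to several data nodes. The 64 MB block size and replication factor of 3 mirror the Hadoop 1.x defaults (`dfs.block.size`, `dfs.replication`). The round-robin placement and the node names `dn1` through `dn7` are simplifying assumptions, since real HDFS picks replica locations with a rack-aware policy; `HdfsBlockSketch` and `placeBlocks` are hypothetical names, not part of Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;

public class HdfsBlockSketch {

    static final long MB = 1024L * 1024L;

    // One block of a file and the data nodes chosen to hold its replicas.
    static final class Block {
        final int index;
        final long sizeBytes;
        final List<String> replicas;

        Block(int index, long sizeBytes, List<String> replicas) {
            this.index = index;
            this.sizeBytes = sizeBytes;
            this.replicas = replicas;
        }
    }

    // Split a file into fixed-size blocks (the last one may be shorter) and assign
    // `replication` distinct nodes to each block, round-robin. Real HDFS placement
    // is rack-aware; round-robin is a simplification for illustration only.
    static List<Block> placeBlocks(long fileSize, long blockSize, int replication, List<String> nodes) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("replication factor exceeds number of data nodes");
        }
        List<Block> blocks = new ArrayList<>();
        long blockCount = (fileSize + blockSize - 1) / blockSize; // ceiling division
        for (int i = 0; i < blockCount; i++) {
            long size = Math.min(blockSize, fileSize - i * blockSize);
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(nodes.get((i + r) % nodes.size()));
            }
            blocks.add(new Block(i, size, replicas));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4", "dn5", "dn6", "dn7");
        // A 200 MB file, 64 MB blocks, 3 replicas per block.
        for (Block b : placeBlocks(200 * MB, 64 * MB, 3, nodes)) {
            System.out.println("block " + b.index + " (" + b.sizeBytes / MB + " MB) -> " + b.replicas);
        }
    }
}
```

For the 200 MB example this yields four blocks (64, 64, 64 and 8 MB), each on three different nodes, so losing any single node still leaves two copies of every block.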




Hadoop Components
• Spring Hadoop: integrates the Spring and Hadoop frameworks
• Mahout: scalable machine learning libraries
• HBase: database for random, real-time read/write access
• Hive: system for SQL-like querying of data on top of HDFS
• Pig: procedural language that abstracts MapReduce
• Zookeeper: highly reliable distributed coordination
• MapReduce: framework for writing scalable data applications
• HDFS: Hadoop Distributed File System




Hadoop Use Case Examples
• Scale-out content management & data repository
• Batch ETL (Extract/Transform/Load) processing of heterogeneous data
• Pre-processing and integration with the data warehouse
• Personalization and asset management analysis
• Trade analytics
• Credit scoring
• Customer retention
• Sentiment analysis (opinion mining)



Evolution of Hadoop: From Web 2.0 to Enterprise




Web 2.0 Organizations Are “Data-Driven”

“The future is here, it’s just not evenly distributed yet.”
– William Gibson




Technology Adoption Lifecycle




[Figure: technology adoption curve with segments Innovators/Early Adopters, Early Majority, Late Majority, Laggards]




Evolution of the Hadoop Market




[Figure: the same adoption curve (Innovators/Early Adopters, Early Majority, Late Majority, Laggards), labeled “Hadoop Early Adopters” and “Hadoop Early Majority”]



Evolution of the Hadoop Market
HADOOP PROFILE (TODAY): Hadoop Early Adopters
• Pioneers and academics, application architects, visionaries
• Open source / community driven; build-your-own server, application & storage infrastructure; commodity components
• Web 2.0, universities, life sciences

HADOOP PROFILE (FUTURE): Hadoop Early Majority
• IT managers & CIOs, data scientists, line-of-business
• Commercial distribution; turnkey solution; end-to-end data protection
• Fortune 1000, financial services, retail




2012: Hadoop Beyond Web 2.0




Greenplum HD: Hadoop for the Enterprise




Hadoop Challenges in the Enterprise

• Hadoop is hard right now!
  – Setup & configuration is resource-intensive
  – Lack of skills to make Hadoop work
  – Poor integration with existing technologies
  – Management at scale is nonexistent
  – Backup & disaster recovery missing




Greenplum HD Enterprise-Ready Hadoop

• Simple, efficient and scalable
• Proven at scale with worldwide EMC support
• Purpose-built Hadoop infrastructure
• Services to address the talent gap
• Parallel analytics access with Greenplum Database




Greenplum HD Architecture

[Diagram: layered stack, listed top to bottom]
• Greenplum Chorus
• Greenplum Command Center
• Hadoop Tools (Pig, Hive, HBase, Zookeeper, Mahout, etc.)
• MapReduce Layer
• Pluggable Storage Layer (HDFS API)
• Apache HDFS | Isilon OneFS




Enterprise Storage for Hadoop
• Integrated big data storage and analytics solution based on Greenplum HD and Isilon scale-out NAS
• Isilon is the first and only enterprise scale-out NAS storage platform that natively integrates the Hadoop Distributed File System (HDFS) protocol
• Seamless analytics access with Greenplum: Hadoop insights plug directly into Greenplum Database to augment analytics

[Diagram labels: Compute, Storage]



Flexible and Efficient




• Independently scale compute & storage
  – Add Greenplum HD or Isilon nodes for performance or capacity
• Eliminate HDFS’s 3x copies of data
  – Isilon enables 80% utilization for greater storage efficiency
• Seamless analytics access with Greenplum Database
  – Hadoop fused with GPDB for big data analytics
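The “3x copies” point is simple arithmetic: with the default HDFS replication factor of 3, every block is stored three times, so only one third of raw disk holds unique data, whereas the slide cites 80% utilization for Isilon. The sketch below compares the two. The 1,200 TB raw capacity is a hypothetical figure, the 80% ratio is the vendor claim from this slide rather than a measured value, and `StorageEfficiencySketch` is an illustrative name.

```java
public class StorageEfficiencySketch {

    // With N-way replication every block is stored N times, so usable = raw / N.
    static double usableWithReplication(double rawTb, int replicationFactor) {
        return rawTb / replicationFactor;
    }

    // With a fixed utilization ratio (e.g. the 80% cited for Isilon), usable = raw * ratio.
    static double usableAtUtilization(double rawTb, double utilization) {
        return rawTb * utilization;
    }

    public static void main(String[] args) {
        double rawTb = 1200.0; // hypothetical raw capacity in terabytes
        double replicated = usableWithReplication(rawTb, 3);
        double isilon = usableAtUtilization(rawTb, 0.80);
        System.out.printf("HDFS, 3x replication: %.0f TB usable (%.0f%% of raw)%n",
                replicated, 100 * replicated / rawTb);
        System.out.printf("80%% utilization:      %.0f TB usable (%.0f%% of raw)%n",
                isilon, 100 * isilon / rawTb);
    }
}
```

With these inputs the replicated cluster yields 400 TB of usable space versus 960 TB, i.e. 2.4 times as much usable capacity from the same disks (0.8 divided by 1/3).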


Simplified Deployment

• Remove the need for data staging
  – Isilon enables data access over standard protocols (NFS, CIFS, FTP, HTTP, HDFS)
• No single point of failure
  – Isilon distributes the NameNode to provide high availability and load balancing
• Enterprise data services for Hadoop
  – Advanced backup and disaster recovery capabilities



Advanced Management
• Greenplum Command Center
  – Complete platform management and control
• Greenplum Package Manager
  – Automates installing, uninstalling, updating, and querying analytics extensions
  – Supports package migration during upgrade, segment recovery, expansion, and standby initialization




Proven at Scale with Worldwide Support
• Industry’s largest Hadoop support team
  – Industry’s most accomplished Hadoop talent (from Yahoo!, LinkedIn, Talend, etc.)
• Tested at scale on the Greenplum Analytics Workbench
  – 1,000-node, 24-petabyte cluster
  – Multi-million-dollar investment by EMC and partners
  – Reduced risk for EMC customers
  – Certification of partner products

Bringing Rapid Innovation to Hadoop



Get Started With Hadoop Today
• Hadoop Architecture Services
  – POC planning and deployment
  – Installation and best practices
  – Educate the team
• Greenplum Analytics Labs
  – Leverage the expertise of Greenplum’s data scientists
  – Packaged solutions that produce business value and actionable results
  – Accelerate Hadoop capabilities on your data with your analysts
• Establish a strategic vision
  – Roadmap for Hadoop and unified analytics



Provide Feedback & Win!


• 125 attendees will receive $100 iTunes gift cards. To enter the raffle, simply complete:
  – 5 session surveys
  – The conference survey
• Download the EMC World Conference App to learn more: emcworld.com/app



Thank You




© Copyright 2012 EMC Corporation. All rights reserved.        32
Hadoop Overview

Contenu connexe

Tendances

Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
tipanagiriharika
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 

Tendances (20)

Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop
Hadoop Hadoop
Hadoop
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
An Introduction to the World of Hadoop
An Introduction to the World of HadoopAn Introduction to the World of Hadoop
An Introduction to the World of Hadoop
 

En vedette

Tues factors of production
Tues factors of productionTues factors of production
Tues factors of production
Travis Klein
 
บทที่ 4
บทที่ 4บทที่ 4
บทที่ 4
einscream
 
Linux kursu-erzurum
Linux kursu-erzurumLinux kursu-erzurum
Linux kursu-erzurum
sersld67
 
Sato all in-one-training
Sato all in-one-trainingSato all in-one-training
Sato all in-one-training
PROFIBLOG
 

En vedette (16)

Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 
2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari Overview2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari Overview
 
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
 
Propel London - Digital Recruitment
Propel London - Digital Recruitment Propel London - Digital Recruitment
Propel London - Digital Recruitment
 
Tues factors of production
Tues factors of productionTues factors of production
Tues factors of production
 
Atlassian GreenHopper
Atlassian GreenHopperAtlassian GreenHopper
Atlassian GreenHopper
 
บทที่ 4
บทที่ 4บทที่ 4
บทที่ 4
 
Advertising wed
Advertising wedAdvertising wed
Advertising wed
 
Linux kursu-erzurum
Linux kursu-erzurumLinux kursu-erzurum
Linux kursu-erzurum
 
Sato all in-one-training
Sato all in-one-trainingSato all in-one-training
Sato all in-one-training
 
Hip to save $$$$
Hip to save $$$$Hip to save $$$$
Hip to save $$$$
 
Toilet cleaner
Toilet cleanerToilet cleaner
Toilet cleaner
 
IDC: Selecting the Optimal Path to Private Cloud
IDC: Selecting the Optimal Path to Private CloudIDC: Selecting the Optimal Path to Private Cloud
IDC: Selecting the Optimal Path to Private Cloud
 
Dona
DonaDona
Dona
 
What photosensitive epilepsy
What photosensitive epilepsyWhat photosensitive epilepsy
What photosensitive epilepsy
 
Amelia earhart
Amelia earhartAmelia earhart
Amelia earhart
 

Similaire à Hadoop Overview

The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
m_hepburn
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
Hortonworks
 
Hw09 Data Processing In The Enterprise
Hw09   Data Processing In The EnterpriseHw09   Data Processing In The Enterprise
Hw09 Data Processing In The Enterprise
Cloudera, Inc.
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Hortonworks
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
Hortonworks
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applications
russell_jurney
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
DataWorks Summit
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
Roby Chen
 

Similaire à Hadoop Overview (20)

The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Hw09 Data Processing In The Enterprise
Hw09   Data Processing In The EnterpriseHw09   Data Processing In The Enterprise
Hw09 Data Processing In The Enterprise
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo SlidesWebinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
 
EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
 
Analytics on Hadoop
Analytics on HadoopAnalytics on Hadoop
Analytics on Hadoop
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applications
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big Data
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
 
Introducing the hadoop ecosystem
Introducing the hadoop ecosystemIntroducing the hadoop ecosystem
Introducing the hadoop ecosystem
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
 

Plus de EMC

Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
EMC
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
EMC
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
EMC
 

Plus de EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 

Hadoop Overview

  • 1. HADOOP OVERVIEW
    Milind Bhandarkar, Chief Architect, CTO Office, Greenplum
    Will Davis, Senior Manager, Product Marketing, Greenplum
    © Copyright 2012 EMC Corporation. All rights reserved.
  • 2. Agenda  Hadoop – what’s the big deal?  Evolution of Hadoop from Web 2.0 to Enterprise adoption  Deployment considerations for Enterprises – Enterprise storage – Integration into architecture and analytics workflow – Training/support resources  How Greenplum HD is Hadoop built for the enterprise © Copyright 2012 EMC Corporation. All rights reserved. 2
  • 3. Background of Hadoop
  • 4. What Is Hadoop?
    - A framework for distributed processing of large data sets across clusters of commodity servers
      – Stores large amounts of data
      – Processes the large amounts of data it stores
    - Inspired by Google’s MapReduce and Google File System (GFS) papers
    - Apache open-source project
      – Initial work done at Yahoo!
      – Very active open-source community
  • 5. The Hadoop Opportunity
    - Internet age + exploding data growth
    - Enterprises are increasingly interested in leveraging new data sources quickly to:
      – Spot emerging trends
      – Identify new opportunities, etc.
    - Traditional database tools are not able to cope
      – They weren’t built for big data use cases
      – They lack scale, are not cost-effective, and impose rigid data structures
    - Need for a new approach: Hadoop
  • 6. Why Is Hadoop Important?
    - Handles large amounts of data
    - Stores data in its native format
    - Delivers linear scalability at low cost
    - Resilient to infrastructure failures
    - Transparent application scalability
  • 7. Why Is Hadoop Important? (continued)
    - Enterprises can gain a competitive advantage by adopting big data analytics
  • 8. What Is Hadoop? Two Core Components
    - HDFS: scalable storage in the Hadoop Distributed File System
    - MapReduce: compute via the MapReduce distributed processing platform
    - Storage and compute in one framework
    - Open-source project of the Apache Software Foundation
    - Java-intensive programming required
  • 9. Hadoop Architecture
    1. Data is ingested into the Hadoop Distributed File System (HDFS)
    2. Computation occurs inside Hadoop (MapReduce)
    3. Results are exported from HDFS for use
    [Diagram: Hadoop data nodes connected over Ethernet]
  • 10. Hadoop Components
    - Spring Hadoop: integrates the Spring and Hadoop frameworks
    - Mahout: scalable machine learning libraries
    - HBase: database for random, real-time read/write access
    - Hive: system for SQL-like queries over data in HDFS
    - Pig: procedural language that abstracts MapReduce
    - ZooKeeper: highly reliable distributed coordination
    - MapReduce: framework for writing scalable data applications
    - HDFS: Hadoop Distributed File System
  • 11. Hadoop Use Case Examples
    - Scale-out content management & data repository
    - Batch processing of heterogeneous data
    - ETL (Extract/Transform/Load)
    - Pre-processing and integration with a data warehouse
    - Personalization and asset management analysis
    - Trade analytics
    - Credit scoring
    - Customer retention
    - Sentiment analysis (opinion mining)
  • 12. Evolution of Hadoop: From Web 2.0 to Enterprise
  • 13. Web 2.0 Organizations Are “Data-Driven”
    “The future is here, it’s just not evenly distributed yet.” –William Gibson
  • 14. Technology Adoption Lifecycle
    [Chart: adoption curve divided into Innovators/Early Adopters, Early Majority, Late Majority, and Laggards]
  • 15. Evolution of the Hadoop Market
    [Chart: the same adoption curve (Innovators/Early Adopters, Early Majority, Late Majority, Laggards), annotated with “Hadoop Early Adopters” and “Hadoop Early Majority”]
  • 16. Evolution of the Hadoop Market
    [Chart annotated with “Hadoop Early Adopters” and “Hadoop Early Majority”]
    Hadoop profile (today):
    - Pioneers and academics
    - Application architect
    - Visionary
    - Open source / community driven
    - Build-your-own server, application & storage infrastructure
    - Commodity components
    - Web 2.0, universities, life sciences
  • 17. Evolution of the Hadoop Market
    [Chart annotated with “Hadoop Early Adopters” and “Hadoop Early Majority”]
    Hadoop profile (today): same as slide 16
    Hadoop profile (future):
    - IT manager & CIO
    - Data scientist
    - Line of business
    - Commercial distribution
    - Turnkey solution
    - End-to-end data protection
    - Fortune 1000, financial services, retail
  • 18. 2012: Hadoop Beyond Web 2.0
  • 19. Greenplum HD: Hadoop for the Enterprise
  • 20. Hadoop Challenges in the Enterprise
    - Hadoop is hard right now!
      – Setup and configuration are resource-intensive
      – Lack of skills to make Hadoop work
      – Poor integration with existing technologies
      – Management at scale is nonexistent
      – Backup and disaster recovery are missing
  • 21. Greenplum HD: Enterprise-Ready Hadoop
    - Simple, efficient and scalable
    - Proven at scale with worldwide EMC support
    - Purpose-built Hadoop infrastructure
    - Services to address the talent gap
    - Parallel analytics access with Greenplum Database
  • 22. Greenplum HD Architecture
    [Layered diagram; components shown:]
    - Greenplum Chorus
    - Greenplum Command Center
    - Hadoop tools (Pig, Hive, HBase, ZooKeeper, Mahout, etc.)
    - MapReduce layer
    - Pluggable storage layer (HDFS API)
    - Storage backends: Apache HDFS or Isilon OneFS
  • 23. Enterprise Storage for Hadoop
    - Integrated big data storage and analytics solution based on Greenplum HD and Isilon scale-out NAS
    - Isilon is the first and only enterprise scale-out NAS storage platform that natively integrates the Hadoop Distributed File System (HDFS) protocol
    - Seamless analytics access with Greenplum: Hadoop insights plug directly into Greenplum Database to augment analytics
    [Diagram labels: Compute, Storage]
  • 24. Flexible and Efficient
    - Independently scale compute and storage
      – Add Greenplum HD or Isilon nodes for performance or capacity
    - Eliminate the three copies of each block that HDFS keeps by default
      – Isilon enables 80% utilization for greater storage efficiency
    - Seamless analytics access with Greenplum Database
      – Hadoop fused with GPDB for big data analytics
  • 25. Simplified Deployment
    - Remove the need for data staging
      – Isilon enables data access over standard protocols (NFS, CIFS, FTP, HTTP, HDFS)
    - No single point of failure
      – Isilon distributes the NameNode to provide high availability and load balancing
    - Enterprise data services for Hadoop
      – Advanced backup and disaster recovery capabilities
  • 26. Advanced Management
    - Greenplum Command Center
      – Complete platform management and control
    - Greenplum Package Manager
      – Automates install, uninstall, update, and query for analytics extensions
      – Supports package migration during upgrade, segment recovery, expansion, and standby initialization
  • 27. Proven at Scale with Worldwide Support
    - Industry’s largest Hadoop support team
      – Industry’s most accomplished Hadoop talent (from Yahoo!, LinkedIn, Talend, etc.)
    - Tested at scale on the Greenplum Analytics Workbench
      – 1,000-node, 24-petabyte cluster
      – Multi-million-dollar investment by EMC and partners
      – Reduced risk for EMC customers
      – Certification of partner products
    [Callout: Bringing Rapid Innovation to Hadoop]
  • 28. Get Started with Hadoop Today
    - Hadoop Architecture Services
      – Proof-of-concept (POC) planning and deployment
      – Installation and best practices
      – Team education
    - Greenplum Analytics Labs
      – Leverage the expertise of Greenplum’s data scientists
      – Packaged solutions that produce business value and actionable results
      – Accelerate Hadoop capabilities on your data with your analysts
    - Establish a strategic vision
      – Roadmap for Hadoop and unified analytics
  • 29. Provide Feedback & Win!
    - 125 attendees will receive $100 iTunes gift cards. To enter the raffle, simply complete:
      – 5 session surveys
      – The conference survey
    - Download the EMC World Conference App to learn more: emcworld.com/app
  • 31. Thank You
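Slide 8 names MapReduce as Hadoop’s compute component. As a rough illustration of the programming model only, here is a minimal single-process Python sketch of a word count split into map, shuffle, and reduce phases. It is not the Hadoop Java API (real jobs implement `Mapper` and `Reducer` classes and run across a cluster); the function names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every intermediate value under its key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hadoop would run map tasks on the nodes holding each input split;
    # this sketch runs all three phases sequentially in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    # Sorting keys mimics the sorted order in which reducers receive keys.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
    # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Because each map call looks at one line and each reduce call looks at one key, both phases can in principle be spread over many machines; only the shuffle needs to move data between them.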
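Step 1 on slide 9, ingesting data into HDFS, means cutting each file into large blocks and storing several replicas of each block on different data nodes. The sketch below shows that idea under simplifying assumptions: the tiny block size and the node names `dn1` to `dn4` are hypothetical (real HDFS blocks are typically 64 MB or 128 MB depending on the Hadoop version), and the round-robin placement is a stand-in for HDFS’s rack-aware policy. The replication factor of 3 matches the HDFS default.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut file contents into fixed-size blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct data nodes.

    Simple round-robin placement for illustration only; real HDFS uses a
    rack-aware policy applied by the NameNode.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"x" * 250, block_size=100)
    print([len(b) for b in blocks])  # [100, 100, 50]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
    # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Losing any single node in this example still leaves at least two copies of every block, which is the basis of the “resilient in case of infrastructure failures” point on slide 6.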
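Slide 22 places a pluggable storage layer (HDFS API) between MapReduce and the storage backends, so that Apache HDFS or Isilon OneFS can sit underneath the same jobs. Here is a toy Python sketch of that design, assuming a hypothetical `FileSystem` interface with two stand-in backends (an in-memory dictionary and a local directory). None of these classes are Hadoop’s Java `org.apache.hadoop.fs.FileSystem` API, and neither backend talks to HDFS or OneFS.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class FileSystem(ABC):
    """Hypothetical storage interface that jobs are written against."""

    @abstractmethod
    def write(self, path: str, data: str) -> None: ...

    @abstractmethod
    def read(self, path: str) -> str: ...

    @abstractmethod
    def list_dir(self, prefix: str) -> List[str]: ...


class InMemoryFS(FileSystem):
    """Backend 1: files kept in a dictionary."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def write(self, path: str, data: str) -> None:
        self._files[path] = data

    def read(self, path: str) -> str:
        return self._files[path]

    def list_dir(self, prefix: str) -> List[str]:
        return sorted(p for p in self._files if p.startswith(prefix))


class LocalDirFS(FileSystem):
    """Backend 2: files stored under a local root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, path: str, data: str) -> None:
        target = self.root / path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(data)

    def read(self, path: str) -> str:
        return (self.root / path.lstrip("/")).read_text()

    def list_dir(self, prefix: str) -> List[str]:
        base = self.root / prefix.strip("/")
        if not base.is_dir():
            return []
        return sorted("/" + p.relative_to(self.root).as_posix()
                      for p in base.rglob("*") if p.is_file())


def count_lines(fs: FileSystem, prefix: str) -> int:
    """A 'job' that depends only on the interface, not on the backend."""
    return sum(len(fs.read(p).splitlines()) for p in fs.list_dir(prefix))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for fs in (InMemoryFS(), LocalDirFS(Path(tmp))):
            fs.write("/logs/a.txt", "one\ntwo")
            fs.write("/logs/b.txt", "three")
            print(type(fs).__name__, count_lines(fs, "/logs"))  # 3 for both
```

`count_lines` never imports or names a concrete backend, so swapping storage requires no change to the job; that separation is what the pluggable layer on the slide is meant to provide.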
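Slide 24’s storage-efficiency claim comes down to simple arithmetic. With HDFS’s default replication factor of 3, each block is stored three times, so at most one third of raw capacity holds unique data; the slide cites 80% utilization for Isilon. A quick sketch, assuming a hypothetical 300 TB of raw capacity and ignoring other overheads such as free-space headroom:

```python
def hdfs_usable_tb(raw_tb: float, replication: int = 3) -> float:
    """HDFS stores every block `replication` times (default 3)."""
    return raw_tb / replication


def utilization_usable_tb(raw_tb: float, utilization: float) -> float:
    """Usable capacity given a storage-efficiency fraction (e.g. 0.80)."""
    return raw_tb * utilization


if __name__ == "__main__":
    raw = 300.0  # hypothetical raw capacity in TB
    hdfs = hdfs_usable_tb(raw)                 # 100 TB
    isilon = utilization_usable_tb(raw, 0.80)  # 240 TB
    print(f"HDFS 3x: {hdfs:.0f} TB, Isilon 80%: {isilon:.0f} TB, "
          f"ratio {isilon / hdfs:.1f}x")
```

Under these assumptions the same raw disk holds 2.4 times as much unique data, which is the efficiency gap the slide refers to.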