Hadoop Operations &
Enterprise Readiness
HDP 1.2
Jim Walker
Jeff Sposetti




© Hortonworks Inc. 2013   Page 1
Hortonworks Snapshot

We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution.

Develop
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects

Distribute
• We distribute the only 100% open source Enterprise Hadoop distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage

Support
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop

Endorsed by Strategic Partners




Hortonworks Process for Enterprise Hadoop
Upstream Community Projects → Downstream Enterprise Product

A virtuous cycle: development and issue fixes happen upstream, and stable project releases flow downstream.

Diagram:
  Upstream (Apache Hadoop, Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari, other Apache projects):
    a cycle of Design & Develop, Test & Patch, Release
  Stable project releases flow downstream into the Hortonworks Data Platform:
    a cycle of Design & Develop, Integrate & Test, Package & Certify, Distribute
  Fixed issues flow back upstream.

No Lock-in: an integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects.


Hortonworks Data Platform 1.2
• Quarterly cadence
  – HDP is aligned tightly with the open source community
    software releases, not a patchwork
  – Regular open source innovation based on an open
    community


• Ecosystem validation
  – Packaged and tested with our key development partner,
    Yahoo! across hundreds of nodes
  – Ambari is the preferred management tool for integration with
    Microsoft System Center and Teradata Viewpoint today.




HDP 1.2 Summary
Hortonworks Data Platform 1.2
Hortonworks Data Platform outpaces the competition to extend
leadership through 100% open source Enterprise Apache Hadoop

Focus areas:
 1. Ambari: continued innovation with a complete,
    free and open cluster management tool
      • Existing: Provision, manage and monitor your Hadoop infrastructure
      • New: Root cause analysis with job diagnostics and usage heat maps
      • Improved: Ecosystem integration and user interface
 2. Enhanced security model and performance
    for Hive and HCatalog
 3. Apache Mahout: now included in the HDP distribution


HDP Certifies Latest Stable Components

  Apache Project   HDP 1.2   CDH 3u5           CDH 4.1.2
  Hadoop           1.1.2     0.20.2 +923.418   2.0.0alpha +541
  Pig              0.10.1    0.8.1 +51.39      0.10.0 +48
  Hive             0.10.0    0.7.1 +42.56      0.9.0 +148
  HCatalog         0.5.0     n/a               n/a
  HBase            0.94.2    0.90.6 +84.73     0.92.1 +154
  Sqoop            1.4.2     1.3.0 +5.88       1.4.1 +51
  Oozie            3.2.0     3.2.0             3.2.0
  ZooKeeper        3.4.5     3.3.5 +19.5       3.4.3 +25
  Ambari           1.2.0     n/a               n/a
  Flume            1.3.0     0.9.4 +25.46      1.2.0 +119
  Mahout           0.7.0     0.5 +9.7          0.7 +4


                                        Source: http://files.cloudera.com/pdf/datasheet/cdh4.1_spec_sheet.pdf

A Brief History of Apache Hadoop

Timeline, 2004 to 2013: Apache project established → Yahoo! begins to operate at scale → Hortonworks Data Platform → 2013: Enterprise Hadoop

• 2005: Yahoo! creates a team under E14 to work on Hadoop (focus on INNOVATION)
• 2008: The Yahoo! team extends its focus to operations to support multiple projects & growing clusters (focus on OPERATIONS)
• 2011: Hortonworks is created to focus on "Enterprise Hadoop", starting with 24 key Hadoop engineers from Yahoo! (focus on STABILITY)



HDP: Enterprise Hadoop Distribution

Diagram: Hortonworks Data Platform (HDP) layers
  OPERATIONAL SERVICES: Manage & Operate at Scale
  DATA SERVICES: Store, Process and Access Data
  HADOOP CORE: Distributed Storage & Processing
  PLATFORM SERVICES: Enterprise Readiness

Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability




Next-Generation Data Architecture
Diagram: layered data architecture
  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
  DATA SYSTEMS: Traditional repos (RDBMS, EDW, MPP) alongside the Hortonworks Data Platform
  DATA SOURCES: Traditional sources (RDBMS, OLTP, OLAP; POS systems) and new sources (web logs, email, sensor data, social media, mobile data)
  Alongside all layers: DEV & DATA TOOLS (build & test) and OPERATIONAL TOOLS (manage & monitor)




HDP 1.2: Operational Services Improvements

Diagram: HDP layers, with AMBARI and OOZIE in OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness: High Availability, Disaster Recovery, Snapshots, Security, etc.)

Apache Ambari 1.2
Hortonworks' open source approach continues to accelerate enterprise adoption of Hadoop.
  – Open Source Approach
    The only 100% open source Apache Hadoop cluster management tool
  – Baseline Features
    Delivers all the tools and functions needed to provision, manage and monitor an Apache Hadoop cluster
  – Innovation
    Lets operators zoom into cluster usage and performance metrics for jobs and tasks to identify the root cause of bottlenecks or operational issues
  – Interoperable
    Includes APIs for integrating with Microsoft System Center, Teradata Viewpoint, and other systems

Also upgraded: Oozie & ZooKeeper
HDP 1.2: New Ambari Features
Figure: Apache Ambari Dashboard

• Job Diagnostics
  Visualize and troubleshoot Hadoop job execution and performance

• Cluster History
  View historical job execution & performance

• REST interface
  Provides external access to Ambari for existing tools; facilitates integration with Microsoft System Center and Teradata Viewpoint

• Instant Insight
  View the health of core Hadoop (HDFS, MapReduce) and related projects

• Cluster Navigation
  "Quick link" buttons jump into the NameNode web UI for a server
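The REST interface is what lets external tools read the same cluster data the dashboard shows. As an illustration, a script might list the clusters Ambari manages and report each service's state. This is a minimal sketch in Python, not Hortonworks' integration code. It assumes the Ambari server listens on port 8080, exposes resources under `/api/v1`, accepts HTTP basic authentication, and returns JSON `items` carrying `Clusters/cluster_name` and `ServiceInfo/service_name` / `ServiceInfo/state` fields. The helper names, host name and credentials are hypothetical placeholders; check every assumption against the API documentation for your Ambari version.

```python
import base64
import json
import urllib.request

# Assumption: Ambari REST resources live under /api/v1 on port 8080 (verify for your install).
AMBARI_API_ROOT = "/api/v1"


def build_request(host: str, path: str, user: str, password: str, port: int = 8080) -> urllib.request.Request:
    """Build an authenticated GET request for an Ambari REST resource (basic auth assumed)."""
    url = f"http://{host}:{port}{AMBARI_API_ROOT}{path}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


def cluster_names(payload: str) -> list[str]:
    """Extract cluster names from a /clusters response body (response shape assumed)."""
    doc = json.loads(payload)
    return [item["Clusters"]["cluster_name"] for item in doc.get("items", [])]


def service_states(payload: str) -> dict[str, str]:
    """Map service name to state; services without a reported state map to 'UNKNOWN'."""
    doc = json.loads(payload)
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"].get("state", "UNKNOWN")
        for item in doc.get("items", [])
    }


def fetch(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Perform the HTTP request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def report_service_states(host: str, user: str, password: str) -> dict[str, dict[str, str]]:
    """Return {cluster_name: {service_name: state}} for every cluster the server reports."""
    clusters = cluster_names(fetch(build_request(host, "/clusters", user, password)))
    return {
        name: service_states(
            fetch(build_request(host, f"/clusters/{name}/services?fields=ServiceInfo/state", user, password))
        )
        for name in clusters
    }
```

Against a live server, a call such as `report_service_states("ambari.example.com", "admin", "admin")` (hypothetical host and credentials) would return a nested dictionary of service states per cluster. Keeping request construction and JSON parsing separate from the network call means the parsing can be checked against saved responses without a running cluster.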




Demo




HDP 1.2: Platform Service Improvements

Diagram: HDP layers (OPERATIONAL SERVICES, DATA SERVICES, HADOOP CORE, and PLATFORM SERVICES: High Availability, Disaster Recovery, Snapshots, Security, etc.)

Security
Extend platform services for security, a KEY requirement for enterprise adoption of Hadoop
  – Enhanced security architecture & pluggable authentication model control access to Hive tables and the metastore
  – Aligns and improves Hive & HCatalog authentication models

High Availability
Full-stack HA on Hadoop 1.0
  – Extended HA to the Hive & HCatalog metastore




HDP 1.2: Data Services Improvements
Diagram: HDP layers, with FLUME, PIG, HIVE, MAHOUT, HBASE, SQOOP and HCATALOG in DATA SERVICES (Store, Process and Access Data)

Data Services Updates
  – Upgraded Pig and Flume
  – Added Mahout (0.7.0) to the distribution

Hive, HCatalog & HBase
Continue to innovate & improve the data services with open source contributions to HCatalog, Hive and HBase
  – Concurrency improvements for Hive and consistent security for Hive & HCatalog
  – Performance and operational enhancements for HBase
  – Improved Java developer productivity via the certified Cascading framework




Apache Community Leadership
Diagram: upstream Apache projects (Hadoop, Pig, Hive, HBase, HCatalog, Ambari, other Apache projects) in a cycle of Design & Develop, Test & Patch, Release

Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy

Key Roles held by Hortonworkers
• PMC Members
  – Managing community projects
  – Mentoring new incubator projects
  – About 20 Hortonworkers managing community projects
• Committers
  – Authoring, reviewing & editing code
  – About 50 Hortonworkers across projects
• Release Managers
  – Testing & releasing projects
  – Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase

"We have noticed more activity over the last year from Hortonworks' engineers on building out Apache Hadoop's more innovative features. These include YARN, Ambari and HCatalog."
  - Jeff Kelly, Wikibon

True Enterprise Class Open Source
• 100% Open Source. No Holdbacks.
  – Only true implementation of OSS Apache Hadoop
  – Preferred by the software vendors that you rely on


• Flexible Deployment
  – No License Fee for usage


• Community Open Source Mitigates Lock-In
  – Proprietary Open Source = Lock-In
  – Open communities always trump “open source”




THANK YOU!!
                             Download Hortonworks Sandbox
                             www.hortonworks.com/sandbox


                             Download Hortonworks Data Platform
                             www.hortonworks.com/download



                             Register for Enterprise Hadoop Series
                             www.hortonworks.com/webinars



                             Follow us!       @hortonworks
                                              @jaymce
                                              @jsposetti

Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainer
 

Similaire à Hadoop Operations & Enterprise Readiness for HDP 1.2

Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondDataWorks Summit
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataPatrickCrompton
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera, Inc.
 
OSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier RenaultOSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier RenaultNETWAYS
 
Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Hortonworks
 
Hadoop Now, Next and Beyond
Hadoop Now, Next and BeyondHadoop Now, Next and Beyond
Hadoop Now, Next and BeyondDataWorks Summit
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexApache Apex
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsDataWorks Summit
 
Amr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY PresentationAmr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY Presentation500 Startups
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop WorldCloudera, Inc.
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Webinar: The Future of Hadoop
Webinar: The Future of HadoopWebinar: The Future of Hadoop
Webinar: The Future of HadoopCloudera, Inc.
 
Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop Cloudera, Inc.
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Hortonworks
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrowSteve Loughran
 

Similaire à Hadoop Operations & Enterprise Readiness for HDP 1.2 (20)

Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big Data
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7
 
OSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier RenaultOSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier Renault
 
Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 
Hadoop Now, Next and Beyond
Hadoop Now, Next and BeyondHadoop Now, Next and Beyond
Hadoop Now, Next and Beyond
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 
Inside hadoop-dev
Inside hadoop-devInside hadoop-dev
Inside hadoop-dev
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI Tools
 
Amr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY PresentationAmr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY Presentation
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop World
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Webinar: The Future of Hadoop
Webinar: The Future of HadoopWebinar: The Future of Hadoop
Webinar: The Future of Hadoop
 
Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 

Plus de Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

Plus de Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Hadoop Operations & Enterprise Readiness for HDP 1.2

  • 1. Hadoop Operations & Enterprise Readiness: HDP 1.2. Jim Walker, Jeff Sposetti. © Hortonworks Inc. 2013
  • 2. Hortonworks Snapshot: We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution.
      – Develop: We employ the core architects, builders and operators of Apache Hadoop. We drive innovation within Apache Software Foundation projects.
      – Distribute: We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform. We engineer, test & certify HDP for enterprise usage.
      – Support: We are uniquely positioned to deliver the highest quality of Hadoop support. We enable the ecosystem to work better with Hadoop.
      Endorsed by Strategic Partners.
  • 3. Hortonworks Process for Enterprise Hadoop: from Upstream Community Projects to Downstream Enterprise Product. A virtuous cycle results when development and issue fixes are done upstream and stable project releases flow downstream.
      [Diagram: upstream Apache projects (Hadoop, Pig, Hive, HCatalog, HBase, Ambari, other Apache projects) cycle through Design & Develop, Test & Patch and Release; stable project releases flow downstream to Hortonworks for Design & Develop, Integrate & Test, Package & Certify and Distribute as the Hortonworks Data Platform; fixed issues flow back upstream.]
      No Lock-in: an integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects.
  • 4. Hortonworks Data Platform 1.2
      – Quarterly cadence: HDP is aligned tightly with the open source community software releases, not a patchwork; regular open source innovation based on an open community.
      – Ecosystem validation: packaged and tested with our key development partner, Yahoo!, across hundreds of nodes. Ambari is, today, the preferred management tool for integration with Microsoft System Center and Teradata Viewpoint.
  • 5. HDP 1.2 Summary: Hortonworks Data Platform 1.2 outpaces the competition to extend leadership through 100% open source Enterprise Apache Hadoop. Focus areas:
      1. Ambari: continued innovation with a complete, free and open cluster management tool
         – Existing: provision, manage and monitor your Hadoop infrastructure
         – New: root cause analysis with job diagnostics, usage heat maps
         – Improved: ecosystem integration and user interface
      2. Enhanced security model and performance for Hive and HCatalog
      3. Apache Mahout: now included in the HDP distribution
  • 6. HDP Certifies Latest Stable Components
      Apache Project   HDP 1.2   CDH 3u5          CDH 4.1.2
      Hadoop           1.1.2     0.20.2+923.418   2.0.0alpha+541
      Pig              0.10.1    0.8.1+51.39      0.10.0+48
      Hive             0.10.0    0.7.1+42.56      0.9.0+148
      HCatalog         0.5.0     n/a              n/a
      HBase            0.94.2    0.90.6+84.73     0.92.1+154
      Sqoop            1.4.2     1.3.0+5.88       1.4.1+51
      Oozie            3.2.0     3.2.0            3.2.0
      Zookeeper        3.4.5     3.3.5+19.5       3.4.3+25
      Ambari           1.2.0     n/a              n/a
      Flume            1.3.0     0.9.4+25.46      1.2.0+119
      Mahout           0.7.0     0.5+9.7          0.7+4
      Source: http://files.cloudera.com/pdf/datasheet/cdh4.1_spec_sheet.pdf
  • 7. A Brief History of Apache Hadoop
      [Timeline, 2004 to 2013: Apache project established; Yahoo! begins to operate at scale; Hortonworks Data Platform; Enterprise Hadoop.]
      – 2005: Yahoo! creates a team under E14 to work on Hadoop. Focus on INNOVATION.
      – 2008: The Yahoo team extends its focus to operations to support multiple projects & growing clusters. Focus on OPERATIONS.
      – 2011: Hortonworks is created to focus on “Enterprise Hadoop”, starting with 24 key Hadoop engineers from Yahoo. Focus on STABILITY.
  • 8. HDP: Enterprise Hadoop Distribution
      [Diagram: Hortonworks Data Platform (HDP) layers: Operational Services (Manage & Operate at Scale), Data Services (Store, Process and Access Data), Hadoop Core (Distributed Storage & Processing), Platform Services (Enterprise Readiness).]
      – The ONLY 100% open source and complete distribution
      – Enterprise grade, proven and tested at scale
      – Ecosystem endorsed to ensure interoperability
  • 9. Next-Generation Data Architecture
      [Diagram: Applications (Business Analytics, Custom Applications, Enterprise Applications) on top of Data Systems (the Hortonworks Data Platform alongside traditional repositories: RDBMS, EDW, MPP), flanked by Dev & Data Tools (Build & Test) and Operational Tools (Manage & Monitor). Data sources: traditional (RDBMS, OLTP, OLAP) and new (web logs, email, sensor data, social media); the diagram also shows OLTP, POS and mobile data systems.]
  • 10. HDP 1.2: Operational Services Improvements
      Apache Ambari 1.2: Hortonworks' open source approach continues to accelerate enterprise adoption of Hadoop.
      – Open Source Approach: the only 100% open source Apache Hadoop cluster management tool
      – Baseline Features: delivers all necessary tools/functions to provision, manage and monitor an Apache Hadoop cluster
      – Innovation: provides the ability to zoom into cluster usage and performance metrics for jobs and tasks to identify the root cause of bottlenecks or operations issues
      – Interoperable: includes APIs for integrating with Microsoft System Center, Teradata Viewpoint, and other systems
      Also upgraded Oozie & Zookeeper.
      [Diagram: HDP stack with Ambari and Oozie in Operational Services; Platform Services provide High Availability, Disaster Recovery, Snapshots, Security, etc.]
  • 11. HDP 1.2: New Ambari Features
      – Job Diagnostics: visualize and troubleshoot Hadoop job execution and performance
      – Cluster History: view historical job execution & performance
      – REST interface: provides external access to Ambari for existing tools; facilitates integration with Microsoft System Center and Teradata Viewpoint
      – Instant Insight: view the health of core Hadoop (HDFS, MapReduce) and related projects
      – Cluster Navigation: “quick link” buttons on the Apache Ambari Dashboard jump into the NameNode web UI for a server
  • 12. Demo
  • 13. HDP 1.2: Platform Service Improvements
      Security: extend platform services for security, a KEY requirement for enterprise adoption of Hadoop
      – Enhanced security architecture & pluggable authentication model controls access to Hive tables and metastore
      – Aligns and improves Hive & HCatalog authentication models
      High Availability: full-stack HA on Hadoop 1.0
      – Extended HA to Hive & HCatalog Metastore
      [Diagram: HDP stack; Platform Services provide High Availability, Disaster Recovery, Snapshots, Security, etc.]
  • 14. HDP 1.2: Data Services Improvements
      Data Services Updates
      – Upgraded Pig and Flume
      – Added Mahout (0.7.0) to the distribution
      Hive, HCatalog & HBase: continue to innovate & improve the data services with open source contributions to HCatalog, Hive and HBase
      – Concurrency improvements for Hive and consistent security for Hive & HCatalog
      – Performance and operational enhancements for HBase
      – Improved Java developer productivity via certified Cascading framework
      [Diagram: Data Services in the HDP stack: Flume, Pig, Hive, Mahout, HBase, Sqoop, HCatalog.]
  • 17. Apache Community Leadership
      Apache Software Foundation Guiding Principles
      – Release early & often
      – Transparency, respect, meritocracy
      Key Roles held by Hortonworkers
      – PMC Members: managing community projects; mentoring new incubator projects; about 20 Hortonworkers managing community projects
      – Committers: authoring, reviewing & editing code; about 50 Hortonworkers across projects
      – Release Managers: testing & releasing projects; Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
      “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly, Wikibon
      [Diagram: Apache projects (Hadoop, Pig, Hive, HBase, HCatalog, Ambari, other Apache projects) cycling through Design & Develop, Test & Patch and Release.]
  • 18. True Enterprise Class Open Source
      – 100% Open Source. No Holdbacks.
         – Only true implementation of OSS Apache Hadoop
         – Preferred by the software vendors that you rely on
      – Flexible Deployment
         – No license fee for usage
      – Community Open Source Mitigates Lock-In
         – Proprietary Open Source = Lock-In
         – Open communities always trump “open source”
  • 19. THANK YOU!!
      – Download Hortonworks Sandbox: www.hortonworks.com/sandbox
      – Download Hortonworks Data Platform: www.hortonworks.com/download
      – Register for the Enterprise Hadoop Series: www.hortonworks.com/webinars
      Follow us! @hortonworks @jaymce @jsposetti
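Slide 11 says Ambari's REST interface gives existing tools external access to the cluster. The following is a minimal client sketch, not Ambari's documented 1.2 behaviour. It assumes a hypothetical server at `ambari.example.com` on port 8080, HTTP basic authentication, and a `/api/v1/clusters` endpoint that returns JSON with an `items` list of `Clusters` objects. Check the Ambari API reference for your release before relying on any of these details.

```python
import base64
import json
import urllib.request

# Hypothetical Ambari server; port 8080 is an assumption, not taken from the slides.
AMBARI_BASE = "http://ambari.example.com:8080"


def build_request(path: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for an Ambari REST path using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(AMBARI_BASE + path, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    # Assumption: some Ambari releases require this header on modifying requests;
    # sending it on a GET is harmless.
    req.add_header("X-Requested-By", "ambari")
    return req


def cluster_names(body: str) -> list:
    """Extract cluster names from an assumed /api/v1/clusters JSON response."""
    doc = json.loads(body)
    return [item["Clusters"]["cluster_name"] for item in doc.get("items", [])]


def list_clusters(user: str, password: str) -> list:
    """Fetch and parse the cluster list (needs a live Ambari server)."""
    req = build_request("/api/v1/clusters", user, password)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return cluster_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo against a response shaped like the assumed API output.
    sample = '{"items": [{"Clusters": {"cluster_name": "hdp12"}}]}'
    print(cluster_names(sample))
```

The parsing is kept separate from the network call, so the response handling can be checked offline. An external monitoring tool would typically call `list_clusters` and then walk deeper resource paths.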

Editor's Notes

  1. Committed to building 100% open source Hadoop for the Enterprise
  2. So how does this come together in our distribution? It is really pretty straightforward, but also unique: we start with this group of open source projects that I described and that we continually drive in the OSS community. [CLICK] We then package the appropriate versions of those open source projects and integrate and test them using a full suite, including all the IP for regression testing contributed by Yahoo. [CLICK] We contribute all of the bug fixes back to the open source tree. From there, we package and certify a distribution in the form of the Hortonworks Data Platform (HDP), which includes both Hadoop Core and the related projects required by the Enterprise user, and provide it to our customers. By applying an Enterprise software development process to the open source projects, we produce a 100% open source distribution that has been packaged, tested and certified by Hortonworks. It is also 100% in sync with the open source trees.
  3. 100% Open Source: eliminating Lock-In
  4. Quarterly cadence: regular innovation every three months. Validated & tested by our ecosystem partners. Embargo date: January 15.
  5. HDP closely tracks Apache project releases. CDH forks early and patches CDH distributions off to the side of the Apache community projects, resulting in unnecessary drift and risk of lock-in. The “+923.423” and “+541” parts of the version numbers represent how many patches these components have drifted away from the corresponding Apache projects. While some drift can be expected, patches and changes on the order of hundreds result in lock-in and actually eliminate the virtuous cycle that the upstream community should help drive.
  6. I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop. What we now know as Hadoop really started back in 2005, when Eric Baldeschwieler, known as “E14”, started work on a project to build a large-scale data storage and processing technology that would allow Yahoo to store and process massive amounts of data to underpin its most critical application, Search. The initial focus was on building out the technology (the key components being HDFS and MapReduce) that would become the core of what we think of as Hadoop today, and on continuing to innovate it to meet the needs of this specific application. By 2008, Hadoop usage had greatly expanded inside of Yahoo, to the point that many applications were using this data management platform, and as a result the team extended its focus to include operations: now that applications were propagating around the organization, sophisticated capabilities for operating Hadoop at scale were necessary. It was also at this time that usage began to expand well beyond Yahoo, with many notable organizations (including Facebook and others) adopting Hadoop as the basis of their large-scale data processing and storage applications, which further necessitated a focus on operations to support what was by now a large variety of critical business applications. In 2011, recognizing that mainstream adoption of Hadoop was beginning to take off, and with the objective of facilitating it, the core team left, with the blessing of Yahoo, to form Hortonworks. The goal of the group was to facilitate broader adoption by addressing the Enterprise capabilities that would enable a larger number of organizations to adopt and expand their usage of Hadoop. [Note: if useful as a talk track, Cloudera was formed in 2008, well BEFORE the operational expertise of running Hadoop at scale was established inside of Yahoo.]
  7. In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution that includes the Core Services, Platform Services, Data Services and Operational Services required by the Enterprise user. All of this is done in 100% open source and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. Finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  8. As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop, and HDP in particular, being introduced as a complement to the traditional approaches. It is not replacing the database but complementing it, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with: existing applications, such as Tableau, SAS and Business Objects; existing databases and data warehouses, for loading data to and from the data warehouse; development tools used for building custom applications; and operational tools for managing and monitoring.
  9. Eric and team created the Hadoop project as open source, and that is and always will be central to our approach. We believe strongly that the technology needs to be community driven and open source. In terms of open source mechanics, Apache Hadoop is governed by the Apache Software Foundation (ASF), which provides structure to what, inside a commercial software company, would be a tightly governed development, test and release process. For Core Hadoop, the ASF has helped to manage this process for several years now. However, as Hadoop has become more widely used, it has spawned a set of ancillary open source projects that introduce capabilities required for more mainstream use. These projects are generally classified as either “Data Services” (those that enable the storage, processing, and accessing of data) or “Operational Services” (those that enable the management and operations of the infrastructure). The projects within these categories are run as independent projects with their own teams, and include some of the technologies you likely know of: Data Services include projects such as Hive, Pig, HBase and HCatalog, while Operational Services include Apache Ambari and more. Hortonworkers have always played a critical role in the development, test and release process for Core Apache Hadoop, and they also play leading roles in the ancillary projects that are required for enterprise usage. This includes every role from committer and release manager to, in many cases, project lead.
For example, Arun Murthy is the project lead for Core Hadoop. Current Hortonworks PMC members by project:
Hadoop: Arun Murthy, Devaraj Das, Enis Soztutar, Giridharan Kesavan, Jitendra Nath Pandey, Mahadev Konar, Matt Foley, Owen O'Malley, Sanjay Radia, Suresh Srinivas, Nicholas Sze, Vinod Kumar Vavilapalli
Pig: Daniel Dai, Alan Gates, Giridharan Kesavan, Ashutosh Chauhan, Thejas Nair
Hive: Ashutosh Chauhan
HBase: None
Oozie: Devaraj Das, Alan Gates
Sqoop: None
Flume: None
Bigtop: Alan Gates, Steve Loughran, Owen O'Malley
Incubator (not a Hadoop project, but shows who is helping grow new projects in Apache): Arun Murthy, Devaraj Das, Alan Gates, Mahadev Konar, Steve Loughran, Owen O'Malley, Enis Soztutar
  10. We are believers in open source: we believe it is the most efficient way to develop enterprise software. More importantly, we believe that 100% open source is the best approach for our customers. In the data management market in particular, our customers are acutely aware of the implications of growing their database usage with a proprietary vendor who can then exert pricing pressure (Oracle). Particularly when it comes to data storage, which we can all anticipate will continue to grow exponentially, you don’t want to be penalized for scale. By choosing an open source approach, organizations can build their operational processes on open technologies without concern that they will be locked in to a particular vendor. They can also be confident that as their usage grows, they can choose from flexible pricing alternatives (by node or by storage) that align best with their needs. It is ultimately about mitigating risk, and in this regard open source has proven to be the safest approach. I would also caution you to look beyond the open source label used by some vendors: are they harvesting open source work, forking the code and then working independently (“fork early / patch often”)? Or, like Hortonworks, have they embraced and committed to the community open source approach, which allows them to stay in sync with the innovation of the community? In the Hadoop community, Hortonworks’ commitment to the community-driven approach is unquestioned.