Creating a Secure Hadoop
                 Initiative
                 Securing the Big Data Ecosystem




This document contains confidential, proprietary and trade secret information and is subject to certain legal protection. You may not
review, copy, or distribute this information unless you are a designated recipient, and have prior written authorization from Zettaset, Inc.
About Me
• CTO Zettaset, Inc. – Big Data Hadoop Company
   – Founded 2007
• Distributed Computing Guy
   – Have been since college
• Security Guy
   – Founder SPI Dynamics (sold to HP, 2007)
   – Internet Security Systems, Prof. Services
   – Security First Network Bank, Sec. Guru.
Zettaset Enables Enterprise-Ready Hadoop
• Zettaset Orchestrator™ automates
  Hadoop installation and cluster
  management with an enterprise-ready
  solution for Big Data deployments
 – Enterprise-class – Hardened for security, high
   availability, and performance
 – Dramatically lowers operational expenses –
   Reduces IT resource requirements
 – Simple to deploy – Accelerates time to value
   from weeks to hours
 – Eliminates unnecessary dependencies on
   professional services
 – Works with any Apache Hadoop distribution



© 2012 Zettaset, Inc. | Proprietary and Confidential
Zettaset Orchestrator:
Making Hadoop Clusters Enterprise-Ready




What is Big Data?

• Great Question
• It’s not a number; people define it
  differently.
• Most define it as a scalability
  issue:
  – “The inability to continue storing and processing data the way
    that you’ve been storing and processing data.”
Exponential Data Growth = Big Data
Estimated Global Data Volume:
  2011: 1.8 Zettabytes
  2015: 7.9 Zettabytes
The world's information doubles every two years
Over the next 10 years:
• The number of servers worldwide will grow by 10x
• The amount of information managed by enterprise data centers will
  grow by 50x
• The number of “files” enterprise data centers handle will grow by 75x

Source: http://www.emc.com/leadership/programs/digital-universe.htm,
which was based on the 2011 IDC Digital Universe Study
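As a rough check on the figures above, the sketch below works out the compound annual growth rate and doubling time implied by the two quoted data points. It assumes smooth compound growth between 2011 and 2015, which is a simplification; the function names are illustrative. The result comes out a little under two years, in line with the "doubles every two years" claim.

```python
import math

# Figures quoted on the slide (IDC Digital Universe estimates).
VOLUME_2011_ZB = 1.8
VOLUME_2015_ZB = 7.9


def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by two data points."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time_years(rate: float) -> float:
    """Years needed to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + rate)


rate = annual_growth_rate(VOLUME_2011_ZB, VOLUME_2015_ZB, years=2015 - 2011)
doubling = doubling_time_years(rate)

print(f"Implied annual growth: {rate:.1%}")
print(f"Implied doubling time: {doubling:.2f} years")
```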


Hadoop Distribution Landscape
Legend: ✓ = included, - = not included
Color legend on the slide: Open Source Apache Hadoop / Proprietary

Distribution & Core Components  CDH3u4  CDH4u0  HDP v1.0  Apache Bigtop v0.3  MapR M3  MapR M5  BigInsights 1.4 BE  BigInsights 1.4 EE
Apache Hadoop                   ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
HDFS                            ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
Fuse-DFS                        ✓       ✓       ✓         ✓                   -        -        -                   -
MapReduce                       ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
MapReduce 2                     -       ✓       -         -                   -        -        -                   -
Hadoop Common                   ✓       ✓       ✓         ✓                   -        -        ✓                   ✓
Apache Hive                     ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
Apache Pig                      ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
Apache HBase                    ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
Apache Zookeeper                ✓       ✓       ✓         ✓                   -        -        ✓                   ✓
Apache Ambari                   -       -       ✓         -                   -        -        -                   -
Apache Templeton                -       -       ✓         -                   -        -        -                   -
Apache Flume                    ✓       ✓       -         ✓                   ✓        ✓        ✓                   ✓
Apache Sqoop                    ✓       ✓       ✓         ✓                   ✓        ✓        -                   -
Apache Mahout                   ✓       ✓       -         ✓                   ✓        ✓        -                   -
Apache Whirr                    ✓       ✓       -         ✓                   ✓        ✓        -                   -
Apache Oozie                    ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
Apache Lucene                   -       -       -         -                   -        -        ✓                   ✓
Apache Derby                    -       -       -         -                   -        -        ✓                   ✓
Apache Avro                     -       -       -         -                   -        -        ✓                   ✓
Hue                             ✓       ✓       -         -                   -        -        -                   -
BigInsights Apps                -       -       -         -                   -        -        -                   ✓

Hadoop Management               CDH3u4  CDH4u0  HDP v1.0  Apache Bigtop v0.3  MapR M3  MapR M5  BigInsights 1.4 BE  BigInsights 1.4 EE
Nagios                          -       -       ✓         -                   -        -        -                   -
Ganglia                         -       -       ✓         -                   -        -        -                   -
Zettaset Orchestrator™          ✓       ✓       ✓         ✓                   ✓        ✓        ✓                   ✓
Cloudera Manager                ✓       ✓       -         -                   -        -        -                   -
MapR Manager                    -       -       -         -                   ✓        ✓        -                   -
BigInsights web console         -       -       -         -                   -        -        -                   ✓
BigInsights simple console      -       -       -         -                   -        -        ✓                   -
January 24, 2013                             Zettaset, Inc. | Proprietary
What is the current state of Security?
•   Another Great Question
•   Minimal work has been done in this field
•   Currently not a huge community focus
•   Everyone feels like it’s been addressed by
    adding Kerberos to the system

Don’t tell InfoSec people that Kerberos has fixed
                   everything!
Why Not Tell Them That?
• You will give them an aneurysm.
• Kerberos is “Brushed On” Security NOT
  “Baked In” security.
• Kerberos does NOT address compliance
  issues around data (HIPAA, GLBA, PCI, etc.)

 Nothing around encryption, nothing around
              best practices.
Hadoop: What’s Missing?

• All Hadoop distros are constrained
  by the limitations of the Apache
  open source components
• Not written to support hardened
  security, compliance, encryption,
  policy-enablement, and risk
  management
• Not written with high
  availability, service
  management, and monitoring in
  mind
Current State of Hadoop Security

• Existing security for Apache-based Hadoop
  distributions does not meet enterprise requirements
  for regulatory compliance mandates such as
  HIPAA and SOX
• Security breaches can have a serious negative
  impact, e.g., release of sensitive information, damage
  to the brand, loss of competitive advantage, and
  litigation
• Hadoop's security mechanism provides mutual
  authentication of users and services via SASL and
  Kerberos, but this has limitations
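To make the Kerberos point concrete, here is a minimal sketch that checks whether a `core-site.xml` turns on Kerberos authentication and service-level authorization. The property names `hadoop.security.authentication` and `hadoop.security.authorization` are standard Hadoop settings; the sample file content and the helper names are assumptions made for illustration. Passing this check says nothing about encryption, auditing, or compliance, which is the limitation the slide points at.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content; the values here are illustrative.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str) -> dict:
    """Parse Hadoop-style <property><name/><value/></property> entries."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def kerberos_findings(props: dict) -> list:
    """Return a list of problems; an empty list means both settings are on."""
    findings = []
    # Hadoop falls back to "simple" (no real authentication) when unset.
    if props.get("hadoop.security.authentication", "simple").lower() != "kerberos":
        findings.append("authentication is not set to kerberos")
    # Service-level authorization is off unless explicitly enabled.
    if props.get("hadoop.security.authorization", "false").lower() != "true":
        findings.append("service-level authorization is disabled")
    return findings


if __name__ == "__main__":
    for finding in kerberos_findings(read_properties(SAMPLE_CORE_SITE)):
        print("WARNING:", finding)
```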




Enterprise-Class Hadoop Security
      Addresses the security gaps and vulnerabilities that exist
             in all Apache-based Hadoop distributions
•   Hardened to address access control, policy, compliance and risk management

•   Support for Lightweight Directory Access Protocol (LDAP) and Active Directory
    (AD), enabling Hadoop clusters to seamlessly integrate with existing security policies
    within the enterprise environment

•   Centralized configuration management, logging, and auditing, which maintains control
    of ingress and egress points in the cluster, and enables Hadoop clusters to meet
    compliance requirements for reporting and forensics

•   Role-based access control (RBAC), which significantly improves the user
    authentication process, and enables Kerberos to be run against all components of a
    big data ecosystem, not just Hadoop
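The combination of directory groups and role-based access control described above can be sketched as follows. This is not Zettaset's implementation: the group names, roles, resources, and functions are all hypothetical, and a real deployment would look groups up in LDAP or Active Directory rather than in a dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the (resource, action) pairs each one grants.
ROLE_PERMISSIONS = {
    "analyst": {("hdfs:/data/reports", "read")},
    "etl": {("hdfs:/data/raw", "read"), ("hdfs:/data/reports", "write")},
    "admin": {("cluster", "configure")},
}

# Hypothetical mapping from directory (LDAP/AD) groups to roles.
GROUP_ROLES = {
    "cn=analytics,ou=groups": {"analyst"},
    "cn=pipeline,ou=groups": {"etl"},
    "cn=hadoop-ops,ou=groups": {"admin", "etl"},
}


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


def roles_for(user: User) -> set:
    """Collect every role granted through the user's directory groups."""
    roles = set()
    for group in user.groups:
        roles |= GROUP_ROLES.get(group, set())
    return roles


def is_allowed(user: User, resource: str, action: str) -> bool:
    """Allow the request only if one of the user's roles grants it."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in roles_for(user)
    )


alice = User("alice", {"cn=analytics,ou=groups"})
print(is_allowed(alice, "hdfs:/data/reports", "read"))   # granted via "analyst"
print(is_allowed(alice, "hdfs:/data/reports", "write"))  # no role grants this
```

Keeping permissions on roles rather than on individual users means access changes when group membership changes in the directory, with no per-user edits on the cluster.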




Defining Big Data Use Case
• What is your use case?
• What are you trying to accomplish?
• What data are you going to be storing?
• What are you going to do after you store it?


  This will define your Security Threat Model and how you
                      protect your data.
Big Data Production System




[Diagram: inputs such as log files, alerts, transactions, etc., grouped by data type: structured data, semi-structured data, unstructured data]
Big Data Landscape (Version 2.0)
[Figure: landscape map of vendors and projects, grouped as follows]
Infrastructure: NoSQL Databases; NewSQL Databases; MPP Databases; Hadoop Related; Management / Monitoring; Cluster Services; Security; Storage; Crowdsourcing; Collection / Transport
Analytics: Analytics Solutions; Data Visualization; Statistical Computing; Social Media; Sentiment Analysis; Analytics Services; Location / People / Events; Big Data Search; IT Analytics; Real-Time; Crowdsourced Analytics; SMB Analytics
Applications: Ad Optimization; Publisher Tools; Marketing; Industry Applications; Application Service Providers
Data Sources: Data Marketplaces; Data Sources; Personal Data
Cross Infrastructure / Analytics
Open Source Projects: Framework; Query / Data Flow; Data Access; Coordination / Workflow; Real-Time; Statistical Tools; Machine Learning; Cloud Deployment
                   © Matt Turck (@mattturck) and Shivon Zilis (@shivonz) Bloomberg Ventures
What is a Threat Model?
• Threat modeling is based on the notion that any
  system or organization has assets of value worth
  protecting and these assets have certain
  vulnerabilities.
• Internal or external threats exploit these
  vulnerabilities in order to cause damage to the
  assets, and appropriate security
  countermeasures exist that mitigate the threats.
• A threat model can help to assess the
  probability, potential harm, and priority of
  attacks, and thus help to minimize or eradicate
  the threats.
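One simple way to turn probability and potential harm into a priority is to multiply them and sort. The sketch below does that for a few made-up threats; the assets, estimates, and the probability-times-harm score are illustrative assumptions, not a prescribed methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    asset: str          # the thing of value being protected
    description: str    # how a vulnerability could be exploited
    probability: float  # estimated likelihood, 0.0 to 1.0
    harm: int           # estimated damage, 1 (minor) to 5 (severe)

    @property
    def risk(self) -> float:
        """Simple risk score: likelihood multiplied by damage."""
        return self.probability * self.harm


def prioritize(threats):
    """Order threats from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


# Hypothetical entries for a Hadoop cluster.
THREATS = [
    Threat("customer records in HDFS", "insider copies unencrypted files", 0.3, 5),
    Threat("job scheduler", "unauthenticated job submission", 0.6, 3),
    Threat("audit logs", "logs altered after an incident", 0.1, 4),
]

for threat in prioritize(THREATS):
    print(f"{threat.risk:4.1f}  {threat.asset}: {threat.description}")
```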
Approaches to threat modeling
•   Three general approaches to threat modeling:

• Attacker-centric
     – Attacker-centric threat modeling starts with an attacker and evaluates their
       goals and how they might achieve them. Attackers' motivations are often
       considered, for example, "The NSA wants to read this email," or "Jon wants to
       copy this DVD and share it with his friends." This approach usually starts from
       either entry points or assets.
• Software-centric
     – Software-centric threat modeling (also called 'system-centric,' 'design-centric,' or
       'architecture-centric') starts from the design of the system, and attempts to step
       through a model of the system, looking for types of attacks against each element
       of the model. This approach is used in threat modeling in Microsoft's Security
       Development Lifecycle.
• Asset-centric
     – Asset-centric threat modeling involves starting from assets entrusted to a
       system, such as a collection of sensitive personal information.
Bottom Line
• Identify any threats to the confidentiality, availability and
  integrity of the data and the application based on the
  data access control matrix that your application should
  be enforcing
• Assign risk values and determine the risk responses
• Determine the countermeasures to implement based on
  your chosen risk responses
• Continually update the threat model based on the
  emerging security landscape.
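The first step above, checking threats against the data access control matrix the application should enforce, can be sketched by comparing the intended matrix with the permissions actually observed. The subjects, objects, and the mapping from action to security property below are hypothetical simplifications.

```python
# Intended data access control matrix: (subject, object) -> allowed actions.
INTENDED = {
    ("analyst", "reports"): {"read"},
    ("etl", "raw_events"): {"read"},
    ("etl", "reports"): {"read", "write"},
}

# Permissions observed on the running system (hypothetical audit output).
OBSERVED = {
    ("analyst", "reports"): {"read", "write"},
    ("analyst", "raw_events"): {"read"},
    ("etl", "raw_events"): {"read"},
    ("etl", "reports"): {"read"},
}


def find_threats(intended, observed):
    """List deviations from the intended matrix and the property at risk."""
    findings = []
    for key in sorted(set(intended) | set(observed)):
        subject, obj = key
        want = intended.get(key, set())
        have = observed.get(key, set())
        # Extra access: unintended writes threaten integrity,
        # any other unintended action is treated as a confidentiality risk.
        for action in sorted(have - want):
            prop = "integrity" if action == "write" else "confidentiality"
            findings.append((subject, obj, action, prop))
        # Missing access: legitimate users cannot do their job.
        for action in sorted(want - have):
            findings.append((subject, obj, action, "availability"))
    return findings


for subject, obj, action, prop in find_threats(INTENDED, OBSERVED):
    print(f"{prop}: {subject} / {obj} / {action}")
```

Each finding can then be given a risk value and a response, and the comparison rerun whenever the matrix or the system changes.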
Summary
 • All existing Apache-based Hadoop distributions have functional
   limitations which constrain enterprise adoption
 • Zettaset Orchestrator is addressing the enterprise-level gaps in
   security, high availability, performance, and manageability that
   exist in all Apache-based Hadoop distributions
 • Orchestrator is a universal management and control software
   layer that can sit on top of any Hadoop distribution (distro-
   agnostic)
 • Orchestrator fills the Service Management gaps that exist in all
   Hadoop distributions and cluster deployments, and makes
   Hadoop ready for broader enterprise adoption



Thank You!

More Related Content

What's hot

Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
H cat berlinbuzzwords2012
H cat berlinbuzzwords2012H cat berlinbuzzwords2012
H cat berlinbuzzwords2012Hortonworks
 
Ecossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine LuizaEcossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine LuizaNelson Forte
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceCloudera, Inc.
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationYahoo Developer Network
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchMapR Technologies
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Hortonworks
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performanceDataWorks Summit
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsAnandMHadoop
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtechYuta Imai
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start TutorialCarl Steinbach
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache HiveHBaseCon
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedInAllen Wittenauer
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 

What's hot (20)

Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
H cat berlinbuzzwords2012
H cat berlinbuzzwords2012H cat berlinbuzzwords2012
H cat berlinbuzzwords2012
 
May 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data OutMay 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
 
Using R with Hadoop
Using R with HadoopUsing R with Hadoop
Using R with Hadoop
 
Ecossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine LuizaEcossistema Hadoop no Magazine Luiza
Ecossistema Hadoop no Magazine Luiza
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
Hadoop Tutorial for Beginners
Hadoop Tutorial for BeginnersHadoop Tutorial for Beginners
Hadoop Tutorial for Beginners
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
 
Hadoop
HadoopHadoop
Hadoop
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic Commands
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtech
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedIn
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 

Viewers also liked

Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceIBM Cloud Data Services
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifyHortonworks
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHortonworks
 
BlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBoni Bruno
 
Zettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabaseZettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabasePivotalOpenSourceHub
 
Security and Audit for Big Data
Security and Audit for Big DataSecurity and Audit for Big Data
Security and Audit for Big DataNicolas Morales
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data PlatformVikas Manoria
 
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Cynthia Saracco
 

Viewers also liked (8)

Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
BlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBlueTalon-Isilon-Validation
BlueTalon-Isilon-Validation
 
Zettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabaseZettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum Database
 
Security and Audit for Big Data
Security and Audit for Big DataSecurity and Audit for Big Data
Security and Audit for Big Data
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
 
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
 

Similar to Big Data Cloud Meetup - Jan 24 2013 - Zettaset

Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]Shweta Patnaik
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Mich Talebzadeh (Ph.D.)
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2DataWorks Summit
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSandish Kumar H N
 
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native Services
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native ServicesAccumulo Summit 2016: Apache Accumulo on Docker with YARN Native Services
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native ServicesAccumulo Summit
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleSpringPeople
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceSteve Loughran
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache PigSachin Vakkund
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxraghavanand36
 
The Big Picture on Hadoop
The Big Picture on HadoopThe Big Picture on Hadoop
The Big Picture on HadoopStackIQ
 
Developer's Most Frequent Hadoop Headaches & How to Address Them__HadoopSumm...
Developer's Most Frequent Hadoop Headaches &  How to Address Them__HadoopSumm...Developer's Most Frequent Hadoop Headaches &  How to Address Them__HadoopSumm...
Developer's Most Frequent Hadoop Headaches & How to Address Them__HadoopSumm...Yahoo Developer Network
 
Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Puppet
 
Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Muthu Natarajan
 

Similar to Big Data Cloud Meetup - Jan 24 2013 - Zettaset (20)

Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
Hadoop description
Hadoop descriptionHadoop description
Hadoop description
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
 
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native Services
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native ServicesAccumulo Summit 2016: Apache Accumulo on Docker with YARN Native Services
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native Services
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeople
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache Pig
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 
The Big Picture on Hadoop
The Big Picture on HadoopThe Big Picture on Hadoop
The Big Picture on Hadoop
 
Developer's Most Frequent Hadoop Headaches & How to Address Them__HadoopSumm...
Developer's Most Frequent Hadoop Headaches &  How to Address Them__HadoopSumm...Developer's Most Frequent Hadoop Headaches &  How to Address Them__HadoopSumm...
Developer's Most Frequent Hadoop Headaches & How to Address Them__HadoopSumm...
 
Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011
 
Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.
 

Big Data Cloud Meetup - Jan 24 2013 - Zettaset

  • 1. Creating a Secure Hadoop Initiative Securing the Big Data Ecosystem This document contains confidential, proprietary and trade secret information and is subject to certain legal protection. You may not review, copy, or distribute this information unless you are a designated recipient, and have prior written authorization from Zettaset, Inc.
  • 2. About Me • CTO Zettaset, Inc. – Big Data Hadoop Company – Founded 2007 • Distributed Computing Guy – Have been since college • Security Guy – Founder SPI Dynamics (sold to HP, 2007) – Internet Security Systems, Prof. Services – Security First Network Bank, Sec. Guru.
  • 3. Zettaset Enables Enterprise-Ready Hadoop • Zettaset Orchestrator™ automates Hadoop installation and cluster management with an enterprise-ready solution for Big Data deployments – Enterprise-class – Hardened for security, high availability, and performance – Dramatically lowers operational expenses – Reduces IT resource requirements – Simple to deploy – Accelerates time to value from weeks to hours – Eliminates unnecessary dependencies on professional services – Works with any Apache Hadoop distribution 3 © 2012 Zettaset, Inc. | Proprietary and Confidential
  • 4. Zettaset Orchestrator: Making Hadoop Clusters Enterprise-Ready
  • 5. What is Big Data? • Great Question • It’s not a number, people define it differently. • Majority define it as a scalability issue: – “The inability to continue storing and processing data the way that you’ve been storing and processing data.”
  • 6. Exponential Data Growth = Big Data Estimated Global Data Volume: • 2011: 1.8 Zettabytes • 2015: 7.9 Zettabytes The world's information doubles every two years Over the next 10 years: • The number of servers worldwide will grow by 10x • The amount of information managed by enterprise data centers will grow by 50x • The number of “files” enterprise data centers handle will grow by 75x Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
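
As a quick sanity check on the figures quoted in slide 6, the short sketch below compares the 2015 estimate with what a strict two-year doubling from the 2011 figure would give, and derives the doubling time implied by the two estimates. The variable names are illustrative; only the two volumes and the doubling claim come from the slide.

```python
import math

# Figures quoted on the slide, in zettabytes.
volume_2011_zb = 1.8
volume_2015_zb = 7.9
years = 2015 - 2011

# Projection if the volume doubles exactly every two years.
projected_2015_zb = volume_2011_zb * 2 ** (years / 2)

# Doubling time implied by the two quoted estimates:
# solve volume_2011 * 2 ** (years / T) = volume_2015 for T.
implied_doubling_years = years * math.log(2) / math.log(volume_2015_zb / volume_2011_zb)

print(f"Projected 2015 volume at 2-year doubling: {projected_2015_zb:.1f} ZB")
print(f"Doubling time implied by the estimates: {implied_doubling_years:.2f} years")
```

A strict two-year doubling gives 7.2 ZB for 2015, so the quoted 7.9 ZB implies growth slightly faster than the rule of thumb, with a doubling time a little under two years.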
  • 7. Hadoop Distribution Landscape Distribution & Core Apache BigInsights BigInsights CDH3u4 CDH4u0 HDP v1.0 MapR M3 MapR M5 Components Bigtop v0.3 1.4 BE 1.4 EE Apache Hadoop         HDFS         Open Source Fuse-DFS     - - - - Apache Hadoop MapReduce         MapReduce 2 -  - - - - - - Proprietary Hadoop Common     - -   Apache Hive         Apache Pig         Apache HBase         Apache Zookeeper     - -   Apache Ambari - -  - - - - - Apache Templeton - -  - - - - - Apache Flume   -      Apache Sqoop       - - Apache Mahout   -    - - Apache Whirr   -    - - Apache Oozie         Apache Lucene - - - - - -   Apache Derby - - - - - -   Apache Avro - - - - - -   Hue   - - - - - - BigInsights Apps - - - - - - -  Hadoop Management Nagios - -  - - - - - Ganglia - -  - - - - - Zettaset Orchestrator™         Cloudera Manager   - - - - - - MapR Manager - - - -   - - BigInsights web console - - - - - - -  BigInsights simple console - - - - - -  - 7 January 24, 2013 Zettaset, Inc. | Proprietary
  • 8. What is the current state of Security? • Another Great Question • Minimal work has been done in this field • Currently not a huge community focus. • Everyone feels like it’s been addressed by adding Kerberos to the system Don’t tell InfoSec people that Kerberos has fixed everything!
  • 9. Why Not Tell Them That? • You will give them an aneurysm. • Kerberos is “Brushed On” security, NOT “Baked In” security. • Kerberos does NOT address compliance issues around data (HIPAA, GLBA, PCI, etc.). Nothing around encryption, nothing around best practices.
  • 10. Hadoop: What’s Missing? • All Hadoop distros are constrained by the limitations of the Apache open source components • Not written to support hardened security, compliance, encryption, policy-enablement, and risk management • Not written with high availability, service management, and monitoring in mind
  • 11. Current State of Hadoop Security • Existing security for Apache-based Hadoop distributions does not meet enterprise requirements to support regulatory compliance mandates such as HIPAA and SOX • Security breaches can have negative impacts: they can release sensitive information, damage the brand, compromise competitive advantage, spark litigation, etc. • Hadoop's security mechanism provides mutual authentication of users and services via SASL and Kerberos, but this has limitations
  • 12. Enterprise-Class Hadoop Security Addresses the security gaps and vulnerabilities that exist in all Apache-based Hadoop distributions • Hardened to address access control, policy, compliance and risk management • Support for Lightweight Directory Access Protocol (LDAP) and Active Directory (AD), enabling Hadoop clusters to seamlessly integrate with existing security policies within the enterprise environment • Centralized configuration management, logging, and auditing, which maintains control of ingress and egress points in the cluster, and enables Hadoop clusters to meet compliance requirements for reporting and forensics • Role-based access control (RBAC), which significantly improves the user authentication process, and enables Kerberos to be run against all components of a big data ecosystem, not just Hadoop
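
To make the RBAC and auditing points in slide 12 concrete, here is a minimal sketch of a role-based permission check that records every decision in an audit trail. It is not Zettaset Orchestrator's implementation or any Hadoop API: the role names, users, resource paths, and the `is_allowed` helper are all hypothetical, and the sketch assumes the user has already been authenticated (for example via Kerberos).

```python
# Hypothetical roles and permissions, invented for illustration only.
ROLE_PERMISSIONS = {
    "analyst": {("hdfs:/data/reports", "read")},
    "etl": {("hdfs:/data/raw", "read"), ("hdfs:/data/reports", "write")},
}

# Hypothetical user-to-role assignments (in practice these might come
# from an LDAP or Active Directory group lookup).
USER_ROLES = {"alice": {"analyst"}, "bob": {"etl"}}

# In-memory audit trail; a real system would write to durable storage.
audit_log = []


def is_allowed(user, resource, action):
    """Return True if any of the user's roles grants (resource, action)."""
    roles = USER_ROLES.get(user, set())
    allowed = any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set()) for role in roles
    )
    # Record every decision, granted or denied, for later reporting.
    audit_log.append((user, resource, action, allowed))
    return allowed


print(is_allowed("alice", "hdfs:/data/reports", "read"))
print(is_allowed("alice", "hdfs:/data/reports", "write"))
print(is_allowed("mallory", "hdfs:/data/raw", "read"))
```

Permissions attach to roles rather than to individual users, so changing what a group of users may do means editing one role entry, and unknown users are denied by default.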
  • 13. Defining Big Data Use Case • What is your use case? • What are you trying to accomplish? • What data are you going to be storing? • What are you going to do after you store it? This will define your Security Threat Model and how you protect your data.
  • 14. Big Data Production System [Diagram labels: log files, alerts, transactions, etc.; data types: structured data, semi-structured data, unstructured data.]
  • 15. Big Data Landscape (Version 2.0) [Figure: landscape map organized into Infrastructure, Analytics, Applications, Data Sources, Cross Infrastructure / Analytics, and Open Source Projects, with categories including NoSQL Databases, NewSQL Databases, MPP Databases, Hadoop Related, Management / Monitoring, Cluster Services, Security, Storage, Data Collection / Transport, Analytics Solutions, Data Visualization, Statistical Computing, Sentiment Analysis, Big Data Search, IT Analytics, Real-Time, Ad Optimization, Industry Applications, Machine Learning, and Cloud Deployment.] © Matt Turck (@mattturck) and Shivon Zilis (@shivonz), Bloomberg Ventures
  • 16. What is a Threat Model? • Threat modeling is based on the notion that any system or organization has assets of value worth protecting and these assets have certain vulnerabilities. • Internal or external threats exploit these vulnerabilities in order to cause damage to the assets, and appropriate security countermeasures exist that mitigate the threats. • A threat model can help to assess the probability, the potential harm, the priority etc., of attacks, and thus help to minimize or eradicate the threats.
  • 17. Approaches to threat modeling • 3 general approaches to threat modeling: • Attacker-centric – Attacker-centric threat modeling starts with an attacker, and evaluates their goals, and how they might achieve them. Attacker's motivations are often considered, for example, "The NSA wants to read this email," or "Jon wants to copy this DVD and share it with his friends." This approach usually starts from either entry points or assets. • Software-centric – Software-centric threat modeling (also called 'system-centric,' 'design-centric,' or 'architecture-centric') starts from the design of the system, and attempts to step through a model of the system, looking for types of attacks against each element of the model. This approach is used in threat modeling in Microsoft's Security Development Lifecycle. • Asset-centric – Asset-centric threat modeling involves starting from assets entrusted to a system, such as a collection of sensitive personal information.
  • 18. Bottom Line • Identify any threats to the confidentiality, availability and integrity of the data and the application based on the data access control matrix that your application should be enforcing • Assign risk values and determine the risk responses • Determine the countermeasures to implement based on your chosen risk responses • Continually update the threat model based on the emerging security landscape.
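
The "assign risk values and determine the risk responses" step in slide 18 can be sketched as a small risk register. This is one common scoring scheme (likelihood times impact), not a method prescribed by the slides; the 1-to-5 scales, the thresholds, and the example threats are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Simple multiplicative score; other weightings are possible.
        return self.likelihood * self.impact


def response_for(risk: int) -> str:
    # Thresholds are illustrative assumptions, not from the slides.
    if risk >= 15:
        return "mitigate"
    if risk >= 8:
        return "transfer"
    return "accept"


# Hypothetical threats against a data cluster.
threats = [
    Threat("Verbose error pages", 2, 2),
    Threat("Unencrypted sensitive data at rest", 4, 5),
    Threat("Stale service credentials", 3, 3),
]

# Highest risk first, so countermeasures can be prioritized.
ranked = sorted(threats, key=lambda t: t.risk, reverse=True)
for threat in ranked:
    print(f"{threat.risk:>2}  {response_for(threat.risk):<8}  {threat.name}")
```

Re-running the ranking whenever likelihoods or impacts change is one lightweight way to keep the threat model updated as the security landscape shifts.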
  • 19. Summary • All existing Apache-based Hadoop distributions have functional limitations which constrain enterprise adoption • Zettaset Orchestrator is addressing the enterprise-level gaps in security, high availability, performance, and manageability that exist in all Apache-based Hadoop distributions • Orchestrator is a universal management and control software layer that can sit on top of any Hadoop distribution (distro-agnostic) • Orchestrator fills the Service Management gaps that exist in all Hadoop distributions and cluster deployments, and makes Hadoop ready for broader enterprise adoption

Editor's Notes

  1. What does Zettaset do?
  2. “Data capacity on average in enterprises is growing at 40% to 60% year over year due to a number of factors, including an explosion in unstructured data.” - Computerworld, 2010. “80 percent of data is unstructured.” - IBM, 2010.
  3. IBM BigInsights includes proprietary applications as part of its EE distribution only. These include Jaql, Jaqlserver, Workflow, BigSheets, and LanguageWare.
  4. Don’t tell them it’s secure just because it’s behind a firewall either… it didn’t work in 1995 and it doesn’t always work now.
  5. Before you can talk about securing big data, you have to understand your data use case.