SlideShare a Scribd company logo
1 of 15
© Hortonworks Inc. 2014
Structor – Automated Building
of Virtual Hadoop Clusters
July 2014
Page 1
Owen O’Malley
owen@hortonworks.com
@owen_omalley
© Hortonworks Inc. 2014
Page 2
•Creating a virtual Hadoop cluster is hard
–Takes time to set and configure VM
–A “learning experience” for new engineers
–Each engineer has a different setup
–Experimenting is hazardous!
•Setting up security is even harder
–Most developers don’t test with security
•Need to test Ambari and manual installs
•Need to test various operating systems
What’s the Problem?
© Hortonworks Inc. 2014
Page 3
•Create scripts to create a working
Hadoop cluster
–Secure or Non-secure
–Multiple nodes
•Vagrant
–Used for creating and managing the VMs
–VM starts as a base box with no Hadoop
•Puppet
–Used for provisioning the Hadoop packages
Solution
© Hortonworks Inc. 2014
Page 4
•We’ve put everything for a development
box into the Vagrant base box (CentOS6)
–Build tools: ant, git, java, maven, protobuf, thrift
•Downloaded once and cached
•Setup
% vagrant init omalley/centos6_x64
% vagrant up
% vagrant ssh
•Less than a minute
Simplest Case – Development Box
© Hortonworks Inc. 2014
Page 5
•Ssh in with “vagrant ssh”
–Account: vagrant, Password: vagrant
–Become root with “sudo –i”
•Clone directory to make copies
•Other useful vagrant commands:
% vagrant status – list virtual machines
% vagrant suspend – suspend virtual machines
% vagrant resume – resume virtual machines
% vagrant destroy – destroy virtual machines
Using the Box
© Hortonworks Inc. 2014
Page 6
•Commands to start cluster
% git clone git@github.com:hortonworks/structor.git
% cd structor
% vagrant up
•Default profile has 3 machines
–gw – client gateway machine
–nn – master (NameNode, ResourceMgr)
–slave1 – slaves (DataNode, NodeManager)
•HDFS, Yarn, Hive, Pig, and Zookeeper
Setting up Non-Secure Cluster
© Hortonworks Inc. 2014
Page 7
•Add hostnames to /etc/hosts
240.0.0.10 gw.example.com
240.0.0.11 nn.example.com
240.0.0.12 slave1.example.com
240.0.0.13 slave2.example.com
240.0.0.14 slave3.example.com
•HDFS – http://nn.example.com:50070/
•Yarn – http://nn.example.com:8088/
•For security
–Modify /etc/krb5.conf as in README.md.
–Use Safari or Firefox (needs config change)
Setting up your Mac
© Hortonworks Inc. 2014
Page 8
•Commands to start cluster
% ln –s profiles/3node-secure.profile current.profile
% mkdir generated (bug workaround)
% vagrant up
•Brings up 3 machines with security
–Includes a kdc and principles
•Yarn Web UI - https://nn.exaple.com:8090
•“kinit vagrant” on your Mac for Web UI
•Ssh to gw and kinit for the CLI
Setting up Secure Cluster
© Hortonworks Inc. 2014
Page 9
•JSON files that control cluster
•3 node secure cluster:
{ "domain": "example.com”, "realm": "EXAMPLE.COM",
"security": true,
"vm_mem": 2048, "server_mem": 300, "client_mem": 200,
"clients" : [ "hdfs", "yarn", "pig", "hive", "zk" ],
"nodes": [
{ "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] },
{ "hostname": "nn", "ip": "240.0.0.11", "roles": [ "kdc", "nn", "yarn",
"hive-meta", "hive-db”, "zk" ]},
{ "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ]}]}
Profiles
© Hortonworks Inc. 2014
Page 10
•Various profiles
–1node-nonsecure
–3node-secure
–5node-nonsecure
–ambari-nonsecure
–knox-nonsecure
•Great way to setup Ambari cluster
•Project owners should add their project
–Help other developers use your project
Additional Profiles
© Hortonworks Inc. 2014
Page 11
•The master branch is Hadoop 2.4
–There is also an Hadoop 1.1 (hdp-1.3) branch
•All packages are installed via Puppet
–Uses built in OS package tools
•Repo file is in files/repos/hdp.repo
–Can override source of packages
–Easy to change to download custom builds
Choosing HDP versions
© Hortonworks Inc. 2014
Page 12
•Each configuration file is templated
•HDFS configuration is in
–modules/hdfs_client/templates/*.erb
–Changes will apply to all nodes
•We use Ruby to find NameNode:
<% @namenode = eval(@nodes).select {|node|
node[:roles].include? 'nn’} [0][:hostname] + "." + @domain; %>
<property>
<name>fs.defaultFS</name>
<value>hdfs://<%= @namenode %>:8020</value>
</property>
Configuration Files (eg. core-site.xml)
© Hortonworks Inc. 2014
Page 13
•Actual work is done via Puppet
–Hides details of each OS
•Modularized
–Top level is manifests/default.pp
–Each module is in modules/*
•Top level looks like:
include selinux
include ntp
if $security == "true" and hasrole($roles, 'kdc') {
include kerberos_kdc
}
Puppet
© Hortonworks Inc. 2014
Page 14
•Add other Hadoop ecosystem tools
–Tez
–HBase
•Add other operating systems
–Ubuntu, Suse, CentOS 5
•Support other Vagrant providers
–Amazon EC2
–Docker
•Support for other backing RDBs
Future Directions
© Hortonworks Inc. 2013
Thank You!
Questions & Answers
Page 15

More Related Content

What's hot

Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Janos Matyas
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentBlueData, Inc.
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorialmarkgrover
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesosnelsonadpresent
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSDataWorks Summit
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopYifeng Jiang
 

What's hot (20)

Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
Ansible + Hadoop
Ansible + HadoopAnsible + Hadoop
Ansible + Hadoop
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 

Viewers also liked

Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to HiveOwen O'Malley
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopOwen O'Malley
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopOwen O'Malley
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File IntroductionOwen O'Malley
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduceOwen O'Malley
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroOwen O'Malley
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013Owen O'Malley
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 

Viewers also liked (17)

Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to Hive
 
Data protection2015
Data protection2015Data protection2015
Data protection2015
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduce
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 Intro
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
ORC Files
ORC FilesORC Files
ORC Files
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
ORC 2015
ORC 2015ORC 2015
ORC 2015
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 

Similar to Structor - Automated Building of Virtual Hadoop Clusters

Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopfann wu
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for WindowsTerry Padgett
 
Extending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.jsExtending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.jsPetr Jiricka
 
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put togetherVMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put togetherEduardo Patrocinio
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Mandakini Kumari
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
Deploying to Ubuntu on Linode
Deploying to Ubuntu on LinodeDeploying to Ubuntu on Linode
Deploying to Ubuntu on LinodeWO Community
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMJeffrey Breen
 
Using puppet
Using puppetUsing puppet
Using puppetAlex Su
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Dockernklmish
 
Unraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudUnraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudSalman Baset
 
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityTokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityPhil Estes
 
Undine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsUndine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsDavid Watson
 
OpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and WindowsOpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and WindowsAlessandro Pilotti
 

Similar to Structor - Automated Building of Virtual Hadoop Clusters (20)

Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
 
Extending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.jsExtending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.js
 
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put togetherVMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
 
Core os dna_oscon
Core os dna_osconCore os dna_oscon
Core os dna_oscon
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Deploying to Ubuntu on Linode
Deploying to Ubuntu on LinodeDeploying to Ubuntu on Linode
Deploying to Ubuntu on Linode
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VM
 
Core os dna_automacon
Core os dna_automaconCore os dna_automacon
Core os dna_automacon
 
Using puppet
Using puppetUsing puppet
Using puppet
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
Exp-3.pptx
Exp-3.pptxExp-3.pptx
Exp-3.pptx
 
Unraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudUnraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production Cloud
 
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityTokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker Security
 
Undine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsUndine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development Environments
 
OpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and WindowsOpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and Windows
 

More from Owen O'Malley

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemOwen O'Malley
 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACIDOwen O'Malley
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryptionOwen O'Malley
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionOwen O'Malley
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 IcebergOwen O'Malley
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column EncryptionOwen O'Malley
 

More from Owen O'Malley (9)

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACID
 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryption
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column Encryption
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 Iceberg
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column Encryption
 

Recently uploaded

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Recently uploaded (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Structor - Automated Building of Virtual Hadoop Clusters

  • 1. © Hortonworks Inc. 2014 Structor – Automated Building of Virtual Hadoop Clusters July 2014 Page 1 Owen O’Malley owen@hortonworks.com @owen_omalley
  • 2. © Hortonworks Inc. 2014 Page 2 •Creating a virtual Hadoop cluster is hard –Takes time to set and configure VM –A “learning experience” for new engineers –Each engineer has a different setup –Experimenting is hazardous! •Setting up security is even harder –Most developers don’t test with security •Need to test Ambari and manual installs •Need to test various operating systems What’s the Problem?
  • 3. © Hortonworks Inc. 2014 Page 3 •Create scripts to create a working Hadoop cluster –Secure or Non-secure –Multiple nodes •Vagrant –Used for creating and managing the VMs –VM starts as a base box with no Hadoop •Puppet –Used for provisioning the Hadoop packages Solution
  • 4. © Hortonworks Inc. 2014 Page 4 •We’ve put everything for a development box into the Vagrant base box (CentOS6) –Build tools: ant, git, java, maven, protobuf, thrift •Downloaded once and cached •Setup % vagrant init omalley/centos6_x64 % vagrant up % vagrant ssh •Less than a minute Simplest Case – Development Box
  • 5. © Hortonworks Inc. 2014 Page 5 •Ssh in with “vagrant ssh” –Account: vagrant, Password: vagrant –Become root with “sudo –i” •Clone directory to make copies •Other useful vagrant commands: % vagrant status – list virtual machines % vagrant suspend – suspend virtual machines % vagrant resume – resume virtual machines % vagrant destroy – destroy virtual machines Using the Box
  • 6. © Hortonworks Inc. 2014 Page 6 •Commands to start cluster % git clone git@github.com:hortonworks/structor.git % cd structor % vagrant up •Default profile has 3 machines –gw – client gateway machine –nn – master (NameNode, ResourceMgr) –slave1 – slaves (DataNode, NodeManager) •HDFS, Yarn, Hive, Pig, and Zookeeper Setting up Non-Secure Cluster
  • 7. © Hortonworks Inc. 2014 Page 7 •Add hostnames to /etc/hosts 240.0.0.10 gw.example.com 240.0.0.11 nn.example.com 240.0.0.12 slave1.example.com 240.0.0.13 slave2.example.com 240.0.0.14 slave3.example.com •HDFS – http://nn.example.com:50070/ •Yarn – http://nn.example.com:8088/ •For security –Modify /etc/krb5.conf as in README.md. –Use Safari or Firefox (needs config change) Setting up your Mac
  • 8. © Hortonworks Inc. 2014 Page 8 •Commands to start cluster % ln –s profiles/3node-secure.profile current.profile % mkdir generated (bug workaround) % vagrant up •Brings up 3 machines with security –Includes a kdc and principles •Yarn Web UI - https://nn.exaple.com:8090 •“kinit vagrant” on your Mac for Web UI •Ssh to gw and kinit for the CLI Setting up Secure Cluster
  • 9. © Hortonworks Inc. 2014 Page 9 •JSON files that control cluster •3 node secure cluster: { "domain": "example.com”, "realm": "EXAMPLE.COM", "security": true, "vm_mem": 2048, "server_mem": 300, "client_mem": 200, "clients" : [ "hdfs", "yarn", "pig", "hive", "zk" ], "nodes": [ { "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] }, { "hostname": "nn", "ip": "240.0.0.11", "roles": [ "kdc", "nn", "yarn", "hive-meta", "hive-db”, "zk" ]}, { "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ]}]} Profiles
  • 10. © Hortonworks Inc. 2014 Page 10 •Various profiles –1node-nonsecure –3node-secure –5node-nonsecure –ambari-nonsecure –knox-nonsecure •Great way to setup Ambari cluster •Project owners should add their project –Help other developers use your project Additional Profiles
  • 11. © Hortonworks Inc. 2014 Page 11 •The master branch is Hadoop 2.4 –There is also an Hadoop 1.1 (hdp-1.3) branch •All packages are installed via Puppet –Uses built in OS package tools •Repo file is in files/repos/hdp.repo –Can override source of packages –Easy to change to download custom builds Choosing HDP versions
  • 12. © Hortonworks Inc. 2014 Page 12 •Each configuration file is templated •HDFS configuration is in –modules/hdfs_client/templates/*.erb –Changes will apply to all nodes •We use Ruby to find NameNode: <% @namenode = eval(@nodes).select {|node| node[:roles].include? 'nn’} [0][:hostname] + "." + @domain; %> <property> <name>fs.defaultFS</name> <value>hdfs://<%= @namenode %>:8020</value> </property> Configuration Files (eg. core-site.xml)
  • 13. © Hortonworks Inc. 2014 Page 13 •Actual work is done via Puppet –Hides details of each OS •Modularized –Top level is manifests/default.pp –Each module is in modules/* •Top level looks like: include selinux include ntp if $security == "true" and hasrole($roles, 'kdc') { include kerberos_kdc } Puppet
  • 14. © Hortonworks Inc. 2014 Page 14 •Add other Hadoop ecosystem tools –Tez –HBase •Add other operating systems –Ubuntu, Suse, CentOS 5 •Support other Vagrant providers –Amazon EC2 –Docker •Support for other backing RDBs Future Directions
  • 15. © Hortonworks Inc. 2013 Thank You! Questions & Answers Page 15