Cloudbreak – Technical Deep Dive
Janos Matyas & Krisztian Horvath
Hortonworks
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Presenters
Krisztian Horvath
Senior Member of Technical Staff, Cloudbreak
Former Co-Founder at SequenceIQ
Janos Matyas
Senior Director of Engineering, Cloudbreak
Former Co-Founder and CTO for SequenceIQ
Agenda
Goals and Motivations
Technology Stack + Deep Dive
Lessons Learned + Best Practices
Demo + Q & A
Goals and Motivations – What We Wanted to Do…
• Declarative, full Hadoop stack provisioning in all major cloud providers
• Automate and unify the process
• Zero-configuration approach
• Same process through a cluster lifecycle (Dev, QA, UAT, Prod)
• Provide tooling - UI, REST API and CLI/shell
• Secure and multi-tenant
• SLA policy based autoscaling
Goals and Motivations – What We Wanted to Do…
• All cloud providers are fundamentally different…
• Compute, network, security, performance
• We want to share what we found, and how we made it work!
Technology Stack
• Apache Ambari
• Cloud provider API
• Salt
• Docker
• Packer
Deep Dive - Overview
• Cloudbreak Deployer (CBD)
– Tool to deploy the Cloudbreak application
– Microservice architecture (using Docker)
– DevOps friendly
• Cloudbreak Application
– Extensible, available through UI, CLI, REST API
– SLA auto-scaling policy management
• Cluster deployed with Cloudbreak
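To make "SLA auto-scaling policy management" concrete, here is a minimal sketch of how a threshold-based scaling policy could be evaluated. The class, field names and metric are hypothetical illustrations and not Cloudbreak's actual policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy shape; not Cloudbreak's real schema."""
    metric: str        # illustrative metric name, e.g. "pending_containers"
    threshold: float   # the policy fires when the metric exceeds this value
    adjustment: int    # nodes to add (positive) or remove (negative)
    min_nodes: int     # lower bound for the cluster size
    max_nodes: int     # upper bound for the cluster size


def desired_node_count(policy: ScalingPolicy, current_nodes: int,
                       metric_value: float) -> int:
    """Return the node count after applying the policy, clamped to its bounds."""
    if metric_value <= policy.threshold:
        return current_nodes  # policy not triggered, keep the cluster as-is
    target = current_nodes + policy.adjustment
    return max(policy.min_nodes, min(policy.max_nodes, target))


if __name__ == "__main__":
    policy = ScalingPolicy("pending_containers", threshold=10,
                           adjustment=2, min_nodes=3, max_nodes=8)
    print(desired_node_count(policy, current_nodes=4, metric_value=25))
```

Clamping to `min_nodes` and `max_nodes` keeps a noisy metric from growing or shrinking the cluster without limit.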
Deep Dive – Cloudbreak Deployer
• Installation
– Single binary, written in Go
– Requires Docker 1.9.1+
– DIY installation on any RHEL / CentOS / Oracle Linux 7 (64-bit) distro
– Use one of the pre-built cloud images (AWS, Azure, GCP, OpenStack)
• Operations
– Easy upgrades/downgrades, automatic schema migration
• Cloud provider support
– AWS – generates IAM roles
– Azure – ARM and DASH config
• Utilities
– Cloudbreak shell support - interactive, remote, automated execution, OAuth2 token generation
– Local development environment setup
Deep Dive – Cloudbreak Application
• Installation
– Done with Cloudbreak Deployer (CBD)
• Operations
– Consistent feature set through UI, CLI and secure REST API
– Multi-tenant, ACL setup, usage reports
– Custom stack repositories, failure actions
– Event history, cluster management
– SLA based auto-scaling policy configs, enforcement
• Cloud provider support
– Agnostic API
– AWS, Azure, GCP, OpenStack, Mesos
– SPI – bring your own provider, stack under Cloudbreak management
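The "agnostic API" plus SPI idea can be sketched as an abstract provider contract with pluggable implementations. Everything below (class names, method names, the registry) is a hypothetical illustration of the pattern, not Cloudbreak's real SPI, which is a Java interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class CloudProvider(ABC):
    """Hypothetical provider contract; each cloud would supply its own subclass."""

    @abstractmethod
    def launch_instances(self, group: str, instance_type: str, count: int) -> List[str]:
        """Start `count` instances for a host group and return their ids."""

    @abstractmethod
    def terminate_instances(self, instance_ids: List[str]) -> None:
        """Stop and release the given instances."""


class InMemoryProvider(CloudProvider):
    """Stand-in provider that only records instances, for illustration."""

    def __init__(self) -> None:
        self.instances: Dict[str, str] = {}
        self._counter = 0

    def launch_instances(self, group: str, instance_type: str, count: int) -> List[str]:
        ids = []
        for _ in range(count):
            self._counter += 1
            instance_id = f"{group}-{self._counter}"
            self.instances[instance_id] = instance_type
            ids.append(instance_id)
        return ids

    def terminate_instances(self, instance_ids: List[str]) -> None:
        for instance_id in instance_ids:
            self.instances.pop(instance_id, None)


# Registry keyed by provider name; the caller never touches provider internals.
PROVIDERS: Dict[str, CloudProvider] = {}


def register_provider(name: str, provider: CloudProvider) -> None:
    PROVIDERS[name] = provider


def provision(provider_name: str,
              groups: Dict[str, Tuple[str, int]]) -> Dict[str, List[str]]:
    """Launch every host group through whichever provider was registered."""
    provider = PROVIDERS[provider_name]
    return {group: provider.launch_instances(group, itype, count)
            for group, (itype, count) in groups.items()}


if __name__ == "__main__":
    register_provider("fake", InMemoryProvider())
    print(provision("fake", {"master": ("m4.xlarge", 1), "worker": ("d2.2xlarge", 2)}))
```

The calling code depends only on the abstract contract, so adding a provider means registering one more implementation.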
Deep Dive – Cluster deployed with Cloudbreak
• Installation
– Managed by Cloudbreak using cloud provider API
– Default (optimized) configs – specific to cloud provider
• Operations
– Default, custom configs for stacks, services, network, storage, security
– Declarative Hadoop cluster
– Custom instance types (heterogeneous clusters)
– Different storage types
– Configurable network
– Security (access, Kerberos, SSSD, FreeIPA)
• Utilities
– Ambari Views
– Metadata/shared clusters support
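A declarative cluster description in the Ambari world is a blueprint: a JSON document listing host groups and the components each group runs. The sketch below builds a deliberately incomplete blueprint to show the shape; the helper name, blueprint name, stack version and component selection are illustrative assumptions, and a deployable blueprint needs many more components.

```python
import json
from typing import Dict


def build_blueprint(name: str, stack_version: str,
                    host_groups: Dict[str, dict]) -> dict:
    """Assemble a minimal Ambari-style blueprint (illustrative helper)."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(spec["cardinality"]),
                "components": [{"name": c} for c in spec["components"]],
            }
            for group, spec in host_groups.items()
        ],
    }


if __name__ == "__main__":
    # Two host groups; in a heterogeneous cluster each group could be
    # backed by a different instance type.
    blueprint = build_blueprint("demo-cluster", "2.4", {
        "master": {"cardinality": 1,
                   "components": ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]},
        "worker": {"cardinality": 3,
                   "components": ["DATANODE", "NODEMANAGER"]},
    })
    print(json.dumps(blueprint, indent=2))
```

Because the blueprint describes host groups rather than individual machines, the same document can be reused across Dev, QA and Prod with different group sizes.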
Lessons Learned
• Not all cloud providers are the same
– Differences in performance, storage and functionality
• (Capacity) planning
– Based on workload type (batch / interactive and ad-hoc / long running)
– Use heterogeneous clusters
– Trial and error – mistakes are cheap, iterate until you find your best fit
– Leverage the cloud - scale your cluster on demand
• Number one consideration – storage
– Multiple choices (ephemeral, block storage and BLOB store)
– Bringing compute to storage might not work (everywhere) – in the cloud, everything is delivered as a service
– Scale storage independently from compute, partition your data
• Security
– Consider using strict security rules (private subnets, access, etc.) and use edge nodes
Lessons Learned - AWS
• Compute
– Find your instance types for the workload, use heterogeneous clusters
– Different instance types for transient (e.g. C4, M4) and long running (e.g. H2, D2) clusters
– Dedicated instances (to avoid noisy neighbors, or for regulations such as HIPAA)
• Storage
– Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations)
– Note that S3 gives you only eventual consistency
– Different driver implementations: S3n (native, jets3t based), S3a (successor of S3n), S3 (block based)
• Network
– Use enhanced networking (Amazon Linux by default, RHEL based – apply patch)
– Placement groups
– Not all instance types can use the 10 Gbit network (e.g. use 8xlarge)
• Security
– Use instance roles to access S3, deploy in a private subnet/VPC
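Eventual consistency means an object written a moment ago may not yet be visible to a subsequent read or listing. One common way to cope is to poll with exponential backoff before treating the object as missing. The sketch below is generic: `exists` is a caller-supplied check (an assumption for illustration), and no real S3 client is involved.

```python
import time
from typing import Callable


def wait_until_visible(exists: Callable[[], bool],
                       attempts: int = 5,
                       base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `exists` up to `attempts` times, doubling the delay between tries.

    Returns True as soon as the object is visible, False if it never appears.
    """
    for attempt in range(attempts):
        if exists():
            return True
        if attempt < attempts - 1:
            # Back off before the next try: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return False


if __name__ == "__main__":
    # Simulated store: the object becomes visible on the third check.
    answers = iter([False, False, True])
    print(wait_until_visible(lambda: next(answers)))
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting.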
Lessons Learned - AWS
* d2.8xlarge used as instance type
Lessons Learned - Azure
• Compute
– Find your instance types for the workload, use heterogeneous clusters
– Different instance types for transient (e.g. A and D family) and long running (e.g. Dv2) clusters
– Use ARM instead of the old API
• Storage
– Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations)
– Storage account scaling limitations
– Use WASB or WASB with DASH (default with Cloudbreak)
– Azure Data Lake Store – coming soon
– Ephemeral disk is faster than root disk, but its contents do not survive auto-updates
• Network
– No PTR record/reverse lookup support
• Security
– Integrate/sync with your corporate AD
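WASB paths address a container inside a storage account, and one way around per-account scaling limits is to spread data over several accounts. The sketch below builds a WASB URI and picks an account by hashing a key. The account and container names are made up, and the hashing scheme is only an illustration of partitioning, not how DASH distributes data internally.

```python
import hashlib
from typing import List


def wasb_uri(container: str, account: str, path: str) -> str:
    """Build a wasb:// URI for a path in a container of a storage account."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def pick_account(key: str, accounts: List[str]) -> str:
    """Deterministically map a key (e.g. a dataset name) to one account."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return accounts[int(digest, 16) % len(accounts)]


if __name__ == "__main__":
    accounts = ["examplestore1", "examplestore2", "examplestore3"]  # hypothetical
    for dataset in ["clickstream", "orders", "sensor-logs"]:
        account = pick_account(dataset, accounts)
        print(wasb_uri("data", account, f"/{dataset}/2016"))
```

Hashing keeps the mapping stable between runs, so a dataset is always read from the account it was written to.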
Lessons Learned - GCP
• Compute
– Find your instance types for the workload, use heterogeneous clusters
– No template based provisioning
• Storage
– Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations)
– Use Google Cloud Storage Connector
• Network
– Network isolation/DNS problem
• Security
Lessons Learned - OpenStack
• Compute
– Find your instance types for the workload, use heterogeneous clusters
– Use Heat templates instead of API calls (we support both)
• Storage
– Currently we support only Cinder volumes
– Swift and Ceph are planned
– Data locality through Cloudbreak – let us know your topology or rack/hypervisor mapping
• Network
– Configure DNS properly
– Use multiple network (Neutron) nodes in case of a large cluster
• Security
– Use Keystone 3 (support for OAuth, Federation, introduction of groups/domains)
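Hadoop learns about data locality from a rack mapping: given a list of hosts, it expects one rack path per host. On a virtualized platform, two VMs on the same hypervisor share a failure domain, so the hypervisor mapping matters as well. The sketch below resolves hosts to rack paths through a hypervisor table; the host names, hypervisor names and table layout are hypothetical, and how Cloudbreak consumes the mapping is not shown.

```python
from typing import Dict, List

# Hadoop's conventional fallback rack for hosts it cannot place.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts: List[str],
                  hypervisor_of: Dict[str, str],
                  rack_of: Dict[str, str]) -> List[str]:
    """Return one rack path per host, in the same order as `hosts`."""
    racks = []
    for host in hosts:
        hypervisor = hypervisor_of.get(host)
        racks.append(rack_of.get(hypervisor, DEFAULT_RACK))
    return racks


if __name__ == "__main__":
    hypervisor_of = {"vm-1": "hv-a", "vm-2": "hv-a", "vm-3": "hv-b"}
    rack_of = {"hv-a": "/rack1", "hv-b": "/rack2"}
    print(resolve_racks(["vm-1", "vm-3", "vm-9"], hypervisor_of, rack_of))
```

Unknown hosts fall back to the default rack rather than failing, which mirrors how Hadoop treats hosts without topology information.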
Lessons Learned - Mesos
• In Tech Preview
– Come and talk to us after the talk
– Or at the Hortonworks booth
Thank You
Cloudbreak - Technical Deep Dive

  • 1. Cloudbreak – Technical Deep Dive Janos Matyas & Krisztian Horvath Hortonworks
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Presenters Krisztian Horvath Senior Member of technical staff, Cloudbreak Former Co-Founder at SequenceIQ Janos Matyas Senior Director of Engineering, Cloudbreak Former Co-Founder and CTO for SequenceIQ
  • 3. Agenda: Goals and Motivations; Technology Stack + Deep Dive; Lessons Learned + Best Practices; Demo + Q & A
  • 4. Goals and Motivations – What We Wanted to Do: declarative, full Hadoop stack provisioning on all major cloud providers; automate and unify the process; zero-configuration approach; same process throughout the cluster lifecycle (Dev, QA, UAT, Prod); provide tooling (UI, REST API and CLI/shell); secure and multi-tenant; SLA policy based autoscaling
  • 5. Goals and Motivations – What We Wanted to Do: all cloud providers are fundamentally different (compute, network, security, performance); we want to share what we found, and how we made it work!
  • 6. Agenda: Goals and Motivations; Technology Stack + Deep Dive; Lessons Learned + Best Practices; Demo + Q & A
  • 7. Technology Stack: Apache Ambari; cloud provider API; Salt; Docker; Packer
  • 8. Deep Dive - Overview. Cloudbreak Deployer (CBD): tool to deploy the Cloudbreak application; microservice architecture (using Docker); DevOps friendly. Cloudbreak Application: extensible, available through UI, CLI, REST API; SLA auto-scaling policy management. Cluster deployed with Cloudbreak.
  • 9. Deep Dive – Cloudbreak Deployer. Installation: single binary, written in Go; requires Docker 1.9.1+; DIY installation on any RHEL / CentOS / Oracle Linux 7 (64-bit) distro; or use one of the pre-built cloud images (AWS, Azure, GCP, OpenStack). Operations: easy upgrades/downgrades, automatic schema migration. Cloud provider support: AWS (generates IAM roles); Azure (ARM and DASH config). Utilities: Cloudbreak shell support (interactive, remote, automated execution, OAuth2 token generation); local development environment setup.
  • 10. Deep Dive – Cloudbreak Application. Installation: done with Cloudbreak Deployer (CBD). Operations: consistent feature set through UI, CLI and secure REST API; multi-tenant, ACL setup, usage reports; custom stack repositories, failure actions; event history, cluster management; SLA based auto-scaling policy configs, enforcement. Cloud provider support: agnostic API (AWS, Azure, GCP, OpenStack, Mesos); SPI interface (bring your own provider, stack under Cloudbreak management).
  • 11. Deep Dive – Cluster deployed with Cloudbreak. Installation: managed by Cloudbreak using the cloud provider API; default (optimized) configs, specific to the cloud provider. Operations: default and custom configs for stacks, services, network, storage, security; declarative Hadoop cluster; custom instance types (heterogeneous clusters); different storage types; configurable network; security (access, Kerberos, SSSD, FreeIPA). Utilities: Ambari Views; metadata/shared clusters support.
  • 12. Agenda: Goals and Motivations; Technology Stack + Deep Dive; Lessons Learned + Best Practices; Demo + Q & A
  • 13. Lessons Learned. Not all cloud providers are the same: differences in performance, storage and functionality. (Capacity) planning: base it on workload type (batch / interactive and ad-hoc / long running); use heterogeneous clusters; trial and error (mistakes are cheap, iterate until you find your best fit); leverage the cloud and scale your cluster on demand. Number one consideration is storage: multiple choices (ephemeral, block storage and BLOB store); bringing compute to storage might not work everywhere, since in the cloud everything is offered as a service; scale storage independently from compute, partition your data. Security: consider using strict security rules (private subnets, access, etc.) and use edge nodes.
  • 14. Lessons Learned - AWS. Compute: find your instance types for the workload, use heterogeneous clusters; different instance types for transient (e.g. C4, M4) and long running (e.g. H2, D2) clusters; dedicated instances (to avoid noisy neighbors and to meet regulations, e.g. HIPAA). Storage: use the latest version of Hadoop (Hortonworks contributed cloud-specific optimizations); note that S3 gives you only eventual consistency; different driver implementations: S3n (native, jets3t based), S3a (successor of S3n), S3 (block based). Network: use enhanced networking (on by default with Amazon Linux; on RHEL-based images apply the patch); placement groups; not all instance types can use the 10 Gbit network (e.g. use 8xlarge sizes). Security: use instance roles to access S3, deploy in a private subnet/VPC.
  • 15. Lessons Learned - AWS (* d2.8xlarge used as instance type)
  • 16. Lessons Learned - AWS (* d2.8xlarge used as instance type)
  • 17. Lessons Learned - Azure. Compute: find your instance types for the workload, use heterogeneous clusters; different instance types for transient (e.g. A and D family) and long running (e.g. Dv2) clusters; use ARM instead of the old API. Storage: use the latest version of Hadoop (Hortonworks contributed cloud-specific optimizations); storage account scaling limitations; use WASB or WASB with DASH (default with Cloudbreak); Azure Data Lake Store coming soon; the ephemeral disk is faster than the root disk but does not survive auto-updates. Network: no PTR record/reverse lookup support. Security: integrate/sync with your corporate AD.
  • 18. Lessons Learned - Azure
  • 19. Lessons Learned - GCP. Compute: find your instance types for the workload, use heterogeneous clusters; no template based provisioning. Storage: use the latest version of Hadoop (Hortonworks contributed cloud-specific optimizations); use the Google Cloud Storage Connector. Network: network isolation/DNS problem. Security (no items listed).
  • 20. Lessons Learned - OpenStack. Compute: find your instance types for the workload, use heterogeneous clusters; use Heat templates instead of API calls (we support both). Storage: currently we support only Cinder volumes, Swift and Ceph are planned; data locality through Cloudbreak (let us know your topology or rack/hypervisor mapping). Network: configure DNS properly; use multiple network (Neutron) nodes in case of a large cluster. Security: use Keystone v3 (support for OAuth, federation, introduction of groups/domains).
  • 21. Lessons Learned - Mesos. In Tech Preview: come and talk to us after the talk, or at the Hortonworks booth.
  • 22. Agenda: Goals and Motivations; Technology Stack + Deep Dive; Lessons Learned + Best Practices; Demo + Q & A
  • 23. Thank You
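
Slides 4, 8 and 10 mention SLA policy based autoscaling without showing what such a policy evaluates. The following is a minimal sketch, assuming a simple threshold rule with a cooldown and node-count bounds. `ScalingPolicy`, `desired_node_count` and every field name are hypothetical, chosen for illustration; this is not Cloudbreak's API or its actual policy model.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical SLA policy; names are illustrative, not Cloudbreak's API."""
    metric: str        # name of the watched metric, e.g. "pending_containers"
    threshold: float   # value the metric must cross to trigger scaling
    adjustment: int    # nodes to add (positive) or remove (negative)
    min_nodes: int     # lower bound on cluster size
    max_nodes: int     # upper bound on cluster size
    cooldown_s: int    # minimum seconds between two scaling actions


def desired_node_count(policy, current_nodes, metric_value, now_s, last_scaled_s):
    """Return the node count the policy asks for, clamped to its bounds."""
    # Skip evaluation while the previous scaling action is still settling.
    if now_s - last_scaled_s < policy.cooldown_s:
        return current_nodes
    # Scale-up policies fire above the threshold, scale-down policies below it.
    if policy.adjustment > 0:
        breached = metric_value > policy.threshold
    else:
        breached = metric_value < policy.threshold
    if not breached:
        return current_nodes
    target = current_nodes + policy.adjustment
    return max(policy.min_nodes, min(policy.max_nodes, target))
```

For example, a scale-up policy with `threshold=100`, `adjustment=2` and bounds 3 to 10 would move a 4-node cluster to 6 nodes when the metric reads 150, and leave it unchanged while the cooldown is still running.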