Provisioning Big Data Platform using
Cloudbreak & Ambari
Karthik Karuppaiya, Sr. Engineering Manager, CPE
Vivek Madani, Sr. Principal Software Engineer, CPE
San Jose Hadoop Summit 2016 – Karthik Karuppaiya & Vivek Madani
Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
Introduction
Symantec
- Symantec is the world leader in providing security software for both enterprises and end
users
- Thousands of enterprises and more than 400 million devices (PCs, tablets and phones)
rely on Symantec to help secure their assets from attacks, including their data
centers, emails and other sensitive data
Cloud Platform Engineering (CPE)
- Build consolidated cloud infrastructure and platform services for next generation data
powered Symantec applications
- A big data platform for batch and stream analytics integrated with both private and public
clouds
- Open source components as building blocks
- Bridge feature gaps and contribute back
Big Data Platform Challenge
• Hundreds of millions of users generating billions of events every day from
across the globe
• Hundreds of big data application developers developing thousands of
applications
• At 12 PB and 500+ nodes, Cloud Platform Engineering Analytics team built
the largest security data lake at Symantec
• Elasticity is built into the platform to optimize costs in the cloud
Big Data Platform Challenge
• Great! Now Developers can start building applications on our
Big Data Lake
• 100s of developers start building applications using different big
data tools
Big Data Platform Challenge
• Product team developers want quick changes and the latest versions
• Platform team wants stability!
• Soon, frustration prevails
What is the Solution?
• Build and use your own little cluster for development
• Copy subset of data for development purposes
• Build elasticity into the platform for cost optimizations
• Tear down the cluster after development is complete
• Rinse and repeat
What is the Solution?
• But building clusters is hard and time-consuming
• Too many services to install and configure
• Developers are not interested in building and managing clusters
What is the Solution? – Self Service
• What if we make it really easy to build clusters?
• Abstract all the deployment complexities and enable developers
to get their own cluster in one click of a button
• Use the same blueprint for both dev and prod clusters
Self Service Analytics (SSA) Clusters
• RESTful web services to allow creation and management of
custom clusters
• Select from pre-defined Ambari Blueprints
• Can provision infrastructure on OpenStack as well as AWS
• Installs the HDP stack specified in the Ambari blueprint
• Dashing dashboard to monitor and manage (start/stop/kill)
clusters
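To make "select from pre-defined Ambari Blueprints" more concrete, here is a minimal sketch that builds a blueprint and a cluster creation template as Python dictionaries, in the shape Ambari's REST API accepts (`POST /api/v1/blueprints/<name>` and `POST /api/v1/clusters/<name>`). It is not the SSA service code. The blueprint name, host group names, host names and the small component selection are hypothetical, and a real blueprint would need a complete service topology.

```python
import json


def build_blueprint(name, stack_version, host_groups):
    """Return an Ambari blueprint document for the HDP stack.

    host_groups maps a host group name to a tuple of
    (cardinality, [Ambari component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group, default_password):
    """Return a cluster creation template that maps hosts onto host groups."""
    return {
        "blueprint": blueprint_name,
        "default_password": default_password,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": fqdn} for fqdn in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


if __name__ == "__main__":
    # Hypothetical two-group layout; not a complete HDP topology.
    blueprint = build_blueprint(
        "ssa-small",
        "2.4",
        {
            "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
            "worker": (3, ["DATANODE", "NODEMANAGER"]),
        },
    )
    print(json.dumps(blueprint, indent=2))
```

Because the blueprint carries no host names, the same document can be registered once and reused for clusters of different sizes; only the cluster template changes per cluster.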
Environment
• Private cloud on OpenStack (Kilo, no Heat)
• Public cloud on AWS
• HDP 2.3.2 & 2.4.2
• Ambari 2.1.2 & 2.2
SSA Architecture
SSA Services
SSA Demo
Ambari Custom Services
• What about the services that are not supported by Ambari out
of the box?
• We write our own Ambari custom stack
Next Gen SSA
• This is all great! But it is a lot of work to add more cloud providers.
• It takes a lot of effort to understand each cloud provider’s APIs
Next Gen SSA – Cloudbreak
• Cloudbreak
– Cloudbreak helps to simplify the provisioning of HDP clusters in cloud
environments
– Supports multiple clouds including AWS, Google, Azure and OpenStack
– Uses Apache Ambari for HDP installation and management
– Has a nice UI to build and manage clusters
– Supports automated cluster scaling
AWS Cluster Architecture
Private Subnet
• Direct Connect
• 10 Gbps
• Data Ingestion Pipes
• Telemetry Ingestion Pipes
• Datacenter hosts HDP over bare metal and OpenStack
• Uses d3.* and r3.* flavors
• Encrypted volumes – LUKS
• Non-EBS root volume
• Non-Dockerized HDP
• Custom AMI
• Enhanced networking
Symantec Datacenter
Cloudbreak Demo
Hybrid Cloud Using Cloudbreak – Customization & Contribution
• Non-Dockerized HDP installation
• Support for Keystone v3 for OpenStack
– Cloudbreak 1.2 – released 03/2016
• Support for custom AMIs
– We have our own hardened images with enhanced networking, volume encryption, etc.
• Support for non-EBS backed root volumes
• Deploy in an existing private VPC/subnet
• Additional AWS instance flavors supported
– We use r3.* and d3.*, which are not supported by Cloudbreak
• We build our own Cloudbreak package from the trunk
Cloudbreak – Keystone V3 Screenshot
Cloudbreak – Keystone V3 Project Scope Screenshot
Custom AMI Support
• Org security mandates using only specific hardened AMIs
• Created our own hardened image with the software and configurations required by
Cloudbreak
• Allows us to use features like:
– Volume encryption, enhanced networking enabled
– Non-EBS volumes
– Symantec-specific configurations like LDAP, repos, DNS, etc.
– Symantec standard for hostnames
• Use JDK 1.8 instead of the Java 7 that comes with the Cloudbreak AMI
/cloud-aws/src/main/resources/aws-images.yml
Non-Dockerized HDP Support
Why?
• No experience running production clusters under Docker.
• Unknowns with the upgrade path for HDP components.
• Encrypted disk volumes had issues working with Docker.
What?
• Worked with the Cloudbreak team to test out the non-Dockerized version of
Cloudbreak
• Provided feedback from our test deployment of the non-Dockerized version
• Feature now available in the master branch
Non-EBS backed root volume
• Changes to the AWS CloudFormation template used by Cloudbreak
• We use ephemeral storage for root volumes for availability
reasons
• Will contribute this back as an option to Cloudbreak
Cloudbreak Contribution – In Progress
•Placement groups
•Multiple security groups attached to one cluster
•Multiple subnet deployment inside VPC
•Support for non-EBS root volumes
Monitoring & Alerting
Now that we have delivered an elephant, the next question from
users is – How is his health?
Monitoring and Alerting
• Comprehensive dashboards for all environments managed by
the platform team
• Ambari Alerts are used extensively
• QueryX: custom framework to fill the gaps in Ambari Alerts
• All alerts are sent to the OpenTSDB + Grafana stack
• Critical alerts go to PagerDuty
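One way to picture this alert flow: each Ambari alert becomes a numeric data point that OpenTSDB can store and Grafana can chart, and critical ones are additionally routed to paging. The sketch below is an illustration under stated assumptions, not the QueryX code. The metric name `ambari.alert.state`, the state-to-number mapping and the sample alert are hypothetical. The input fields follow the `Alert` object returned by Ambari's alerts REST resource, and the output follows the JSON body accepted by OpenTSDB's `/api/put` endpoint.

```python
# Hypothetical numeric encoding of Ambari alert states, chosen so that
# higher values mean "worse" when charted.
STATE_VALUES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def alert_to_datapoint(alert, cluster):
    """Convert one Ambari alert dict into an OpenTSDB data point dict."""
    tags = {
        "cluster": cluster,
        "definition": alert["definition_name"],
        "service": alert["service_name"],
    }
    # Service-level alerts may carry no host, so the tag is optional.
    host = alert.get("host_name")
    if host:
        tags["host"] = host
    return {
        "metric": "ambari.alert.state",  # assumed metric name
        # Ambari reports milliseconds; this sketch stores whole seconds.
        "timestamp": alert["latest_timestamp"] // 1000,
        "value": STATE_VALUES.get(alert["state"], STATE_VALUES["UNKNOWN"]),
        "tags": tags,
    }


def needs_paging(alert):
    """Return True for alerts that should also go to the on-call pager."""
    return alert["state"] == "CRITICAL"


if __name__ == "__main__":
    # Hypothetical alert, shaped like an item's "Alert" field in Ambari.
    sample = {
        "definition_name": "namenode_hdfs_capacity_utilization",
        "service_name": "HDFS",
        "host_name": "nn1.example.com",
        "state": "CRITICAL",
        "latest_timestamp": 1466000000000,
    }
    print(alert_to_datapoint(sample, "cluster1"), needs_paging(sample))
```

Encoding the state as a number keeps every alert chartable over time in the same way as ordinary metrics, at the cost of dropping the alert text.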
Monitoring and Alerting
[Diagram: Cluster 1, Cluster 2, Cluster 3, … each run Ambari Metrics Collector + QueryX; data is pulled by calling the Ambari Metrics API and flows into OpenTSDB, then Grafana]
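The "call Ambari Metrics API, then write to OpenTSDB" hop in this diagram can be sketched as a small transformation. The code assumes the response shape of the Ambari Metrics Collector timeline endpoint (`/ws/v1/timeline/metrics`): a `metrics` list whose entries carry `metricname`, `appid`, `hostname` and a `metrics` map of millisecond timestamp to value. The `ams` metric prefix, the tag names and the sample payload are hypothetical, and metric-name sanitization for OpenTSDB is left out.

```python
def ams_to_opentsdb(response, cluster, prefix="ams"):
    """Flatten an Ambari Metrics Collector response into OpenTSDB points.

    Returns a list of dicts suitable for OpenTSDB's /api/put JSON body.
    """
    points = []
    for series in response.get("metrics", []):
        tags = {"cluster": cluster, "appid": series["appid"]}
        if series.get("hostname"):
            tags["host"] = series["hostname"]
        # Timestamps arrive as string keys in milliseconds; emit them in
        # chronological order as whole seconds.
        samples = sorted(series["metrics"].items(), key=lambda kv: int(kv[0]))
        for ts_ms, value in samples:
            points.append(
                {
                    "metric": f"{prefix}.{series['metricname']}",
                    "timestamp": int(ts_ms) // 1000,
                    "value": value,
                    "tags": dict(tags),  # copy so points do not share state
                }
            )
    return points


if __name__ == "__main__":
    # Hypothetical payload in the assumed collector response shape.
    payload = {
        "metrics": [
            {
                "metricname": "cpu_user",
                "appid": "HOST",
                "hostname": "dn1.example.com",
                "metrics": {"1466000060000": 4.5, "1466000000000": 3.0},
            }
        ]
    }
    for point in ams_to_opentsdb(payload, "cluster1"):
        print(point)
```

Tagging every point with the cluster name is what lets a single Grafana dashboard switch between clusters that share one OpenTSDB backend.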
Grafana Dashboards
Ambari Alerts
Summary and Future Work
• A journey towards one-click cluster deployment
• Cloudbreak - one tool for all clouds
- Contribute back the features developed in-house
- Enable Cloudbreak to support bare-metal cluster provisioning
- Auto-scaling using Cloudbreak and Periscope
- Single large YARN cluster for a variety of compute and storage loads
• Open source – use and contribute
- Work with the community to address gaps
• SSA code already open sourced
- https://github.com/symantec/
Thank You!
Q & A
Karthik Karuppaiya
karthik_karuppaiya@symantec.com
Vivek Madani
vivek_madani@symantec.com
Azure + DataStax Enterprise Powers Office 365 Per User Store
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 

Plus de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Plus de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Dernier

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Dernier (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Provisioning Big Data Platform using Cloudbreak & Ambari

  • 1. Provisioning Big Data Platform using Cloudbreak & Ambari • Karthik Karuppaiya, Sr. Engineering Manager, CPE • Vivek Madani, Sr. Principal Software Engineer, CPE • San Jose Hadoop Summit 2016 – Karthik Karuppaiya & Vivek Madani
  • 2. Agenda • 1. Introduction • 2. Big Data Platform Challenges • 3. What is the solution? • 4. Self Service Analytics Platform Provisioning • 5. Going Hybrid Cloud using Cloudbreak • 6. Monitoring & Alerting
  • 3. Introduction • Symantec - Symantec is the world leader in providing security software for both enterprises and end users - Thousands of enterprises and more than 400 million devices (PCs, tablets and phones) rely on Symantec to help secure their assets from attacks, including their data centers, email and other sensitive data • Cloud Platform Engineering (CPE) - Build consolidated cloud infrastructure and platform services for next-generation, data-powered Symantec applications - A big data platform for batch and stream analytics integrated with both private and public clouds - Open source components as building blocks - Bridge feature gaps and contribute back
  • 4. Agenda (as on slide 2)
  • 5. Big Data Platform Challenge • Hundreds of millions of users generating billions of events every day from across the globe • Hundreds of big data application developers developing thousands of applications • At 12 PB and 500+ nodes, the Cloud Platform Engineering Analytics team built the largest security data lake at Symantec • Elasticity is built into the platform to optimize costs in the cloud
  • 6. Big Data Platform Challenge • Great! Now developers can start building applications on our big data lake • Hundreds of developers start building applications using different big data tools
  • 7. Big Data Platform Challenge • Product team developers want quick changes and the latest versions • The platform team wants stability! • Soon, frustration prevails
  • 8. Agenda (as on slide 2)
  • 9. What is the Solution? • Build and use your own little cluster for development • Copy a subset of the data for development purposes • Build elasticity into the platform for cost optimization • Tear down the cluster after development is complete • Rinse and repeat
  • 10. What is the Solution? • But building clusters is hard and time consuming • Too many services to install and configure • Developers are not interested in building and managing clusters
  • 11. What is the Solution? Self Service • What if we make it really easy to build clusters? • Abstract all the deployment complexities and enable developers to get their own cluster at the click of a button • Use the same blueprint for both dev and prod clusters
  • 12. Agenda (as on slide 2)
  • 13. Self Service Analytics (SSA) Clusters • RESTful web services to allow creation and management of custom clusters • Select from pre-defined Ambari Blueprints • Can provision infrastructure on OpenStack as well as AWS • Installs the HDP stack specified as part of the Ambari blueprint • Dashing dashboard to monitor and manage (start/stop/kill) clusters
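The blueprint-driven provisioning described on this slide can be sketched roughly as follows. This is a minimal illustration, not the SSA implementation: the helper names, the host group layout, the component selection and the hostnames are all hypothetical, and the layout is deliberately incomplete for a real HDP cluster. The sketch only builds the two JSON documents used by Ambari's Blueprint REST API (a blueprint, registered under `/api/v1/blueprints/<name>`, and a cluster creation template, posted to `/api/v1/clusters/<name>`); it does not send them anywhere.

```python
import json


def build_blueprint(name, stack_version, host_groups):
    """Build an Ambari blueprint document.

    host_groups maps a group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group):
    """Build the cluster creation template that maps real hosts to host groups."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": fqdn} for fqdn in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


# Hypothetical development layout: one master, three workers.
DEV_GROUPS = {
    "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
    "worker": (3, ["DATANODE", "NODEMANAGER"]),
}

# Hypothetical hostnames; a provisioning service would fill these in
# after the OpenStack or AWS instances have been created.
DEV_HOSTS = {
    "master": ["master-1.example.com"],
    "worker": [
        "worker-1.example.com",
        "worker-2.example.com",
        "worker-3.example.com",
    ],
}

blueprint = build_blueprint("ssa-dev", "2.4", DEV_GROUPS)
template = build_cluster_template("ssa-dev", DEV_HOSTS)

if __name__ == "__main__":
    print(json.dumps(blueprint, indent=2))
    print(json.dumps(template, indent=2))
```

Because the blueprint is separate from the host mapping, the same blueprint document can be reused for a dev and a prod cluster, with only the cluster template changing.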
  • 14. Environment • Private cloud on OpenStack (Kilo, no Heat) • Public cloud on AWS • HDP 2.3.2 & 2.4.2 • Ambari 2.1.2 & 2.2
  • 15. SSA Architecture
  • 16. SSA Services
  • 17. SSA Demo
  • 18. Ambari Custom Services • What about the services that are not supported by Ambari out of the box? • We write our own Ambari custom stack
  • 19. Agenda (as on slide 2)
  • 20. Next Gen SSA • This is all great! But it is a lot of work to add more cloud providers • It takes a lot of effort to understand each cloud provider's APIs
  • 21. Next Gen SSA: Cloudbreak • Cloudbreak simplifies the provisioning of HDP clusters in cloud environments • Supports multiple clouds including AWS, Google, Azure and OpenStack • Uses Apache Ambari for HDP installation and management • Has a UI to build and manage clusters • Supports automated cluster scaling
  • 22. AWS Cluster Architecture (diagram) • Private subnet • Direct Connect, 10 Gbps, to the Symantec datacenter • Data ingestion pipes • Telemetry ingestion pipes • Datacenter hosts HDP over bare-metal and OpenStack • Uses d3.* and r3.* flavors • Encrypted volumes (LUKS) • Non-EBS root volume • Non-Dockerized HDP • Custom AMI • Enhanced networking
  • 23. Cloudbreak Demo
  • 24. Hybrid Cloud Using Cloudbreak: Customization & Contribution • Non-Dockerized HDP installation • Support for Keystone v3 for OpenStack (Cloudbreak 1.2, released 03/2016) • Support for custom AMIs • We have our own hardened images with enhanced networking, volume encryption, etc. • Support for non-EBS backed root volumes • Deploy in an existing private VPC/subnet • Additional AWS instance flavors supported: we use r3.* and d3.*, which are not supported by Cloudbreak • We build our own Cloudbreak package from the trunk
  • 25. Cloudbreak: Keystone V3 Screenshot
  • 26. Cloudbreak: Keystone V3 Project Scope Screenshot
  • 27. Custom AMI Support • Org security mandates using specific hardened AMIs only • Created our own hardened image with the software and configurations required by Cloudbreak • Allows us to use features like: - Volume encryption, enhanced networking enabled - Non-EBS volumes - Symantec-specific configurations like LDAP, repos, DNS, etc. - Symantec standard for hostnames • Use JDK 1.8 instead of the Java 7 that comes with the Cloudbreak AMI • File shown on the slide: /cloud-aws/src/main/resources/aws-images.yml
  • 28. Non-Dockerized HDP Support • Why? - No experience running production clusters under Docker - Unknowns with the upgrade path for HDP components - Encrypted disk volumes had issues working with Docker • What? - Worked with the Cloudbreak team to test out a non-Dockerized version of Cloudbreak - Provided feedback from our test deployment of the non-Dockerized version - Feature now available in the master branch
  • 29. Non-EBS backed root volume • Changes to the AWS CloudFormation template used by Cloudbreak • We use ephemeral storage for root volumes for availability reasons • Will contribute this back as an option to Cloudbreak
  • 30. Cloudbreak Contribution: In Progress • Placement groups • Multiple security groups attached to one cluster • Multiple subnet deployment inside a VPC • Support for non-EBS root volumes
  • 31. Agenda (as on slide 2)
  • 32. Monitoring & Alerting • Now that we have delivered an elephant, the next question from users is: How is his health?
  • 33. Monitoring and Alerting • Comprehensive dashboards for all environments managed by the platform team • Extensively use Ambari Alerts • QueryX: custom framework to fill the gaps in Ambari Alerts • All alerts are sent to the OpenTSDB + Grafana stack • Critical alerts go to PagerDuty
  • 34. Monitoring and Alerting (diagram) • Cluster 1, Cluster 2, Cluster 3, … • Ambari Metrics Collector + QueryX • Call Ambari Metrics API • OpenTSDB • Grafana
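The step between the Ambari Metrics API and OpenTSDB in this diagram might look roughly like the following. This is a sketch under stated assumptions, not QueryX itself, whose internals the slides do not describe: the function name, the `cluster` tag and the sample payload are hypothetical. It assumes the Ambari Metrics Collector response shape (a `metrics` list whose entries carry `metricname`, `appid`, `hostname` and a `metrics` map from millisecond timestamps to values) and produces the JSON data points accepted by OpenTSDB's `/api/put` endpoint. No network calls are made.

```python
def ams_to_opentsdb(ams_response, cluster):
    """Convert an Ambari Metrics Collector response into OpenTSDB data points."""
    points = []
    for series in ams_response.get("metrics", []):
        # Tags identify where the series came from; "cluster" is a
        # hypothetical tag added so several clusters can share one OpenTSDB.
        tags = {"cluster": cluster, "appid": series.get("appid", "unknown")}
        if series.get("hostname"):
            tags["host"] = series["hostname"]

        # Timestamps arrive as string keys in milliseconds; emit them in
        # chronological order as whole seconds.
        samples = sorted(series.get("metrics", {}).items(), key=lambda kv: int(kv[0]))
        for ts_ms, value in samples:
            points.append(
                {
                    "metric": series["metricname"],
                    "timestamp": int(ts_ms) // 1000,
                    "value": value,
                    "tags": dict(tags),
                }
            )
    return points


# Hypothetical sample payload, for illustration only.
SAMPLE = {
    "metrics": [
        {
            "metricname": "cpu_user",
            "appid": "HOST",
            "hostname": "worker-1.example.com",
            "metrics": {"1466000060000": 12.5, "1466000000000": 10.0},
        }
    ]
}

points = ams_to_opentsdb(SAMPLE, cluster="cluster1")
```

The resulting list can be serialized with `json.dumps` and posted to OpenTSDB as a single batch, after which Grafana can read it through its OpenTSDB data source.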
  • 35. Grafana Dashboards
  • 36. Grafana Dashboards
  • 37. Ambari Alerts
  • 38. Ambari Alerts
  • 39. Summary and Future Work • A journey towards one-click cluster deployment • Cloudbreak: one tool for all clouds - Contribute back the features developed in-house - Enable Cloudbreak to support bare-metal cluster provisioning - Auto-scaling using Cloudbreak and Periscope - Single large YARN cluster for a variety of compute and storage loads • Open source: use and contribute - Work with the community to address gaps • SSA code already open-sourced - https://github.com/symantec/
  • 40. Thank You! Q & A • Karthik Karuppaiya, karthik_karuppaiya@symantec.com • Vivek Madani, vivek_madani@symantec.com