Provisioning Big Data Platform using Cloudbreak & Ambari
1. Provisioning Big Data Platform using Cloudbreak & Ambari
Karthik Karuppaiya – Sr. Engineering Manager, CPE
Vivek Madani – Sr. Principal Software Engineer, CPE
San Jose Hadoop Summit 2016 – Karthik Karuppaiya & Vivek Madani
2. Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
3. Introduction
Symantec
- Symantec is the world leader in security software for both enterprises and end users
- Thousands of enterprises and more than 400 million devices (PCs, tablets, and phones) rely on Symantec to help secure their assets from attacks, including their data centers, email, and other sensitive data
Cloud Platform Engineering (CPE)
- Build consolidated cloud infrastructure and platform services for next-generation, data-powered Symantec applications
- A big data platform for batch and stream analytics, integrated with both private and public clouds
- Open source components as building blocks
- Bridge feature gaps and contribute back
4. Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
5. Big Data Platform Challenge
• Hundreds of millions of users generating billions of events every day from across the globe
• Hundreds of big data application developers building thousands of applications
• At 12 PB and 500+ nodes, the Cloud Platform Engineering Analytics team built the largest security data lake at Symantec
• Elasticity is built into the platform to optimize costs in the cloud
6. Big Data Platform Challenge
• Great! Now developers can start building applications on our big data lake
• Hundreds of developers start building applications using different big data tools
7. Big Data Platform Challenge
• Product team developers want quick changes and the latest versions
• The platform team wants stability!
• Soon, frustration prevails
8. Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
9. What is the Solution?
• Build and use your own small cluster for development
• Copy a subset of data for development purposes
• Build elasticity into the platform for cost optimization
• Tear down the cluster after development is complete
• Rinse and repeat
10. What is the Solution?
• But building clusters is hard and time-consuming
• Too many services to install and configure
• Developers are not interested in building and managing clusters
11. What is the Solution? – Self Service
• What if we make it really easy to build clusters?
• Abstract away all the deployment complexity and let developers get their own cluster with one click of a button
• Use the same blueprint for both dev and prod clusters
12. Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
13. Self Service Analytics (SSA) Clusters
• RESTful web services to allow creation and management of custom clusters
• Select from pre-defined Ambari Blueprints
• Can provision infrastructure on OpenStack as well as AWS
• Installs the HDP stack specified in the Ambari blueprint
• Dashing dashboard to monitor and manage (start/stop/kill) clusters
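Since SSA clusters are created from pre-defined Ambari Blueprints, a blueprint is just a JSON document naming the HDP stack and the components assigned to each host group. The sketch below is a minimal illustrative blueprint; the name `ssa-small-dev` and the exact component layout are assumptions for the example, not the team's actual blueprints:

```python
import json

# Minimal sketch of an Ambari blueprint of the kind SSA selects from.
# Component names are standard Ambari/HDP component identifiers;
# the blueprint name and host-group layout are illustrative.
blueprint = {
    "Blueprints": {
        "blueprint_name": "ssa-small-dev",  # hypothetical name
        "stack_name": "HDP",
        "stack_version": "2.3",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        },
        {
            "name": "worker",
            "cardinality": "3",
            "components": [
                {"name": "DATANODE"},
                {"name": "NODEMANAGER"},
            ],
        },
    ],
}

print(json.dumps(blueprint, indent=2))
```

Registering the blueprint with Ambari is a single `POST /api/v1/blueprints/<name>` call with this JSON body; a provisioning service can then reference it by name when creating a cluster.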
14. Environment
• Private cloud on OpenStack (Kilo, no Heat)
• Public cloud on AWS
• HDP 2.3.2 & 2.4.2
• Ambari 2.1.2 & 2.2
15. SSA Architecture
16. SSA Services
17. SSA Demo
18. Ambari Custom Services
• What about services that are not supported by Ambari out of the box?
• We write our own Ambari custom stack
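An Ambari custom service is a directory under the server's `stacks/.../services/` tree containing a `metainfo.xml` plus Python command scripts (install/start/stop) under `package/scripts/`. The sketch below shows the shape of such a `metainfo.xml`; the QUERYX service name is borrowed from the QueryX framework mentioned later for illustration, since the talk does not name the actual custom services:

```python
import xml.etree.ElementTree as ET

# Sketch of metainfo.xml for a hypothetical QUERYX custom service,
# following Ambari's custom-stack layout:
#   <stack>/services/QUERYX/metainfo.xml
#   <stack>/services/QUERYX/package/scripts/master.py
METAINFO = """\
<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>QUERYX</name>
      <displayName>QueryX</displayName>
      <version>1.0.0</version>
      <components>
        <component>
          <name>QUERYX_MASTER</name>
          <category>MASTER</category>
          <cardinality>1</cardinality>
          <commandScript>
            <script>scripts/master.py</script>
            <scriptType>PYTHON</scriptType>
          </commandScript>
        </component>
      </components>
    </service>
  </services>
</metainfo>
"""

# Sanity-check the document the way a deployment script might.
root = ET.fromstring(METAINFO)
service_name = root.findtext("./services/service/name")
print(service_name)  # QUERYX
```

The referenced `scripts/master.py` implements the lifecycle hooks (`install`, `start`, `stop`, `status`) using Ambari's `resource_management` library; once the directory is in place, the service shows up in Ambari and in blueprints like any built-in one.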
19. Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
20. Next Gen SSA
• This is all great! But it is a lot of work to add more cloud providers.
• It takes a lot of effort to understand each cloud provider’s APIs
21. Next Gen SSA – Cloudbreak
• Cloudbreak
– Cloudbreak simplifies the provisioning of HDP clusters in cloud environments
– Supports multiple clouds including AWS, Google, Azure, and OpenStack
– Uses Apache Ambari for HDP installation and management
– Has a nice UI to build and manage clusters
– Supports automated cluster scaling
22. AWS Cluster Architecture
Diagram: the Symantec datacenter, which hosts HDP over bare metal and OpenStack, connects to a private subnet in AWS over a 10 Gbps Direct Connect link carrying the data and telemetry ingestion pipes. The AWS side uses d3.* and r3.* flavors, LUKS-encrypted volumes, non-EBS root volumes, non-Dockerized HDP, a custom AMI, and enhanced networking.
24. Hybrid Cloud Using Cloudbreak – Customization & Contribution
• Non-dockerized HDP installation
• Support for Keystone v3 for OpenStack
– Cloudbreak 1.2 – released 03/2016
• Support for custom AMIs
– We have our own hardened images with enhanced networking, volume encryption, etc.
• Support for non-EBS backed root volumes
• Deploy in an existing private VPC/subnet
• Additional AWS instance flavors supported
– We use r3.* and d3.*, which were not supported by Cloudbreak
• We build our own Cloudbreak package from trunk
25. Cloudbreak – Keystone V3 Screenshot
26. Cloudbreak – Keystone V3 Project Scope Screenshot
27. Custom AMI Support
• Org security mandates using specific hardened AMIs only
• Created our own hardened image with the software and configuration required by Cloudbreak
• Allows us to use features like:
– Volume encryption, enhanced networking enabled
– Non-EBS volumes
– Symantec-specific configuration like LDAP, repos, DNS, etc.
– Symantec standard for hostnames
• Use JDK 1.8 instead of the Java 7 that ships with the Cloudbreak AMI
/cloud-aws/src/main/resources/aws-images.yml
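As a rough illustration, the AMI catalog in `aws-images.yml` maps each AWS region to the image Cloudbreak launches there; pointing those entries at the hardened images is what enables the custom-AMI support. The exact key names vary by Cloudbreak version, and the AMI IDs below are placeholders:

```yaml
# Sketch only – key names may differ across Cloudbreak versions.
# Replace the stock per-region AMIs with the org's hardened images.
us-east-1: ami-00000000   # placeholder: hardened AMI for us-east-1
us-west-2: ami-11111111   # placeholder: hardened AMI for us-west-2
eu-west-1: ami-22222222   # placeholder: hardened AMI for eu-west-1
```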
28. Non-Dockerized HDP Support
Why?
• No experience running production clusters under Docker
• Unknowns in the upgrade path for HDP components
• Encrypted disk volumes had issues working with Docker
What?
• Worked with the Cloudbreak team to test the non-Dockerized version of Cloudbreak
• Provided feedback from our test deployment of the non-Dockerized version
• Feature now available in the master branch
29. Non-EBS Backed Root Volume
• Changes to the AWS CloudFormation template used by Cloudbreak
• We use ephemeral storage for root volumes for availability reasons
• Will contribute this back as an option to Cloudbreak
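The mechanism behind this change can be sketched as a CloudFormation fragment: launch from an instance-store-backed AMI and expose ephemeral (instance-store) disks through `BlockDeviceMappings` rather than attaching EBS volumes. This illustrates the idea, not the team's actual template change; the AMI ID and instance type are placeholders:

```python
import json

# Sketch of the CloudFormation resource shape involved in avoiding
# EBS-backed root volumes. Device names follow AWS conventions; the
# AMI ID and flavor are illustrative placeholders.
instance_resource = {
    "Type": "AWS::EC2::Instance",
    "Properties": {
        "ImageId": "ami-00000000",    # placeholder: instance-store-backed hardened AMI
        "InstanceType": "r3.2xlarge", # placeholder flavor
        "BlockDeviceMappings": [
            # Expose the first two instance-store (ephemeral) disks;
            # note there is no "Ebs" sub-key on these mappings.
            {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
            {"DeviceName": "/dev/sdc", "VirtualName": "ephemeral1"},
        ],
    },
}

print(json.dumps(instance_resource, indent=2))
```

Because the root device comes from the instance-store AMI itself, the instance no longer depends on an EBS root volume being available in the availability zone.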
30. Cloudbreak Contribution – In Progress
• Placement groups
• Multiple security groups attached to one cluster
• Multiple subnet deployment inside a VPC
• Support for non-EBS root volumes
31. Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
32. Monitoring & Alerting
Now that we have delivered an elephant, the next question from users is: how is its health?
33. Monitoring and Alerting
• Comprehensive dashboards for all environments managed by the platform team
• Extensive use of Ambari Alerts
• QueryX: custom framework to fill the gaps in Ambari Alerts
• All alerts are sent to an OpenTSDB + Grafana stack
• Critical alerts go to PagerDuty
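To illustrate how a QueryX-style check could feed the OpenTSDB + Grafana stack: OpenTSDB accepts datapoints as JSON on its standard `/api/put` HTTP endpoint. The metric name, tags, and host below are assumptions for the sketch, not names from the talk:

```python
import json
import time

# Sketch of pushing a check result to OpenTSDB. The /api/put endpoint
# and datapoint shape are standard OpenTSDB; the metric name and tags
# are hypothetical examples.
def make_datapoint(metric, value, tags):
    return {
        "metric": metric,
        "timestamp": int(time.time()),
        "value": value,
        "tags": tags,  # OpenTSDB requires at least one tag
    }

point = make_datapoint(
    "cluster.hdfs.missing_blocks",  # hypothetical metric name
    0,
    {"cluster": "ssa-dev-1", "source": "queryx"},
)

body = json.dumps([point]).encode("utf-8")
# To actually send it (host is a placeholder):
#   urllib.request.urlopen(urllib.request.Request(
#       "http://opentsdb.example.com:4242/api/put",
#       data=body, headers={"Content-Type": "application/json"}))
print(body.decode("utf-8"))
```

Grafana then queries OpenTSDB directly, so every metric pushed this way is immediately available on the dashboards, and alerting rules can key off the same series.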
34. Monitoring and Alerting
Diagram: the Ambari Metrics Collector + QueryX call the Ambari Metrics API on each cluster (Cluster 1, Cluster 2, Cluster 3, …) and forward the metrics to OpenTSDB, which Grafana visualizes.
39. Summary and Future Work
• A journey towards one-click cluster deployment
• Cloudbreak – one tool for all clouds
– Contribute back the features developed in-house
– Enable Cloudbreak to support bare-metal cluster provisioning
– Auto-scaling using Cloudbreak and Periscope
– Single large YARN cluster for a variety of compute and storage loads
• Open source – use and contribute
– Work with the community to address gaps
• SSA code already open sourced
– https://github.com/symantec/
40. Thank You!
Q & A
Karthik Karuppaiya
karthik_karuppaiya@symantec.com
Vivek Madani
vivek_madani@symantec.com