Building & Managing Hadoop with Chef
John Martin
Sr Director, Production Engineering

Introduction
• Me, Me, Me
• 10+ years in .com & JEE space
• Project Crew
  • Paul MacDougall
  • Greg Rokita
  • KC Braunschweig (former)
  • Ryan Holmes (former)
• Edmunds.com
  • Founded in 1966
  • Gopher site in 1994
  • HTTP site in 1995

Edmunds.com Environment
• Nearing 3000 hosts
• Heavily virtualized (Xen, CloudStack, AWS)
• Tomcat with some WebLogic
• Coherence, Solr, Mongo
• Publishing built on ActiveMQ
• Newly launched DWH built around Hadoop + Netezza

Why Chef?
• Explosive infrastructure growth
• Quick to bootstrap
• Easy integration with our tooling
• knife
• The Chef Community

What’s Hadoop?
• Open framework for data-intensive distributed applications
• Reigning King of “Big Data”
• Many services
  • HDFS
  • MapReduce
  • HBase
  • ZooKeeper
• Designed to run on commodity hardware

Edmunds Hadoop Environment
• Multiple Clusters
• Roughly 200 TB in total
• 40+ nodes in production
• Maintained by Ops + Dev
• Dell R410
  • Six-core 2.40 GHz
  • 24 GB RAM
  • 4x 1 TB 7200 RPM drives

How We Got Here
• First cluster was a Frankenstein
  • Part BMC
  • Part manual effort
  • Part Puppet
• Staff changes & knowledge loss
• Time for a clean slate!

Building Hadoop with Chef
• True Dev + Ops effort
• Production built in 3 weeks
• Built with community cookbooks
• All services now administered with knife
• New nodes now cluster-ready within minutes

What We Gained
• First highly visible Chef success story at Edmunds
• Cemented Chef as our CM solution
• Engaged us with the community
• Completely automated Hadoop infrastructure
• New suite of administrative scripts
  • knife-[start|stop]-all.sh $cluster
  • knife-[start|stop]-hbase.sh $cluster
  • knife-[start|stop]-mapred.sh $cluster
  • knife-[start|stop]-oozie.sh $cluster
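The slide does not say how knife-[start|stop]-all.sh sequences the individual services, so the following is only a sketch under stated assumptions. Whatever the real scripts do, an "all" wrapper has to respect service dependencies: HBase needs HDFS and ZooKeeper running first, and on the way down, dependents should stop before the services they rely on. The dependency map, the hdfs and zookeeper entries, and the `plan` helper are hypothetical; the sketch only builds the per-service script names in a safe order and does not invoke knife.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services that must be
# running before it starts. "hbase", "mapred", and "oozie" mirror the scripts
# on the slide; the "hdfs" and "zookeeper" entries are assumptions.
SERVICE_DEPENDENCIES: dict[str, set[str]] = {
    "zookeeper": set(),
    "hdfs": set(),
    "mapred": {"hdfs"},
    "hbase": {"hdfs", "zookeeper"},
    "oozie": {"mapred"},
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service starts after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: dict[str, set[str]]) -> list[str]:
    """Stop in the reverse of the start order, so dependents go down first."""
    return list(reversed(start_order(deps)))


def plan(action: str, cluster: str,
         deps: dict[str, set[str]] = SERVICE_DEPENDENCIES) -> list[str]:
    """Build the (hypothetical) per-service script calls an -all script might run."""
    if action not in ("start", "stop"):
        raise ValueError(f"unknown action: {action!r}")
    order = start_order(deps) if action == "start" else stop_order(deps)
    return [f"knife-{action}-{service}.sh {cluster}" for service in order]


if __name__ == "__main__":
    # "prod" is a placeholder cluster name.
    for step in plan("start", "prod"):
        print(step)
```

Deriving both orders from one dependency table, via `graphlib.TopologicalSorter` from the Python 3.9+ standard library, avoids hard-coding the sequence separately in each wrapper script.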

Where Next?
• New cluster currently being built!
• Integration with Cloudera Manager
• Cluster replication
• Continue evangelism of Chef’s awesomeness
• Extend more of the toolchain around Chef
• See you around at the LA Chef UG!

Thank you!
• email: jmartin@edmunds.com
• twitter: @tekbuddha

Editor's Notes

  1. We currently have close to 3000 hosts deployed in our environments. We are highly virtualized, relying on a mix of RHEL Xen and CloudStack. Historically, our server vendor of choice has been Dell, but we began using Cisco’s UCS chassis about a year ago to back our CloudStack pods. We’re a Java shop, with Tomcat 7 being our container of choice. There are still a few WebLogic applications floating around, but they are slowly being rebuilt on Tomcat. Our web apps rely on a mix of Oracle Coherence, Solr, and Mongo. All data and content on the website is kept up-to-date using our homegrown publishing solution built on ActiveMQ. And on the topic you’re here today to hear about… we are about to unveil our new data warehouse, built with Hadoop, HBase, and Netezza. I’ll be getting into that shortly. There’s more to our environment, such as Oracle RACs and BPEL services, but everything listed here has been built and is supported with Chef. Since our adoption about 18 months ago, we have brought nearly all our services under Chef management. Our migration from WebLogic to Tomcat has been aided greatly by Chef adoption, shaving months off the original estimates.
  2. How did Edmunds come to adopt Chef? Well, for a few years we had been trying to get our heads around config management. With explosive growth in the number of hosts we were building, it was an absolute necessity for us. We were customers of one of the big names in the space, but ultimately found the tool to be a challenge for us. So we began looking for something to replace it and started experimenting with Chef and Puppet. One of those experiments was with a Hadoop cluster. As we began to put these different offerings through their paces, a few things about Chef stood out to us. The first was that it was easy for us to get up and running with Chef. With very minimal effort we had a Chef server running and a repo set up, and we were off to the races. We then tried to figure out how well it bolted together with the other tools in our toolchain. The good news was that it wasn’t going to be too difficult for us. The bad news was that we realized we didn’t like some of the other tools in the chain and were going to rewrite them now that we had a better configuration management tool. (That’s for a whole other presentation.) knife was something that seemed easy for our admins to pick up. It was intuitive to wrap our heads around, and it was easy for us to see how powerful a weapon it could be. Other CM tools have their equivalents, but it felt like we could get more done with knife with a shorter ramp-up. Lastly, the Chef community ended up being a big factor in how Chef was adopted at Edmunds. There’s such a wealth of knowledge sharing, not just from Opscode but from the daily users of Chef. Mailing lists, IRC, Twitter, blogs: the places we could go for help while we were learning our way were invaluable in those early days.
  3. A really quick overview of Hadoop for the uninitiated… In short, Hadoop is an open framework for running data-intensive distributed applications. When you hear anyone marketing “big data”, the first thing that comes to mind is Hadoop; it is the reigning king of “big data”. One of the project’s co-creators, Doug Cutting, named the project after his son’s toy elephant, Hadoop. The Hadoop framework is actually a collection of services. HDFS, at the foundation of the framework, is a distributed file system. Each data node in your Hadoop cluster stores blocks of HDFS data on its local disks, and each block is copied to several nodes. It’s through this distribution of data that another of Hadoop’s services, MapReduce, gains its speed: because copies of the data live on multiple nodes, MapReduce and HBase can run highly parallelized tasks close to the data. ZooKeeper is a coordination and configuration registry service. As the name implies, ZooKeeper keeps track of all the animals running wild in your Hadoop cluster. It’s highly customizable, and several folks, ourselves included, have started using ZooKeeper for non-Hadoop projects. While there are a lot of companies that sell “big data” platforms or appliances, Hadoop was specifically designed to run on commodity hardware. It grew out of Google’s MapReduce and Google File System papers, in which Google described running these services at massive scale on cheap whitebox servers. In 2011, Facebook laid claim to the largest Hadoop cluster in the world, clocking in at 30 petabytes across thousands of servers. Our Hadoop cluster isn’t anywhere near that size, but you can already see the need for some sort of configuration management, can’t you? There is a lot more to Hadoop, HDFS, MapReduce, and HBase, so please don’t view this as a comprehensive look at any of them. I simply cannot do it justice in one slide, and I encourage anyone not familiar with its capabilities to do further research.
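To make the MapReduce idea a little more concrete, here is a toy, single-process Python sketch of the classic word-count job. It only illustrates the map, shuffle, and reduce phases in miniature; it does not use Hadoop’s APIs, and the function names are illustrative rather than anything Hadoop defines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts for a single word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster, each input split would be mapped on a node holding
    # that block of data; in this sketch every phase runs in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the elephant", "The yellow elephant"]))
```

Because each map call only looks at its own line and each reduce call only sees one key, both phases can be spread across many machines; the shuffle is the step that moves data between them.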
  4. Okay, let’s get to the meat of the discussion: our Hadoop environment. We have two pre-production Hadoop clusters and one production cluster. There is roughly 200TB in total across the clusters, with the majority housed in the production cluster. The node counts you see to the side are for our production cluster. As I mentioned earlier, our hosts are built on Dell hardware. In these clusters, we rely on a fleet of Dell R410s, a low-end PowerEdge model that suits this type of work. There are just over 40 of these nodes in the production cluster. These clusters are managed daily by the combined efforts of Ops and Dev. This is where we are today. But we didn’t start with this setup. In fact, it’s a bit of a painful story as to how we got here.
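For a rough sense of what hardware like this holds, here is a back-of-the-envelope Python sketch. It assumes exactly 40 nodes, the 4x 1Tb drive layout listed on the slide, and HDFS’s default replication factor of 3. These are illustrative inputs, not a reconstruction of the 200TB figure, which spans all three clusters.

```python
from typing import Tuple


def hdfs_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float,
                     replication: int = 3) -> Tuple[float, float]:
    """Return (raw_tb, usable_tb) for a set of identical HDFS data nodes.

    Usable capacity here is simply raw / replication; it ignores OS
    partitions, dfs.datanode.du.reserved, and MapReduce scratch space,
    so real-world numbers will be lower.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb, raw_tb / replication


if __name__ == "__main__":
    # Assumed inputs: 40 nodes, 4 x 1 TB drives each, replication factor 3.
    raw, usable = hdfs_capacity_tb(nodes=40, disks_per_node=4, disk_tb=1.0)
    print(f"raw: {raw:.0f} TB, usable after 3x replication: ~{usable:.1f} TB")
```

The takeaway is that every block is stored several times, so usable space is only a fraction of the raw disk in the rack.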
  5. Earlier, I mentioned that prior to our full adoption of Chef, we had been experimenting with other solutions. There was a five- or six-month period where we were experimenting with both Chef and Puppet. Chef had certainly won several of us over, but others remained uncertain. As a result, we had a Hadoop cluster that was half-baked with Puppet. Because we were also learning our way around Hadoop at the time, that’s where most of our attention went. Consequently, a large number of manual pieces in the Hadoop-Puppet cluster just weren’t scaling for us. This is really no fault of, or knock against, Puppet. As I said, we simply were not prioritizing the configuration management aspect of this project, but rather the Hadoop services themselves. When the systems engineer/architect of that solution left the organization, many of us were scratching our heads as to how the thing functioned. The documentation wasn’t well fleshed out, and the Hadoop project owners were pushing for a large expansion in a short period of time. The team that had been evaluating configuration management tools wanted their shot at this challenge. We got the Hadoop project owners to agree to a one-month freeze on their expansion requests and went to work. The path forward was clear. Rather than try to figure out how to right the Hadoop-Puppet project, the best thing to do was scrap it and move forward with a cluster built with Chef.
  6. That set in motion our first highly visible project using Chef. This was collaboration at its finest. In a single week, engineers across Dev and Ops came together from both the Hadoop and configuration management projects to scope out the effort. Then, within 3 weeks, all the cookbooks necessary to fully automate and manage our Hadoop infrastructure were in place. This could not have been accomplished without cookbooks already available from the community. More specifically, we relied heavily on the ‘hadoop_cluster’ cookbook by InfoChimps to get us up and running. We weren’t out to reinvent the wheel and are far from being a unique snowflake. By leveraging the cookbooks already out there, we shaved precious time off our effort. Over the course of those 3 weeks, the InfoChimps cookbooks were groomed to fit our environment. We did write a few of our own for HBase and for deploying Oozie workflows. We may at some point push those changes and new cookbooks back out to the community, given how helpful the community’s cookbooks were to us. At the end of the effort, we had a brand new, fully automated production Hadoop cluster. Now adding a new node takes as little as 15 minutes once the host is racked and available for bootstrapping. Last summer the production cluster was expanded yet again with minimal effort.
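Adding a node largely comes down to bootstrapping the host with the right run list. Below is a minimal Python sketch that builds such a `knife bootstrap` call. The host, role, and environment names are hypothetical, and SSH/connection flags differ between Chef versions, so this sticks to the run list (`-r`), environment (`-E`), and `--sudo` options; check `knife bootstrap --help` on your own workstation.

```python
import shlex
from typing import List, Sequence


def bootstrap_command(host: str, roles: Sequence[str], environment: str,
                      sudo: bool = True) -> List[str]:
    """Build a `knife bootstrap` argv that registers a host with the given roles.

    Role and environment names are whatever exists on your Chef server;
    the ones used below are hypothetical.
    """
    run_list = ",".join(f"role[{role}]" for role in roles)
    argv = ["knife", "bootstrap", host, "-r", run_list, "-E", environment]
    if sudo:
        argv.append("--sudo")
    return argv


if __name__ == "__main__":
    # Hypothetical host, role, and environment names.
    cmd = bootstrap_command("hdn41.example.com", ["hadoop_datanode"], "production")
    print(shlex.join(cmd))
```

Keeping the role list as the single input means a new data node picks up exactly the same cookbooks as every existing one, which is what makes the turnaround predictable.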
  7. Having our production Hadoop cluster managed by Chef was a significant win and helped us gain a lot of traction internally in establishing Chef as our configuration manager. It was the first highly visible success story with Chef. We had asked our project owner for a month’s reprieve on delivery, and in that time we revitalized our Hadoop infrastructure. Not only had we demonstrated the power of Chef, we had shown the power of leveraging the community. I know I’ve said it a few times already, but I can’t stress enough how difficult this would have been if we had to write all the cookbooks needed to build these clusters ourselves. It also encouraged us to begin engaging with our peers in other technical communities. Those positive interactions with the Chef community really kicked us out of our shells. The “S” in the DevOps CLAMS acronym stands for “sharing”, and it’s something we have really embraced because of our initial experience with this project. Kudos to every one of you out there who participates. So now we had a fully automated Hadoop infrastructure. Gone were the days of half-automated, half-manual administrative tasks; it was automated from top to bottom. What’s more, because of this automation, we were able to provide some great scripts that leverage knife to start and stop either the entire cluster (dangerous, and not really recommended!) or specific services within the cluster. While these scripts don’t do anything magical and are really just nifty knife ssh calls targeted by role, they remove the need to know how to use knife ssh. They were really great in the early days of Chef adoption at Edmunds because we could demonstrate Chef’s capabilities without requiring a lot of upfront education for new users.
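The start/stop scripts described here are thin wrappers, and a hypothetical Python sketch shows roughly what one boils down to: a `knife ssh` call whose search query selects nodes by role and environment, plus the command to run on each of them. Treating each cluster as its own Chef environment, and the role and service names used below, are assumptions for illustration, not the actual Edmunds scripts.

```python
import shlex
from typing import List


def service_action_command(cluster: str, role: str, service: str,
                           action: str) -> List[str]:
    """Build a `knife ssh` argv that starts or stops a service on every node with a role.

    Assumes (hypothetically) that each cluster is modeled as a Chef environment
    and that the service is managed through the `service` init wrapper.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    query = f"role:{role} AND chef_environment:{cluster}"
    remote_command = f"sudo service {service} {action}"
    return ["knife", "ssh", query, remote_command]


if __name__ == "__main__":
    # Hypothetical cluster, role, and service names.
    cmd = service_action_command("prod_hadoop", "hbase_regionserver",
                                 "hbase-regionserver", "start")
    print(shlex.join(cmd))
```

Returning an argv list rather than a single shell string avoids quoting trouble with the search query; passing it to `subprocess.run(cmd, check=True)` would execute it.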
  8. So where are we going with all this? To start, we’ll be building a new production Hadoop cluster in the next couple of months. It will be a significant re-architecture for us, as we’ll be using larger boxes than we have in the past. What’s more, we’re going to take a stab at using Cloudera Manager to help us with the build-out. Cloudera Manager will provide some great insight into the performance and health of our cluster that we’ve been missing, and we expect it to help our development teams find performance bottlenecks within our Hadoop clusters. We’re also going to dig into cluster replication and explore how to make it work across our data centers. That’s new territory for us, so it should be interesting to experiment with. On the Chef front, we’ll continue evangelizing just how awesome a tool we think it is. We’re really excited to be part of the new LA Chef user group that’s been started. We’ll also continue to extend our tooling around Chef. Right now, several of our tools already make great use of Chef, and I don’t see that integration stopping anytime in the foreseeable future. That’s about all I had to present today. Thanks for your time.