Instant Hadoop of your Own
Created by Jack Bezalel
Senior IT Architect
As part of the CTE Mentorship Program
CA Technologies
What’s Hadoop all about?
• OPPORTUNITY: We have access to amazingly
  valuable data (Social Media, Mobile, …)
• Problem: Data is seldom structured
• Relational databases and data warehouses MUST
  have structured data, so they are off the list
• Hadoop = fast, reliable analysis of both
  structured data and complex data
What’s in Hadoop?
• Reliable data storage using the Hadoop
  Distributed File System (HDFS)
• High-Performance parallel data processing
  using a technique called MapReduce.
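The MapReduce idea can be tried on a single machine before touching a cluster. The sketch below is only an analogy built from standard Unix tools, not Hadoop itself: splitting the text into words plays the "map" role, sorting groups identical keys together like the shuffle step, and counting each group is the "reduce". The function name wordcount is made up for illustration.

```shell
#!/bin/sh
# Word count as a local analogy of MapReduce (single machine, no Hadoop involved).
# map:     split the input into one word per line
# shuffle: sort so that identical words end up next to each other
# reduce:  count each run of identical words and print "word count"
wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c | awk '{print $2, $1}'
}

# Example: printf 'error warn error\n' | wordcount
```

On a real cluster the same three phases run in parallel across many servers, each working on its own slice of the data.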
How does it scale so well?
• Hadoop runs on a collection of commodity,
  shared-nothing servers
• You can add or remove servers in a Hadoop
  cluster at will
• The system detects and compensates for
  hardware or system problems on any server
  (self-healing)
Who uses Hadoop?
• Developed largely at Yahoo, with Facebook among
  the early large-scale users
• Hadoop is now widely used in
  – Finance
  – Technology
  – Telecom
  – Media and entertainment
  – Government
  – Research institutions and other markets with
    significant data
Why did we use Cloudera’s Hadoop kit?
• Cloudera is an active contributor to the
  Hadoop project
• Provides an enterprise-ready, commercial
  Distribution for Hadoop
• The Cloudera Distribution saves time by bundling
  and testing the most popular Hadoop-related
  projects in a single, easier-to-use package
The solution we tested is provided by Cloudera Manager Free Edition
• Automates the installation and configuration
  of CDH3 on an entire cluster (up to 50 nodes)
• Requires only root SSH access to your cluster's
  machines
• Download here:
  https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Download
Cloudera Manager Free Edition consists of:
• A small self-executing Cloudera Manager
  installation program
• Server and other packages in preparation for
  cluster host installation
• Cloudera Manager wizard for automating
  CDH3 installation and configuration on the
  cluster
• Cloudera Manager itself, for monitoring and configuring
  the cluster after installation is completed
What does Cloudera Include - Flume
• Flume — Reliable Data Mover
• The primary use case is a logging system that:
  – gathers a set of log files on every machine
  – aggregates them to a centralized persistent store
    (such as HDFS)
What does Cloudera Include - Sqoop
• Sqoop — A tool that imports / exports data
  between relational databases and Hadoop
  clusters.
• Uses JDBC to import data into HDFS
• Generates Java classes that enable users to
  interpret the table's schema
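A Sqoop import boils down to one command line. The sketch below only assembles and prints that command so it can be reviewed first; the helper name sqoop_import_cmd and every value passed to it (database host, database, user, table, HDFS directory) are placeholders, and it assumes the matching JDBC driver is already available to Sqoop.

```shell
#!/bin/sh
# Builds a Sqoop import command line and prints it (dry run, nothing is executed).
# Usage: sqoop_import_cmd JDBC_URL DB_USER TABLE HDFS_TARGET_DIR
sqoop_import_cmd() {
    jdbc_url=$1; db_user=$2; table=$3; target_dir=$4
    # -P makes Sqoop prompt for the password instead of taking it on the command line
    printf 'sqoop import --connect %s --username %s -P --table %s --target-dir %s\n' \
        "$jdbc_url" "$db_user" "$table" "$target_dir"
}

# All values below are hypothetical placeholders.
sqoop_import_cmd jdbc:mysql://dbhost.example.com/sales report_user orders /user/admin/orders
```

Once the printed command looks right, it can be run on a host where Sqoop is installed.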
What does Cloudera Include - Hue
• Hue — GUI to work with CDH
• Web application
What does Cloudera Include - Pig
• Pig — Analyzes large amounts of data
• Using Pig's query language called Pig Latin
• Queries run distributed on a Hadoop cluster
What does Cloudera Include - Hive
• Hive — A powerful data warehousing APP
• Enables access to your data using HiveQL
• HiveQL = a language that is similar to SQL
What does Cloudera Include - HBase
• HBase — Large-scale tabular storage
• Built on top of HDFS
• Cloudera recommends installing HBase in a
  standalone mode before you try to run it on a
  whole cluster.
What does Cloudera Include - ZooKeeper
• Zookeeper — Service that provides
  coordination between distributed processes.
What does Cloudera Include - Oozie
• Oozie — A server-based workflow engine
• Runs workflow jobs with actions that execute
  Hadoop jobs
• A command-line client is also available for
  remote management
What does Cloudera Include - 3 last strangely named tools…
• Whirr — Provides a fast way to run cloud
  services
• Snappy — A compression/decompression
  library
• Mahout — A machine-learning tool. By
  enabling you to build machine-learning
  libraries that are scalable to "reasonably
  large" datasets, it aims to make building
  intelligent applications easier and faster
Setup Walkthrough
• Use Red Hat Enterprise Linux 5.5+ (CentOS and others
  are supported as well; we used RHEL 5.7)
• 64-bit only
• 3 VMs used:
  – Cloudera Manager
  – 2 Nodes to deploy Hadoop on
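The requirements above can be checked on each VM before starting. This is a minimal sketch, assuming the release file is the usual one-line /etc/redhat-release; the helper names is_64bit and release_number are made up for illustration.

```shell
#!/bin/sh
# Succeeds only for the 64-bit x86 architecture string reported by `uname -m`.
is_64bit() {
    [ "$1" = "x86_64" ]
}

# Reads a release line on stdin, such as
# "Red Hat Enterprise Linux Server release 5.7 (Tikanga)", and prints "5.7".
release_number() {
    sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p'
}

# Example use on a node:
#   is_64bit "$(uname -m)" || echo "64-bit OS required"
#   release_number < /etc/redhat-release
```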
About the Cloudera Manager Free Edition Installation Program
• Automatically installs the package repositories
  for Cloudera Manager and the Oracle JDK
• Installs the Cloudera Manager Server
• Installs and configures an embedded
  PostgreSQL database
Download the CDH3 (Cloudera) Manager
• http://archive.cloudera.com/cloudera-manager/installer/latest/cloudera-manager-installer.bin
Set your proxy in yum.conf, if you have one
• Add these lines to /etc/yum.conf on your first
  Red Hat Hadoop node (example values shown)
proxy=http://proxy.corp.com:80
proxy_username=username
proxy_password=password
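If you prefer to script this step, the sketch below appends the three lines only when no proxy= line is present yet, so it is safe to run twice. It assumes [main] is the last (or only) section in the file, which is the usual layout of /etc/yum.conf; the function name add_yum_proxy is made up for illustration.

```shell
#!/bin/sh
# Appends proxy settings to a yum.conf-style file unless a proxy= line already exists.
# Usage: add_yum_proxy FILE PROXY_URL USERNAME PASSWORD
add_yum_proxy() {
    conf=$1
    if grep -q '^proxy=' "$conf"; then
        echo "proxy already configured in $conf"
        return 0
    fi
    {
        echo "proxy=$2"
        echo "proxy_username=$3"
        echo "proxy_password=$4"
    } >> "$conf"
}

# Example with the placeholder values shown above (run as root):
#   add_yum_proxy /etc/yum.conf http://proxy.corp.com:80 username password
```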
Let the show begin!
• Make sure SELinux is disabled, or this won’t work!
  – View the file /etc/sysconfig/selinux
  – Make sure you have this line:
  SELINUX=disabled
  – You will need to reboot if you changed the SELINUX
    setting
• Launch the Cloudera Manager installation:
sudo chmod u+x ./cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
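The SELinux check can be done from the shell before launching the installer. A minimal sketch, assuming the config file uses the standard SELINUX=value line; the function name selinux_config_mode is made up for illustration.

```shell
#!/bin/sh
# Prints the value of the SELINUX= setting from a config file
# (defaults to /etc/sysconfig/selinux). The SELINUXTYPE= line is ignored.
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/sysconfig/selinux}"
}

# Example use on a node:
#   if [ "$(selinux_config_mode)" != "disabled" ]; then
#       echo "Set SELINUX=disabled in /etc/sysconfig/selinux and reboot"
#   fi
```

This reads the configured value only. On a running system, getenforce reports the mode currently in effect, which still differs from the file until you reboot.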
This one is Easy…
And this one as well…
What do you think about this one?
And yet another one…
It will soon be over 
And it starts rolling
Why it is important to avoid cleaning up your presentation…
OOPsss!
Here is why… (PostgreSQL missing…)
After getting “Installation Failed” I got this as well… then it exited to the OS shell
Installing PostgreSQL
• rpm -ivh postgresql-8.1.23-1.el5_7.2.x86_64.rpm (client, not a must)
• rpm -ivh postgresql-server-8.1.23-1.el5_7.2.x86_64.rpm
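To spot this kind of failure before running the installer, you can ask rpm which packages are absent. A minimal sketch: the function name missing_packages and the PKG_QUERY override (there only so the logic can be tried without rpm) are made up for illustration.

```shell
#!/bin/sh
# Prints the name of every given package that is not installed.
# By default each package is queried with `rpm -q`; set PKG_QUERY to
# another command to exercise the function on a machine without rpm.
missing_packages() {
    for pkg in "$@"; do
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example use on a node:
#   missing_packages postgresql postgresql-server cyrus-sasl-gssapi
```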
Re-run installation
• ./cloudera-manager-installer.bin
Looking better now…
Hooray!
Continue Setup via the web…
Welcome…
You have to give something now… No such thing as free gifts
Now enter your 2 or more Hadoop Node names
Give it some credentials…
Cool!
Here goes nothing…
Here is why it failed on the nodes…
Installing what’s missing on both nodes
• rpm -ivh cyrus-sasl-gssapi-2.1.22-5.el5_4.3.x86_64.rpm
Do it, do it again
This bogus issue was resolved by a simple retry. It looks like it fails due to
internet access issues and does not report them accurately.
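In the wizard the retry is a button click, but the same idea helps with any scripted step that depends on a flaky network. A minimal sketch of a retry helper; the function name retry and its arguments are made up for illustration.

```shell
#!/bin/sh
# Runs a command up to MAX times, sleeping DELAY seconds between failed attempts.
# Usage: retry MAX DELAY command [args...]
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example: try a download up to 3 times, waiting 30 seconds between attempts
#   retry 3 30 wget http://archive.cloudera.com/cloudera-manager/installer/latest/cloudera-manager-installer.bin
```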
Yeh!
What’s on the Menu?
Files and Folders…
(Used the defaults; both nodes had the same directory structure)
All systems are GO!
Here is our glorious Hadoop Cluster
Including all the services
How to start Hadooping – using its GUI option (HUE)
• Download the HUE user guide right here:
  https://ccp.cloudera.com/display/CDH4B2/Hue+2.0+User+Guide
Syslog Action Time
Mapping and Analyzing Syslog
Give me some GUI Hue!
  Use hostname:8088
Wait a Minute…
• Expect undocumented issues unless you do this:
• HUE requires a special user (let’s say “admin”)
• Tell HUE about it the first time you use it
• Add the user to the Unix system as well
• Add the user to the groups “hive” and “hadoop”
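The Unix side of these steps comes down to two commands. The sketch below only prints them so they can be reviewed and then run as root; the function name hue_user_commands is made up for illustration, and it assumes the hive and hadoop groups already exist on the node.

```shell
#!/bin/sh
# Prints the commands that create the Unix account and add it to the
# "hive" and "hadoop" groups (dry run, nothing is changed).
# Usage: hue_user_commands [USERNAME]   (defaults to "admin")
hue_user_commands() {
    user=${1:-admin}
    echo "useradd $user"
    # -a appends to the user's existing supplementary groups instead of replacing them
    echo "usermod -a -G hive,hadoop $user"
}

hue_user_commands admin
```

Registering the same user name in HUE itself still happens in the browser, the first time you log in.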
Starting the Data Import from File
Ready, Set, GO!
This results in a new “Query”
Let’s load it!
Use this directory
Done!
Let’s hit the road!
And we have a new table created!
Upload the data
Create a Select QUERY from our new table and Execute it
Monitor the log report as the query is executed
What a wonderful output! 

Instant hadoop of your own

  • 1. Instant Hadoop of your Own Created by Jack Bezalel Senior IT Architect As part of the CTE Mentorship Program CA Technologies
  • 2. What’s Hadoop all about? • OPPORTUNITY: We have access to amazingly valuable data (Social Media, Mobile, …) • Problem: Data is seldom UN-Structured • Relational and data warehouse MUST have Structured Data, so they are off the list • Hadoop = fast, reliable analysis of both structured data and complex data
  • 3. What’s in Hadoop? • Reliable data storage using the Hadoop Distributed File System (HDFS) • High-Performance parallel data processing using a technique called MapReduce.
  • 4. How does it scale so well? • Hadoop runs on a collection of commodity, shared-nothing servers • You can add or remove servers in a Hadoop cluster at will • The system detects and compensates for hardware or system problems on any server. (self-healing)
  • 5. Who uses Hadoop? • Originally developed and employed by Yahoo and Facebook • Hadoop is now widely used in – Finance – Technology – Telecom – media and entertainment – Government – research institutions and other markets with significant data.
  • 6. Why did we use Cloudera’s Hadoop kit? • Cloudera is an active contributor to the Hadoop project • Provides an enterprise-ready, commercial Distribution for Hadoop • Cloudera Distribution saves time by bundling and testing the most popular projects related to Hadoop into a single, easier-to-use package
  • 7. The solution we tested is provided by Cloudera Manager Free Edition • Automates the installation and configuration of CDH3 • Covers an entire cluster (up to 50 nodes) • Requires only root SSH access to your cluster's machines • Download here: https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Download
  • 8. Cloudera Manager Free Edition consists of: • A small self-executing Cloudera Manager installation program • Server and other packages in preparation for cluster host installation • Cloudera Manager wizard for automating CDH3 installation and configuration on the cluster • Cloudera Manager monitoring and configuring the cluster after installation is completed
  • 9. What does Cloudera Include - Flume • Flume — Reliable Data Mover • The primary use case – a logging system – gathers a set of log files on every machine – aggregates them to a centralized persistent store (such as HDFS)
  • 10. What does Cloudera Include - Sqoop • Sqoop — A tool that imports / exports data between relational databases and Hadoop clusters. • Uses JDBC to import data into HDFS • Generates Java classes that enable users to interpret the table's schema
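A Sqoop import over JDBC might look like the sketch below. The JDBC URL, user, table, and target directory are all placeholders, not values from this deck. The options `--connect`, `--username`, `-P` (prompt for the password), `--table`, and `--target-dir` are standard `sqoop import` options. The hypothetical helper `sqoop_import_cmd` only prints the command, so it can be reviewed before being run on a machine where Sqoop is installed.

```shell
#!/bin/sh
# Print a sqoop import command for review (hypothetical connection details).
# Usage: sqoop_import_cmd JDBC_URL DB_USER TABLE HDFS_DIR
sqoop_import_cmd() {
    printf 'sqoop import --connect %s --username %s -P --table %s --target-dir %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example with placeholder values.
sqoop_import_cmd 'jdbc:mysql://db.example.com/sales' 'etl_user' 'orders' '/user/etl/orders'
```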
  • 11. What does Cloudera Include - Hue • Hue — GUI to work with CDH • Web application
  • 12. What does Cloudera Include - Pig • Pig — Analyzes large amounts of data • Using Pig's query language called Pig Latin • Queries run distributed on a Hadoop cluster
  • 13. What does Cloudera Include - Hive • Hive — A powerful data warehousing application • Enables access to your data using HiveQL • HiveQL = a language that is similar to SQL.
  • 14. What does Cloudera Include - HBase • HBase — Large-scale tabular storage • Using HDFS • Cloudera recommends installing HBase in a standalone mode before you try to run it on a whole cluster.
  • 15. What does Cloudera Include - ZooKeeper • ZooKeeper — Service that provides coordination between distributed processes.
  • 16. What does Cloudera Include - Oozie • Oozie — A server-based workflow engine • Runs workflow jobs with actions that execute Hadoop jobs • A command line client is also available for Remote Management
  • 17. What does Cloudera Include – 3 last strangely named tools… • Whirr — Provides a fast way to run cloud services • Snappy — A compression/decompression library • Mahout — A machine-learning tool. By enabling you to build machine-learning libraries that are scalable to "reasonably large" datasets, it aims to make building intelligent applications easier and faster
  • 18. Setup Walkthrough • Use Redhat RH5.5+ (CentOS and others supported as well, we used RH5.7) • 64bit only • 3 VMs used: – Cloudera Manager – 2 Nodes to deploy Hadoop on
  • 19. About the Cloudera Manager Free Edition Installation Program • Automatically installs the package repositories for Cloudera Manager and the Oracle JDK • Installs the Cloudera Manager Server • Installs and configures an embedded PostgreSQL database
  • 20. Download the CDH3 (Cloudera) Manager • http://archive.cloudera.com/cloudera-manager/installer/latest/cloudera-manager-installer.bin
  • 21. Set yum.conf with your proxy, if one exists • Add these lines to /etc/yum.conf on your first Red Hat Hadoop node (example here): proxy=http://proxy.corp.com:80 proxy_username=username proxy_password=password
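The proxy step can be scripted so that it is safe to repeat. This is a minimal sketch: the helper name `add_yum_proxy` is hypothetical, the proxy values are the example placeholders from the slide, and it assumes the appended lines end up in the `[main]` section (true when yum.conf holds no repository sections after it). The demo runs against a scratch file, not the real /etc/yum.conf.

```shell
#!/bin/sh
# Append yum proxy settings to a config file, skipping the step if a
# proxy= line is already present.
# Usage: add_yum_proxy CONF_FILE PROXY_URL USERNAME PASSWORD
add_yum_proxy() {
    conf=$1
    if grep -q '^proxy=' "$conf" 2>/dev/null; then
        echo "proxy already configured in $conf"
        return 0
    fi
    {
        echo "proxy=$2"
        echo "proxy_username=$3"
        echo "proxy_password=$4"
    } >> "$conf"
}

# Try it on a scratch copy first; point it at /etc/yum.conf (as root) when ready.
scratch=$(mktemp)
printf '[main]\ncachedir=/var/cache/yum\n' > "$scratch"
add_yum_proxy "$scratch" 'http://proxy.corp.com:80' 'username' 'password'
cat "$scratch"
rm -f "$scratch"
```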
  • 22. Let the show begin! • Make sure SELinux is disabled, or this won’t work! – View file /etc/sysconfig/selinux – Make sure you have this line: SELINUX=disabled – You will need to reboot if you changed the SELINUX setting • Launch the Cloudera Manager installation: sudo chmod u+x ./cloudera-manager-installer.bin sudo ./cloudera-manager-installer.bin
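The SELinux check can be done from a script before launching the installer. This is a sketch under stated assumptions: the helper names `selinux_mode` and `selinux_is_disabled` are made up, and the demo reads a scratch sample file. Called without an argument, the helpers read /etc/sysconfig/selinux.

```shell
#!/bin/sh
# Report the SELINUX= mode configured in a file (default: /etc/sysconfig/selinux).
selinux_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/sysconfig/selinux}" 2>/dev/null | tail -n 1
}

# Succeeds only when the configured mode is "disabled".
selinux_is_disabled() {
    [ "$(selinux_mode "$1")" = "disabled" ]
}

# Demo on a sample file that mimics the usual layout.
sample=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$sample"
if selinux_is_disabled "$sample"; then
    echo "SELinux is disabled; safe to launch the installer"
else
    echo "SELinux mode is '$(selinux_mode "$sample")'; set SELINUX=disabled and reboot"
fi
rm -f "$sample"
```

The file only records the mode applied at the next boot. The `getenforce` command reports the mode currently in effect.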
  • 23. This one is Easy…
  • 24. And this one as well…
  • 25. What do you think about this one?
  • 26. And yet another one…
  • 27. It will soon be over :)
  • 28. And it starts rolling
  • 29. Why it is important to avoid cleaning up your presentation…
  • 32. After getting “Installation Failed” I got this as well…then it exited to the OS shell
  • 33. Installing PostgreSQL • rpm -ivh postgresql-8.1.23-1.el5_7.2.x86_64.rpm (client, not required) • rpm -ivh postgresql-server-8.1.23-1.el5_7.2.x86_64.rpm
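Before installing packages by hand, it can help to check which ones are actually missing. The sketch below uses `rpm -q`, which exits non-zero when a package is not installed. The helper name `missing_packages` is hypothetical. On a machine without `rpm`, every package is reported as missing.

```shell
#!/bin/sh
# List which of the given packages are not installed, using `rpm -q`.
# Usage: missing_packages PKG...
missing_packages() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Check the PostgreSQL packages named on the slide.
missing_packages postgresql postgresql-server
```

Only the names printed need an `rpm -ivh` of the matching .rpm file.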
  • 37. Continue Setup via the web…
  • 39. You have to give something now… No such thing as free gifts :)
  • 40. Now enter your 2 or more Hadoop Node names
  • 41. Give it some credentials…
  • 42. Cool!
  • 44. Here is why it failed on the nodes…
  • 45. Installing what’s missing on both nodes • rpm -ivh cyrus-sasl-gssapi-2.1.22-5.el5_4.3.x86_64.rpm
  • 46. Do it, do it again
  • 47. This bogus issue was resolved by a simple retry. It looks like it fails due to internet access issues and does not report them accurately.
  • 48. Yeh!
  • 50. Files and Folders… (Used the Defaults and both nodes had the same directory structure)
  • 53. Here is our glorious Hadoop Cluster
  • 54. Including all the services
  • 55. How to start Hadooping – using its GUI option (HUE) • Download the HUE user guide right here: https://ccp.cloudera.com/display/CDH4B2/Hue+2.0+User+Guide
  • 56. Syslog Action Time Mapping and Analyzing Syslog
  • 57. Give me some GUI Hue! Use hostname:8088
  • 58. Wait a Minute… • Expect undocumented issues if you do this: • HUE requires a special user (let’s say “admin”) • Tell HUE about it, the first time you use it • Add the user to the Unix system as well • Add the user to groups “hive” and “hadoop”
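The Unix side of the Hue user setup could be sketched as follows, assuming the user is called "admin" as in the slide and that the "hive" and "hadoop" groups already exist from the CDH3 installation. The hypothetical helper `hue_user_cmds` only prints the `useradd` and `usermod` commands. Nothing changes on the system until that output is run as root.

```shell
#!/bin/sh
# Print the commands that create a Unix account for a Hue user and add it
# to the "hive" and "hadoop" groups.
# Usage: hue_user_cmds USERNAME
hue_user_cmds() {
    printf 'useradd %s\n' "$1"
    printf 'usermod -a -G hive,hadoop %s\n' "$1"
}

hue_user_cmds admin
```

To apply the commands after reviewing them, pipe the output to a root shell, for example `hue_user_cmds admin | sudo sh`.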
  • 59. Starting the Data Import from File
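A syslog file is easier to import into a table once its columns are delimited. The slides do not show the file layout, so the sketch below assumes classic syslog lines of the form "Mon DD HH:MM:SS host tag: message" and converts them to tab-separated columns. The helper name `syslog_to_tsv` and the sample line are made up for illustration.

```shell
#!/bin/sh
# Convert classic syslog lines ("Mon DD HH:MM:SS host tag: message") into
# tab-separated columns: timestamp, host, tag, message.
# Lines with fewer than five fields are skipped.
syslog_to_tsv() {
    awk 'NF >= 5 {
        ts = $1 " " $2 " " $3
        host = $4
        tag = $5
        sub(/:$/, "", tag)          # drop the trailing colon from the tag
        msg = ""
        for (i = 6; i <= NF; i++)   # rejoin the remaining fields as the message
            msg = msg (i > 6 ? " " : "") $i
        print ts "\t" host "\t" tag "\t" msg
    }'
}

printf 'Mar  5 10:15:32 node1 sshd[1234]: Accepted password for root\n' | syslog_to_tsv
```

With the file in this shape, a tab can be chosen as the field delimiter when creating the table from the file.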
  • 67. This results in a new “Query”
  • 72. Done!
  • 74. And we have a new table created!
  • 76. Create a Select QUERY from our new table and Execute it
  • 77. Monitor the log report as the query is executed
  • 78. What a wonderful output! :)