Optimizer
September 3, 2014
By Airray
• Overview of Starfish
• Use Starfish to…
• Starfish Installation
• Example
• DEMO
Overview of Starfish
[Architecture diagram: profiling data (Hadoop logs, MR run times, Ganglia metrics) is collected online into a Profile Store on the filesystem and accessed through the Starfish interface.]
Use Starfish to…
• Visualize: understand what’s happening
• Optimize: speed up performance
• Strategize: size requirements intelligently
Visualize
• See how MapReduce apps are performing
• Understand bottlenecks in Hadoop
• Find misconfigured Hadoop parameters
• Learn to develop better MapReduce apps
Optimize
• Tune Hadoop easily with automatic health checks and
recommendations
• Find optimal parameter settings for MapReduce
applications in Java, streaming, Hive, Pig, and other
languages
Strategize
• Make intelligent resource allocation choices for
Hadoop
• Find optimal EC2 instances for workloads
• Meet time and cost budgets with ease
Starfish Installation Instructions
• If you have downloaded the Starfish binaries, simply extract the package in a directory of your choice:
tar -xzf starfish-0.3.0.tar.gz
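A quick sanity check after extraction is to list the scripts and the example jar that the rest of this deck relies on (these paths all appear in later slides):
cd starfish-0.3.0
ls bin/config.sh bin/install_btrace.sh bin/profile bin/whatif bin/optimize
ls contrib/examples/hadoop-starfish-examples.jar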
BTrace Installation Instructions (1)
To profile the execution of a MapReduce job in a Hadoop cluster, you must first install the pre-compiled BTrace scripts and jars (included in Starfish).
1. Set the following global profiling parameters in bin/config.sh:
• SLAVES_BTRACE_DIR: the BTrace installation directory on the slave nodes. Specify the full path and ensure you have the appropriate write permissions. The path will be created if it doesn't exist.
• CLUSTER_NAME: a descriptive name for the cluster (like test, production, etc.). Do not include spaces or special characters in the name.
• PROFILER_OUTPUT_DIR: the local directory for the collected logs and profile files. Specify the full path and ensure you have the appropriate write permissions. The path will be created if it doesn't exist.
BTrace Installation Instructions (2)
vi /starfish-0.3.0/bin/config.sh
# The BTrace install directory on the slave machines
# Specify a FULL path! This setting is required!
# Example: SLAVES_BTRACE_DIR=/root/btrace
SLAVES_BTRACE_DIR=/opt/btrace
# A descriptive name for the cluster, like test, production, etc.
# No spaces or special characters in the name. This setting is required!
CLUSTER_NAME=etu
# The local directory to place the output files
# If left blank, it defaults to the working directory (not recommended)
PROFILER_OUTPUT_DIR=/opt/btrace
BTrace Installation Instructions (3)
Install BTrace using the provided bin/install_btrace.sh from the master node in the cluster. The sole input to the script is the path to a file containing the hostnames or IP addresses of the slave nodes in the cluster.
./bin/install_btrace.sh slave.txt
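The deck does not show the slave file's format; assuming it lists one slave node per line, a hypothetical slave.txt might look like:
slave1.etu.local
slave2.etu.local
192.168.1.13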
Profile
The expected parameters are identical to those required by ${HADOOP_HOME}/bin/hadoop. Here is an example of profiling a WordCount MapReduce program:
./bin/profile hadoop jar \
contrib/examples/hadoop-starfish-examples.jar \
wordcount -Dmapred.reduce.tasks=10 \
/input/path /output/path
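Once the job finishes, the collected logs and profile files should land in the PROFILER_OUTPUT_DIR configured earlier; the exact file layout is not shown in the deck, so treat this check as a sketch:
ls -R /opt/btrace
# expect per-task logs plus the job profile for the run just executed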
Job Analysis (1)–(3): Analyzing a MapReduce Job with the Visualizer
[Three slides of Visualizer screenshots; no further text content.]
What-if Analysis
./bin/whatif details job_2010030839_0000 hadoop jar \
contrib/examples/hadoop-starfish-examples.jar \
wordcount -Dmapred.reduce.tasks=20 \
/input/path /output/path
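Because what-if analysis predicts job behavior under alternative settings without rerunning the job, it is cheap to sweep several candidates; a minimal sketch reusing the syntax above:
for r in 5 10 20 40; do
  ./bin/whatif details job_2010030839_0000 hadoop jar \
    contrib/examples/hadoop-starfish-examples.jar \
    wordcount -Dmapred.reduce.tasks=$r \
    /input/path /output/path
done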
Job Optimization
Job Optimization on a Live Hadoop Cluster:
./bin/optimize mode job_id hadoop jar jarFile args...
Print to the console the configuration parameter settings suggested by the Cost-based Optimizer for a WordCount MapReduce job:
./bin/optimize recommend job_2010030839_0000 hadoop jar \
contrib/examples/hadoop-starfish-examples.jar wordcount \
/input/path /output/path
Execute a WordCount MapReduce job using the configuration parameter settings automatically suggested by the Cost-based Optimizer:
./bin/optimize run job_2010030839_0000 hadoop jar \
contrib/examples/hadoop-starfish-examples.jar wordcount \
/input/path /output/path
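Putting the steps together, a hedged end-to-end sketch: the job ID is a placeholder carried over from the slides above (in practice, use the ID reported for the profiled run), and the final run writes to a fresh output path since Hadoop will not overwrite an existing one:
# 1. Profile one run of the job to collect a job profile
./bin/profile hadoop jar contrib/examples/hadoop-starfish-examples.jar \
  wordcount /input/path /output/path
# 2. Inspect the settings the Cost-based Optimizer recommends
./bin/optimize recommend job_2010030839_0000 hadoop jar \
  contrib/examples/hadoop-starfish-examples.jar wordcount /input/path /output/path
# 3. Re-run the job with the recommended settings applied automatically
./bin/optimize run job_2010030839_0000 hadoop jar \
  contrib/examples/hadoop-starfish-examples.jar wordcount /input/path /output/path-optimized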
Starfish Example
[Three slides of screenshots from the example run; no further text content.]
DEMO
Contact
318, Rueiguang Rd., Taipei 114, Taiwan
T: +886 2 7720 1888
F: +886 2 8798 6069
www.etusolution.com