Profiling Hadoop Applications
Basant Verma
Agenda
• Profiling General Background
• Available Options
• Profile using Free and Open Source tools
• Profile using YourKit
• Other troubleshooting tools
What does Profiling Provide?
• Profiling runtime / CPU usage:
– which lines of code the program spends the most time in
– which call/invocation paths were used to reach those lines
• naturally represented as tree structures
• Profiling memory usage:
– what kinds of objects are sitting on the heap
– where were they allocated
– who is pointing to them now
– memory leaks
Profiler Types and Components
• Components needed for profiling
– Profiling Agent
• Collects profiled data (samples, traces, exceptions etc.)
– Analysis Tool
• Provides an interface for analyzing profiled data and helps the user identify potential problems
• Types of Profilers
– insertion
– sampling
– instrumenting
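To make the "sampling" type concrete, here is a minimal sketch of the idea in plain Java: a sampler periodically reads another thread's stack and counts which method is on top. It is only an illustration under simple assumptions; the class and method names (`SamplingSketch`, `busyWork`, `sample`) are hypothetical, and real agents such as hprof sample through JVMTI rather than `Thread.getStackTrace()`.

```java
import java.util.HashMap;
import java.util.Map;

public class SamplingSketch {
    // Flag polled by the worker so the sampler can stop it.
    static volatile boolean stop = false;
    // Sink for the result so the busy loop cannot be optimized away.
    static volatile double sink;

    /** CPU-bound work for the sampler to observe. */
    static void busyWork() {
        double acc = 0;
        long i = 0;
        while (!stop) {
            acc += Math.sqrt(++i);
        }
        sink = acc;
    }

    /** Takes a fixed number of stack samples and counts the top frame of each one. */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMs)
            throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        for (int n = 0; n < samples; n++) {
            Thread.sleep(intervalMs);
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length == 0) {
                continue; // thread has not started yet or has already finished
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            hits.merge(top, 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(SamplingSketch::busyWork, "worker");
        worker.start();
        Map<String, Integer> hits = sample(worker, 50, 10);
        stop = true;
        worker.join();
        // Methods with the most hits are where the thread spent most of its time.
        hits.forEach((frame, count) -> System.out.println(count + "\t" + frame));
    }
}
```

Sampling keeps overhead low because the profiled code is left untouched; the price is that short-lived methods can be missed between samples.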
Available Options
• Sun JDK Tools
– hprof: Profiler (uses jvmti)
– jmap: Dumps the heap (memory map)
– jhat: Analyzes heap dumps
– jstack: Provides thread dumps
– jvisualvm: GUI-based profile data analyzer
• Open Source
– VisualVM (same as jvisualvm but distributed as a standalone application)
• Uses HPROF internally for profiling. Provides GUI for analysis of heap dump and profiler outputs
– NetBeans Profiler
• Similar to VisualVM but integrated into IDE
– Eclipse MAT (Memory Analysis Tool)
• Can load .hprof files
• Commercial
– YourKit
– JProfiler
USING HPROF
Official hprof Documentation
usage: java -Xrunhprof:[help]|[<option>=<value>, ...]

Option Name and Value   Description                      Default
---------------------   -----------                      -------
heap=dump|sites|all     heap profiling                   all
cpu=samples|times|old   CPU usage                        off
monitor=y|n             monitor contention               n
format=a|b              text(txt) or binary output       a
file=<file>             write data to file               java.hprof[.txt]
depth=<size>            stack trace depth                4
interval=<ms>           sample interval in ms            10
cutoff=<value>          output cutoff point              0.0001
lineno=y|n              line number in traces?           y
thread=y|n              thread in traces?                n
doe=y|n                 dump on exit?                    y
msa=y|n                 Solaris micro state accounting   n
force=y|n               force output to <file>           y
verbose=y|n             print messages about dumps       y
http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
Sample hprof usage
• To measure CPU usage, try the following:
java -Xrunhprof:cpu=samples,depth=6,heap=dump
• Settings:
– Takes samples of CPU execution
– Records call traces that include the last 6 levels on the stack
– Dumps the heap (bigger file size but helps in finding problems)
• Creates the file java.hprof.txt in the current directory
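The effect of the depth option can be illustrated without hprof itself. The sketch below (hypothetical class `DepthSketch`, not part of hprof) captures a stack that is 11 calls deep and keeps only the innermost 6 frames, which is roughly what a trace recorded with depth=6 retains: callers further out are cut off, so different call paths may collapse into the same trace.

```java
import java.util.Arrays;

public class DepthSketch {
    /** Recurses `levels` times, then captures the current stack truncated to `depth` frames. */
    static StackTraceElement[] capture(int levels, int depth) {
        if (levels > 0) {
            return capture(levels - 1, depth);
        }
        StackTraceElement[] full = new Throwable().getStackTrace();
        // Keep only the innermost `depth` frames, like a depth-limited profiler trace.
        return Arrays.copyOf(full, Math.min(depth, full.length));
    }

    public static void main(String[] args) {
        for (StackTraceElement frame : capture(10, 6)) {
            System.out.println(frame);
        }
    }
}
```

A larger depth gives more context per trace at the cost of a bigger output file.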
HPROF with Hadoop
• Hadoop uses hprof as the default profiler
• Profiling-related parameters

Purpose                              JobConf API               Command-line parameter
-----------------------------------  ------------------------  ---------------------------
Enable profiling                     setProfileEnabled(true)   mapred.task.profile=true
Additional parameters for profiler   setProfileParams(…)       mapred.task.profile.params
Range of sampled tasks to profile    setProfileTaskRange       mapred.task.profile.maps
                                                               mapred.task.profile.reduces
Example
• Using Java API
jobConf.setProfileEnabled(true);
jobConf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites"
    + ",depth=4,thread=y,file=%s");
jobConf.setProfileTaskRange(true, "0-2");   // map tasks 0 to 2
jobConf.setProfileTaskRange(false, "0-1");  // reduce tasks 0 to 1
• Using Command line parameters
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,file=%s \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-1 \
  input output
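Two details of these settings are easy to miss: the %s in the profiler parameters is a placeholder for the per-task profile output file, and a range such as 0-2 selects tasks by their index. The sketch below mimics both in plain Java so the behaviour can be tried without a cluster. It is a simplified stand-in, not Hadoop's code: `ProfileParamsSketch`, `expand` and `inRange` are hypothetical names, the output file name is made up, and only closed ranges and single values are handled.

```java
public class ProfileParamsSketch {
    /** Fills the %s placeholder with a task's profile output path. */
    static String expand(String params, String outputFile) {
        return String.format(params, outputFile);
    }

    /** Returns true if taskId falls in a comma-separated list such as "0-2" or "0-1,5". */
    static boolean inRange(String ranges, int taskId) {
        for (String part : ranges.split(",")) {
            String[] bounds = part.trim().split("-");
            int lo = Integer.parseInt(bounds[0]);
            int hi = bounds.length > 1 ? Integer.parseInt(bounds[1]) : lo;
            if (taskId >= lo && taskId <= hi) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String params = "-agentlib:hprof=cpu=samples,heap=sites,depth=4,thread=y,file=%s";
        // "profile.out" is an illustrative file name, not the path Hadoop actually uses.
        System.out.println(expand(params, "profile.out"));
        for (int id = 0; id < 4; id++) {
            System.out.println("map task " + id + " profiled: " + inRange("0-2", id));
        }
    }
}
```

Keeping the ranges small matters in practice, since every profiled task runs slower and produces its own output file.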
Collecting Profiler Output
• Hadoop JobClient automatically downloads profile logs from all the profiled tasks
– If the output format is not specified, hprof writes the profile output in text format (format=a)
• Profiler output is also available via the History web UI
• You can also download profile output using curl
– curl -o attempt_201305161037_0004_m_000000_0.hprof "http://17.115.13.191:50060/tasklog?plaintext=true&attemptid=attempt_201305161037_0004_m_000000_0&filter=profile"
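When several tasks are profiled, the same URL pattern has to be repeated for each task attempt. The following sketch builds those URLs from the pieces visible in the curl example above; the helper names (`TaskLogUrlSketch`, `attemptId`, `profileUrl`) are hypothetical, and the attempt ID layout is inferred from the example rather than taken from Hadoop's API.

```java
public class TaskLogUrlSketch {
    /** Builds an attempt ID such as attempt_201305161037_0004_m_000000_0. */
    static String attemptId(String jobId, boolean isMap, int task, int attempt) {
        return String.format("attempt_%s_%s_%06d_%d", jobId, isMap ? "m" : "r", task, attempt);
    }

    /** Builds the tasklog URL that serves the profile output of one attempt. */
    static String profileUrl(String trackerHostPort, String attemptId) {
        return "http://" + trackerHostPort + "/tasklog?plaintext=true&attemptid="
                + attemptId + "&filter=profile";
    }

    public static void main(String[] args) {
        // Print one URL per profiled map task (range 0-2), first attempt only.
        for (int task = 0; task <= 2; task++) {
            String id = attemptId("201305161037_0004", true, task, 0);
            System.out.println(profileUrl("17.115.13.191:50060", id));
        }
    }
}
```

Each printed URL can be passed to curl -o as shown above. Note that every task may run on a different TaskTracker, so the host part usually differs per attempt.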
Task User Log
Analyze Profiler output
• You can use VisualVM, NetBeans Profiler or YourKit to analyze the profiling data.
– These tools support only the binary format of hprof output (i.e. option format=b)
• Example
– Run the profiler with a Hadoop job
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,format=b,file=%s \
  input output
– Load the profiler output using the VisualVM menu option
Analyze Profile Output in VisualVM
Object Query Language
• VisualVM and jhat support a special query language (OQL) for querying the Java heap.
– Example: select all Strings with length 1K or more
select s from java.lang.String s where s.count >= 1024
• More information about OQL is available at
http://visualvm.java.net/oqlhelp.html
Analyze Profile Output in Eclipse MAT
Profiling Pig Jobs
• Use Hadoop command line parameters
• More information about Pig job profiling is available at the Pig Wiki
– https://cwiki.apache.org/PIG/howtoprofile.html
pig -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,thread=y,verbose=n \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-0 \
  mypigscript.pig
Profiling Hive Queries
• Set appropriate Hadoop parameters before
submitting the queries
hive> set mapred.task.profile=true;
hive> set mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,file=%s;
hive> set mapred.task.profile.maps=0-2;
hive> set mapred.task.profile.reduces=0-0;
hive>
hive> <hive query>
USING YOURKIT
YourKit Profiler - Summary
• Commercial Java Profiling Tool
– Free trial and Open Source licenses are available
• Used by many Open Source projects including
Hadoop, Pig, Hive etc.
• Features
– On-Demand Profiling
– CPU, Memory and Concurrency profiling methods
– IDE integration (Eclipse, NetBeans, IntelliJ)
– Above all, has relatively low performance overhead
Using YourKit Profiler
• You need to install the YourKit profiler (just the profiler lib) on each TaskTracker
• Tell Hadoop to use a different profiler
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentpath:<yourkit_path>/libyjpagent.jnilib=dir=/tmp/yourkit_snapshot,sampling,disablej2ee \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-1 \
  input output
• Theoretically, you can also use DistributedCache to make the binaries available on TaskTracker machines
– Though, I did not have success with this
Small Glitch
• Hadoop JobClient.waitForCompletion(…) throws an error because the profile logs are not available in the default directory.
• However, the job will continue to run successfully.
• To avoid this, you can instead use the mapred.child.java.opts option to specify the profiling parameters
YourKit to Analyze Jobs
• Can analyze profile output from both YourKit
Profiler and hprof/jmap.
OTHER TOOLS
Using other Tools
• JDK Tool ‘jmap’
– Captures a heap dump of a running Java process for later analysis in VisualVM or YourKit
• $ jmap -dump:live,format=b,file=xyz.hprof <jvm-pid>
• Don’t run jmap with the -histo:live option on the JobTracker (JT) or NameNode (NN): the live option forces a full GC, which pauses the process
– A Java process can also be instructed to write an hprof heap dump in case of OutOfMemoryError
• -XX:+HeapDumpOnOutOfMemoryError
• JDK Tool ‘jhat’
– Reads heap dumps in hprof format and provides a lightweight web interface for analyzing them
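A heap dump can also be triggered from inside the JVM, which is handy when attaching jmap to a short-lived task is impractical. The sketch below uses the HotSpot-specific `HotSpotDiagnosticMXBean`, so it assumes a HotSpot-based JVM; the class name `HeapDumpSketch` and the temporary file location are illustrative choices. The resulting file is in the same hprof binary format that jmap -dump produces.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSketch {
    /** Writes an hprof-format heap dump of the current JVM; live=true dumps only reachable objects. */
    static void dumpHeap(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet; recent JDKs also expect a .hprof suffix.
        bean.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dir.resolve("xyz.hprof");
        dumpHeap(file, true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

As with jmap, dumping only live objects triggers a garbage collection first, so the same caution about busy daemons applies.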
Other Tools (Cont…)
• Hadoop Vaidya (Simple Diagnostic Tool)
– Identifies common performance problems in Hadoop jobs (unbalanced partitioning, granularity of tasks, combiners etc.)
– Works only at the Hadoop job level (does not understand the specifics of Hive/Pig)
Other Recommendation
• If possible, try running Hadoop (MR/Pig/Hive) in local mode using LocalJobRunner
– LocalJobRunner runs the entire MapReduce job in
a single JVM
– It simplifies profiling and log collection
– Can also be used for attaching debugger from IDE
Resources
• Troubleshooting Java application
– http://www.oracle.com/technetwork/java/javase/toc-135973.html
• Profile Hadoop Job (Chapter 5 of “Hadoop: The Definitive Guide”)
– http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/tuning-a-job/id3545664
• Profiling Pig Job
– https://cwiki.apache.org/PIG/howtoprofile.html
• ‘hprof’ Official Documentation
– http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
• YourKit Profiler
– http://www.yourkit.com

Profile hadoop apps

  • 2. Agenda • Profiling General Background • Available Options • Profile using Free and Open Source tools • Profile using YourKit • Other troubleshooting tools
  • 3. What does Profiling Provide? • Profiling runtime / CPU usage: – what lines of code the program is spending the most time in – what call/invocation paths were used to get to these lines • naturally represented as tree structures • Profiling memory usage: – what kinds of objects are sitting on the heap – where were they allocated – who is pointing to them now – memory leaks
  • 4. Profiler Types and Components • Components needed for profiling – Profiling Agent • Collects profiled data (samples, traces, exceptions etc.) – Analysis Tool • Provides interface for analyzing profiled data and help user identify potential problems • Types of Profilers – insertion – sampling – instrumenting
  • 5. Available Options • Sun JDK Tools – hprof: Profiler (uses jvmti) – jmap: Provides memory map (dump) heap – jhat: Analyze memory dump – jstack: Provide thread dump – Jvisualvm: GUI based profile data analyzer • Open Source – Visual VM (same as jvisualvm but downloaded as independent app) • Uses HPROF internally for profiling. Provides GUI for analysis of heap dump and profiler outputs – NetBeans Profiler • Similar to VisualVM but integrated into IDE – Eclipse MAT (Memory Analysis Tool) • Can load .hprof files • Commercial – YourKit – JProfile
  • 7. Official hprof Documentation
      usage: java -Xrunhprof:[help]|[<option>=<value>, ...]

      Option Name and Value   Description                      Default
      ---------------------   -----------                      -------
      heap=dump|sites|all     heap profiling                   all
      cpu=samples|times|old   CPU usage                        off
      monitor=y|n             monitor contention               n
      format=a|b              text(txt) or binary output       a
      file=<file>             write data to file               java.hprof[.txt]
      depth=<size>            stack trace depth                4
      interval=<ms>           sample interval in ms            10
      cutoff=<value>          output cutoff point              0.0001
      lineno=y|n              line number in traces?           y
      thread=y|n              thread in traces?                n
      doe=y|n                 dump on exit?                    y
      msa=y|n                 Solaris micro state accounting   n
      force=y|n               force output to <file>           y
      verbose=y|n             print messages about dumps       y

      http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
  • 8. Sample hprof usage
      • To measure CPU usage, try the following:
          java -Xrunhprof:cpu=samples,depth=6,heap=dump
      • Settings:
          – Takes samples of CPU execution
          – Records call traces that include the last 6 levels on the stack
          – Dumps the heap (bigger file size, but helps in finding problems)
      • Creates the file java.hprof.txt in the current directory
  • 9. HPROF with Hadoop
      • Hadoop uses hprof as the default profiler
      • Profiling related parameters

      Purpose                              JobConf API               Command line Parameter
      ----------------------------------   -----------------------   ---------------------------
      Enable Profiling                     setProfileEnabled(true)   mapred.task.profile=true
      Additional parameters for Profiler   setProfileParams(…)       mapred.task.profile.params
      Range of sampled tasks to profile    setProfileTaskRange       mapred.task.profile.maps
                                                                     mapred.task.profile.reduces
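
To make the "range of sampled tasks" parameter concrete, here is a minimal sketch of how a value such as "0-2" selects task indexes. It is a simplified illustration, not Hadoop's own parser: the class name TaskRange and the method includes are hypothetical, and only comma-separated single numbers and closed ranges are handled.

```java
final class TaskRange {
    private TaskRange() {}

    // Returns true if taskId falls inside one of the comma-separated
    // entries of ranges, where each entry is "N" or "LOW-HIGH" (inclusive).
    // Open-ended forms such as "7-" are not handled by this sketch.
    static boolean includes(String ranges, int taskId) {
        for (String part : ranges.split(",")) {
            String entry = part.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int dash = entry.indexOf('-');
            int low;
            int high;
            if (dash < 0) {
                low = Integer.parseInt(entry);
                high = low;
            } else {
                low = Integer.parseInt(entry.substring(0, dash));
                high = Integer.parseInt(entry.substring(dash + 1));
            }
            if (taskId >= low && taskId <= high) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // With mapred.task.profile.maps=0-2, map tasks 0, 1 and 2 are profiled.
        for (int task = 0; task < 5; task++) {
            System.out.println("map task " + task + " profiled: " + includes("0-2", task));
        }
    }
}
```

With "0-2" for maps and "0-1" for reduces, as in the slides, at most three map tasks and two reduce tasks produce profiler output, which keeps the overhead and the amount of downloaded data small.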
  • 10. Example
      • Using Java API
          jobConf.setProfileEnabled(true);
          jobConf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites"
              + ",depth=4,thread=y,file=%s");
          jobConf.setProfileTaskRange(true, "0-2");   // map tasks 0-2
          jobConf.setProfileTaskRange(false, "0-1");  // reduce tasks 0-1
      • Using Command line parameters
          hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
              -Dmapred.task.profile=true \
              -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,file=%s \
              -Dmapred.task.profile.maps=0-2 \
              -Dmapred.task.profile.reduces=0-1 \
              input output
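
The profiler parameter string must be a single token with no embedded spaces, which is easy to get wrong when it is assembled by hand. The following is a minimal sketch of a helper that builds the value from an ordered map of hprof options. The class HprofParams and its build method are hypothetical names introduced for illustration; they are not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class HprofParams {
    private HprofParams() {}

    // Joins option=value pairs into "-agentlib:hprof=opt1=v1,opt2=v2,...".
    // Rejects whitespace and commas inside a key or value, since either
    // would split or corrupt the agent argument.
    static String build(Map<String, String> options) {
        StringJoiner joined = new StringJoiner(",", "-agentlib:hprof=", "");
        for (Map.Entry<String, String> option : options.entrySet()) {
            requireClean(option.getKey());
            requireClean(option.getValue());
            joined.add(option.getKey() + "=" + option.getValue());
        }
        return joined.toString();
    }

    private static void requireClean(String token) {
        boolean bad = token.isEmpty()
            || token.indexOf(',') >= 0
            || token.chars().anyMatch(Character::isWhitespace);
        if (bad) {
            throw new IllegalArgumentException("invalid hprof token: '" + token + "'");
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("cpu", "samples");
        options.put("heap", "sites");
        options.put("depth", "4");
        options.put("thread", "y");
        options.put("file", "%s"); // placeholder taken from the slides
        System.out.println(build(options));
    }
}
```

The resulting string can be passed to setProfileParams(…) or used as the value of mapred.task.profile.params.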
  • 11. Collecting Profiler Output
      • Hadoop JobClient automatically downloads profile logs from all the profiled tasks
          – If the output format type is not specified, hprof creates profile output in text format (format=a)
      • Profiler outputs are also available via the History WebUI
      • You can also download profile output using curl
          – curl -o attempt_201305161037_0004_m_000000_0.hprof "http://17.115.13.191:50060/tasklog?plaintext=true&attemptid=attempt_201305161037_0004_m_000000_0&filter=profile"
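
When several attempts need to be fetched, the tasklog URL can be generated rather than typed. The sketch below only reproduces the URL pattern shown in the curl example; the class ProfileLogUrl, the method forAttempt, and the attempt-id pattern it checks are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class ProfileLogUrl {
    // Assumed shape of a task attempt id, modelled on the example id
    // attempt_201305161037_0004_m_000000_0 from the slides.
    private static final Pattern ATTEMPT_ID =
        Pattern.compile("attempt_\\d+_\\d+_[mr]_\\d+_\\d+");

    private ProfileLogUrl() {}

    // Builds the TaskTracker tasklog URL that serves the profile output
    // of one task attempt. hostAndPort is e.g. "17.115.13.191:50060".
    static String forAttempt(String hostAndPort, String attemptId) {
        if (!ATTEMPT_ID.matcher(attemptId).matches()) {
            throw new IllegalArgumentException("not a task attempt id: " + attemptId);
        }
        return "http://" + hostAndPort
            + "/tasklog?plaintext=true&attemptid=" + attemptId
            + "&filter=profile";
    }

    public static void main(String[] args) {
        System.out.println(
            forAttempt("17.115.13.191:50060", "attempt_201305161037_0004_m_000000_0"));
    }
}
```

The printed URL can be handed to curl -o as in the slide, one call per profiled attempt.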
  • 13. Analyze Profiler output
      • You can use VisualVM, NetBeans profiler or YourKit for analyzing the profiling data.
          – The above tools support only the binary format of hprof output (i.e. option format=b)
      • Example
          – Run profiler with Hadoop job
              hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
                  -Dmapred.task.profile=true \
                  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,format=b,file=%s \
                  input output
          – Load Profiler output using VisualVM menu option
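
Because the GUI tools expect binary output, it can help to check which format a downloaded profile is before trying to load it. The sketch below relies on an assumption about the hprof file layout: both formats start with the ASCII text "JAVA PROFILE " followed by a version, which is terminated by a NUL byte in binary files and followed by ordinary text (a comma) in text files. The class HprofFormat and its members are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class HprofFormat {
    enum Kind { BINARY, TEXT, UNKNOWN }

    private static final byte[] MAGIC =
        "JAVA PROFILE ".getBytes(StandardCharsets.US_ASCII);

    private HprofFormat() {}

    // Classifies the first bytes of a file (32 bytes are plenty).
    // Assumption: after the version string, a binary file has a NUL byte,
    // while a text file continues with a comma or a line break.
    static Kind detect(byte[] head) {
        if (head.length < MAGIC.length) {
            return Kind.UNKNOWN;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return Kind.UNKNOWN;
            }
        }
        for (int i = MAGIC.length; i < head.length; i++) {
            if (head[i] == 0) {
                return Kind.BINARY;
            }
            if (head[i] == ',' || head[i] == '\n' || head[i] == '\r') {
                return Kind.TEXT;
            }
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] sample = "JAVA PROFILE 1.0.1, created".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
    }
}
```

To use it on a real file, read the first few dozen bytes and pass them to detect; a TEXT result suggests rerunning the job with format=b before opening the output in VisualVM.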
  • 14. Analyze Profile Output in VisualVM
  • 15. Object Query Language
      • VisualVM and jhat support a special query language (OQL) to query the Java heap.
          – Example: Select all Strings with length 1K or more
              select s from java.lang.String s where s.count >= 1024
      • More information about OQL is available at http://visualvm.java.net/oqlhelp.html
  • 16. Analyze Profile Output in Eclipse MAT
  • 17. Profiling Pig Jobs
      • Use Hadoop command line parameters
          pig -Dmapred.task.profile=true \
              -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,thread=y,verbose=n \
              -Dmapred.task.profile.maps=0-2 \
              -Dmapred.task.profile.reduces=0-0 \
              mypigscript.pig
      • More information about Pig job profiling is available at the Pig Wiki
          – https://cwiki.apache.org/PIG/howtoprofile.html
  • 18. Profiling Hive Queries
      • Set appropriate Hadoop parameters before submitting the queries
          hive> set mapred.task.profile=true;
          hive> set mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,file=%s;
          hive> set mapred.task.profile.maps=0-2;
          hive> set mapred.task.profile.reduces=0-0;
          hive> <hive query>
  • 20. YourKit Profiler - Summary
      • Commercial Java profiling tool
          – Free trial and Open Source licenses are available
      • Used by many Open Source projects including Hadoop, Pig, Hive etc.
      • Features
          – On-demand profiling
          – CPU, memory and concurrency profiling methods
          – IDE integration (Eclipse, NetBeans, IntelliJ)
          – Above all, has relatively low performance overhead
  • 21. Using YourKit Profiler
      • You will need to install the YourKit profiler (just the profiler lib) on each TaskTracker
      • Tell Hadoop to use a different profiler
          hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
              -Dmapred.task.profile=true \
              -Dmapred.task.profile.params=-agentpath:<yourkit_path>/libyjpagent.jnilib=dir=/tmp/yourkit_snapnshot,sampling,disablej2ee \
              -Dmapred.task.profile.maps=0-2 \
              -Dmapred.task.profile.reduces=0-1 \
              input output
      • Theoretically, you can also use DistributedCache to make the binaries available on TaskTracker machines
          – Though, I did not have success with this
  • 22. Small Glitch
      • Hadoop JobClient.waitForCompletion(…) will throw an error because the profile logs are not available in the default directory.
      • However, the job will continue to run successfully.
      • To avoid this, you can instead use the mapred.child.java.opts option to specify the profiling parameters
  • 23. YourKit to Analyze Jobs • Can analyze profile output from both YourKit Profiler and hprof/jmap.
  • 25. Using other Tools
      • JDK Tool ‘jmap’
          – Can be used for capturing a heap dump of a running Java process, which can later be analyzed inside VisualVM or YourKit
              • $ jmap -dump:live,format=b,file=xyz.hprof <jvm-pid>
              • Don’t run jmap with the -histo:live option on the JobTracker (JT) or NameNode (NN); the live option forces a full GC, which pauses the process
          – A Java process can also be instructed to generate an hprof heap dump in case of OutOfMemoryError
              • -XX:+HeapDumpOnOutOfMemoryError
      • JDK Tool ‘jhat’
          – Can read a heap dump in hprof format and provides a lightweight web interface to analyze profiler output
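
A heap dump in the same hprof format can also be triggered from inside a JVM. The sketch below uses the HotSpot-specific HotSpotDiagnosticMXBean from com.sun.management, so it assumes a HotSpot-based JDK. Unlike jmap, it dumps the JVM it runs in rather than attaching to another process id. The class HeapDumper is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

final class HeapDumper {
    private HeapDumper() {}

    // Writes a binary heap dump of the current JVM to target.
    // The second argument of dumpHeap (true) limits the dump to live
    // objects, comparable to the "live" option of jmap -dump.
    // The target file must not exist yet and should end in ".hprof".
    static File dumpLiveHeap(File target) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.getAbsolutePath(), true);
        } catch (IOException e) {
            throw new UncheckedIOException("heap dump failed: " + target, e);
        }
        return target;
    }

    public static void main(String[] args) {
        File target = new File(System.getProperty("java.io.tmpdir"),
            "xyz-" + System.nanoTime() + ".hprof");
        File dump = dumpLiveHeap(target);
        System.out.println("Wrote " + dump + " (" + dump.length() + " bytes)");
    }
}
```

The resulting file can be opened in VisualVM, Eclipse MAT or YourKit like a dump produced by jmap. As with jmap, dumping live objects can pause the process, so the same caution about JobTracker and NameNode applies.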
  • 26. Other Tools (Cont…)
      • Hadoop Vaidya (Simple Diagnostic Tool)
          – Identifies common performance problems related to Hadoop jobs (unbalanced partitioning, granularity of tasks, combiners etc.)
          – Works only at the Hadoop job level (does not understand the specifics of Hive/Pig)
  • 27. Other Recommendation
      • If possible, try running Hadoop (MR/Pig/Hive) in local mode using LocalJobRunner
          – LocalJobRunner runs the entire MapReduce job in a single JVM
          – It simplifies profiling and log collection
          – Can also be used for attaching a debugger from an IDE
  • 28. Resources
      • Troubleshooting Java application
          – http://www.oracle.com/technetwork/java/javase/toc-135973.html
      • Profile Hadoop Job (Chapter 5 - “Hadoop – The Definitive Guide”)
          – http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/tuning-a-job/id3545664
      • Profiling Pig Job
          – https://cwiki.apache.org/PIG/howtoprofile.html
      • ‘hprof’ Official Documentation
          – http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
      • YourKit Profiler
          – http://www.yourkit.com