STAT: A Debugging Tool
                 For Extreme Scale


                        Martin Schulz
           Center for Applied Scientific Computing
           Lawrence Livermore National Laboratory
ASC STAT Team: Greg Lee, Dong Ahn (LLNL), Dane Gardner (LANL)
        Developed at LLNL, University of Wisconsin &
                  University of New Mexico
         Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551

            This work performed under the auspices of the U.S. Department of Energy by
            Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

                                                                                         LLNL-PRES-426152
STAT: Debugging Support at Scale
  The debugging challenge at scale
    • Traditional debuggers break down at scale
    • Data and control for too many tasks
    • Sequential paradigm
  How can STAT help?
    • Identify equivalence classes
    • Pre-analysis for subset debugging
  Typical use case
    • Application hang (livelock or deadlock)
    • Answer the question: What is my code doing now?
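
The equivalence-class idea can be illustrated with a minimal sketch. It is not STAT's implementation: it only assumes that each task's stack trace is available as a list of frame names, and the function names and traces below are invented for illustration.

```python
from collections import defaultdict

def equivalence_classes(traces):
    """Group task ranks whose stack traces are identical.

    `traces` maps a task rank to its call path, outermost frame first.
    """
    classes = defaultdict(list)
    for rank, frames in sorted(traces.items()):
        classes[tuple(frames)].append(rank)
    return dict(classes)

def representatives(classes):
    """Pick the lowest rank of each class as the task to inspect in detail."""
    return sorted(ranks[0] for ranks in classes.values())

# Hypothetical traces for a hung 6-task job (frame names are made up).
traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "exchange", "MPI_Waitall"],
    2: ["main", "solve", "MPI_Barrier"],
    3: ["main", "solve", "MPI_Barrier"],
    4: ["main", "solve", "exchange", "MPI_Waitall"],
    5: ["main", "solve", "MPI_Barrier"],
}

classes = equivalence_classes(traces)
for path, ranks in classes.items():
    print(" > ".join(path), "->", ranks)
print("attach a full debugger to tasks:", representatives(classes))
```

Six tasks collapse into two classes here, so a conventional debugger only needs to attach to one task per class.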


Stacktraces: The Basis for STAT




Gathering Stack Traces
  STAT gathers stack traces from
    • Multiple processes
    • Multiple samples per process
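
As a rough sketch of how traces from several processes and several samples might be merged into one call graph prefix tree, the code below labels each node with the set of tasks seen there. The `Node` class, the frame names, and the sample data are assumptions made for illustration, not STAT's data structures.

```python
class Node:
    """One frame in the merged call graph prefix tree."""

    def __init__(self, name):
        self.name = name
        self.tasks = set()   # ranks whose traces pass through this frame
        self.children = {}   # frame name -> Node

def add_trace(root, rank, frames):
    """Merge one sampled trace (outermost frame first) into the tree."""
    node = root
    node.tasks.add(rank)
    for frame in frames:
        node = node.children.setdefault(frame, Node(frame))
        node.tasks.add(rank)

def render(node, depth=0):
    """Return the tree as indented text lines, one frame per line."""
    lines = ["  " * depth + f"{node.name} {sorted(node.tasks)}"]
    for name in sorted(node.children):
        lines.extend(render(node.children[name], depth + 1))
    return lines

# Hypothetical samples: (rank, frames). Tasks 1 and 2 are sampled twice.
samples = [
    (0, ["main", "solve", "MPI_Barrier"]),
    (1, ["main", "solve", "exchange", "MPI_Waitall"]),
    (2, ["main", "solve", "MPI_Barrier"]),
    (1, ["main", "solve", "exchange", "MPI_Waitall"]),
    (2, ["main", "solve", "exchange", "MPI_Waitall"]),
]

root = Node("/")
for rank, frames in samples:
    add_trace(root, rank, frames)
print("\n".join(render(root)))
```

Merging samples taken at different times into the same tree is what adds the time dimension: task 2 shows up under both branches because it moved between samples.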




  [Figure: stack traces from MPI tasks merged into a 2D trace/space call graph prefix tree and a 3D trace/space/time call graph prefix tree]


Interpreting Stacktrace Trees




  [Figure: stacktrace tree with branches for Task 0, Task 1, and Task 2; label: "Your Favorite Debugger"]




STAT GUI




Availability
Platform               Ver.         Usage                        Documentation                           POC
LLNL/TLCC OCF          0.9.4        STATGUI, STAT                https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
LLNL/TLCC SCF          0.9.4        STATGUI, STAT                https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
LLNL/uBGL              0.9.0 beta   STAT                         https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
LLNL/Dawn              0.9.4 beta   STATGUI, STAT                https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
SNL/Glory              0.9.2        see below                    https://computing.llnl.gov/code/STAT/   Mahesh Rajan, mrajan@sandia.gov
LANL/Yellow Turing     0.9.1b       Mod: hpc-tools, Mod: stat    man stat                                consult@lanl.gov
LANL/Turquoise Lobo    0.9.2        Mod: hpc-tools, Mod: stat    man stat                                consult@lanl.gov



Usage for SNL/Glory:
  module switch mpi mpi/mvapich-1.1_intel-11.1-f064-c064
  module load /home/jgalaro/privatemodules/openss-mvapich

Note: Red Storm has a poor man's STAT-like utility called fast_where.
Try "man fast_where" for usage instructions.

Usage Instructions
  Option 1: Graphical User Interface
    • Launch GUI: STATGUI
    • Attach, create stacktraces & views through GUI
  Option 2: Command line
    • STAT <MPI launcher pid>
       − -t: number of traces
       − -T: time between traces
    • Reports output file to stdout
    • STATview <output file>
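
For scripted use, for example from a job watchdog, the command line above could be assembled as in the sketch below. It only builds the argument list and does not run anything. The helper name `stat_command`, the placement of options before the pid, and the example values are assumptions; the slide gives no units for -T, so check `man STAT` before relying on them.

```python
import shlex

def stat_command(launcher_pid, num_traces=None, interval=None):
    """Build the argv for `STAT <MPI launcher pid>` with optional -t and -T.

    Assumption: options go before the pid and take their value as a
    separate argument. The unit of `interval` is whatever -T expects.
    """
    argv = ["STAT"]
    if num_traces is not None:
        argv += ["-t", str(num_traces)]
    if interval is not None:
        argv += ["-T", str(interval)]
    argv.append(str(launcher_pid))
    return argv

# 12345 is a placeholder for the pid of the MPI launcher (e.g. srun or mpirun).
argv = stat_command(12345, num_traces=10, interval=100)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` on a system where STAT is installed; the name of the output file that STAT reports on stdout is then the argument for STATview.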
  Additional information
    • man STAT / STAT -h
    • acroread /usr/local/tools/stat/doc/*.pdf

Advanced Topics
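  Scalable Implementation
    • Tree-based overlay networks
       − Data aggregation on the fly
       − Tree depth configurable
    • Parameters to STAT
    • Useful for 10,000+ tasks
  [Figure: overlay tree with the front end (FE) at the root, communication processes (CP) at intermediate levels, and back ends (BE) at the leaves]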
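
The on-the-fly aggregation can be pictured with the following sketch, which reduces per-back-end results level by level until one merged result reaches the front end. It is a single-process illustration under simplifying assumptions: each partial result is a plain dict from call path to task set, the `fanout` parameter stands in for the configurable tree shape, and all names are hypothetical rather than taken from STAT.

```python
def merge(partials):
    """Union the task sets of equal call paths (the reduction run at each tree node)."""
    merged = {}
    for partial in partials:
        for path, ranks in partial.items():
            merged.setdefault(path, set()).update(ranks)
    return merged

def reduce_tree(leaves, fanout):
    """Reduce leaf results level by level; return (final result, tree depth).

    Assumes at least one leaf and fanout >= 2.
    """
    level = list(leaves)
    depth = 0
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        depth += 1
    return level[0], depth

def backend_result(rank):
    """Hypothetical back-end output: every fourth task is stuck in MPI_Waitall."""
    if rank % 4 == 0:
        path = ("main", "solve", "exchange", "MPI_Waitall")
    else:
        path = ("main", "solve", "MPI_Barrier")
    return {path: {rank}}

leaves = [backend_result(rank) for rank in range(16)]
result, depth = reduce_tree(leaves, fanout=4)
for path, ranks in result.items():
    print(" > ".join(path), "->", sorted(ranks))
print("levels above the back ends:", depth)
```

Because each intermediate node forwards one merged result instead of all of its children's traces, the data arriving at the front end grows with the number of distinct call paths rather than with the number of tasks.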

  Temporal Analysis
    • Finer grain analysis of process location
    • Disambiguation of iteration instances
    • Employs static analysis to determine loop variables
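
To illustrate why iteration instances matter, the sketch below splits tasks that share a call path by the value of a loop counter and orders them from least to most progressed. It assumes the loop variable's value has already been extracted for each task (the slide attributes finding that variable to static analysis); the sample data and function name are made up.

```python
from collections import defaultdict

def order_by_progress(samples):
    """Split each call-path class by loop iteration, least-progressed first.

    `samples` is an iterable of (rank, call path, iteration) tuples.
    """
    by_path = defaultdict(lambda: defaultdict(list))
    for rank, path, iteration in samples:
        by_path[tuple(path)][iteration].append(rank)
    return {
        path: sorted((iteration, sorted(ranks)) for iteration, ranks in groups.items())
        for path, groups in by_path.items()
    }

# Hypothetical data: all four tasks show the same trace, but task 0 is one
# iteration behind the others.
path = ["main", "time_loop", "MPI_Barrier"]
samples = [(0, path, 41), (1, path, 42), (2, path, 42), (3, path, 42)]

for call_path, groups in order_by_progress(samples).items():
    print(" > ".join(call_path))
    for iteration, ranks in groups:
        print(f"  iteration {iteration}: tasks {ranks}")
```

A plain stack trace would put all four tasks in one equivalence class; the iteration count separates the lagging task from the rest.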

Reference & Demo Session
  Usage documentation
    • https://computing.llnl.gov/code/STAT/
  Man page
    • man STAT or man STATGUI
    • STAT -h
  Background information
    • http://www.paradyn.org/STAT/STAT.html

  Demo Session / Track 3



