Instrumenting a benchmark application
Tools and Measurements Techniques
Project by Mário Almeida (EMDC)

Barcelona, 25 April 2012
Index (1/2)
Tools and configuration
● Parsec
  ○ Overview
  ○ Benchmark programs
● Extrae
● Paraver
● Configuration




Index (2/2)
Measurements
● Raytrace
  ○ Overview
  ○ Code
  ○ Inputs
  ○ Traces
  ○ Load Balancing
  ○ Cache misses and instructions
  ○ Execution time
  ○ Configuration comparisons
  ○ Extrae overhead
Conclusions
Tools and configuration
Parsec
Overview
● Benchmark with the following characteristics:
  ○   Multithreaded
  ○   Emerging workloads
  ○   Diverse
  ○   Not HPC-focused
  ○   Research




Parsec
Benchmark programs
●   blackscholes
●   bodytrack
●   canneal
●   dedup
●   facesim
●   ferret
●   fluidanimate
●   freqmine
●   raytrace
●   ...
Extrae
● Instrumentation package for tracing programs
  that use shared-memory and message-passing
  programming models.




Paraver
● Detailed quantitative analysis of program
  performance.
● Concurrent comparative analysis of several
  traces.
● Support for mixed message passing and
  shared memory.
● Building of derived metrics.


Configuration (1/4)
Boada server:
●   Dual six-core CPUs with Hyper-Threading (12 physical cores, 24 hardware threads).
●   Kills applications after a few minutes.
●   24 GB of RAM.


Boada server (limited configuration):
●   Used cpulimit to limit CPU usage to at most four cores.
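
The slides do not show the exact cpulimit invocation, so the following is only a sketch of how a four-core cap could be expressed. It assumes cpulimit's `-p` (target PID) and `-l` (limit as a percentage of one core, so values above 100 are meaningful on multi-core machines) options; the helper name and the PID are hypothetical.

```python
import shlex


def cpulimit_command(pid: int, cores: int) -> list[str]:
    """Build a cpulimit command line capping process `pid` at `cores` cores.

    Hypothetical helper: the deck only says cpulimit was used, not how.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # cpulimit takes the limit as a percentage of a single core,
    # so four cores correspond to 400.
    return ["cpulimit", "-p", str(pid), "-l", str(cores * 100)]


# Hypothetical PID of a running raytrace process.
cmd = cpulimit_command(pid=12345, cores=4)
print(shlex.join(cmd))  # cpulimit -p 12345 -l 400
```

Building the argument list separately from running it keeps the sketch testable on machines where cpulimit is not installed.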




Configuration (2/4)
Installed and/or configured:
●   Parsec 2.1 with raytrace package only.
●   Extrae 2.2.1.
●   Paraver 4.3.0 (on my laptop).
●   cpulimit.
●   Minor configuration changes in .bashrc.
●   Multiple scripts to clean, build and run.




Configuration (3/4)




Configuration (4/4)




Measurements
Raytrace
Overview
● Physical simulation for visualization
● Computer animation
● Input is a complex object of many triangles.




Raytrace
Code
For every pixel in the image
   calculate trajectory of ray striking pixel
   find closest intersection point of ray with scene geometry
   calculate contribution of all lights at intersection point
   recursively trace specularly reflected ray
end for
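
The loop above can be sketched as a tiny, single-threaded Python ray tracer. This is not the PARSEC raytrace code: the sphere scene, the single light, the grayscale shading, the absence of shadows and every name below are assumptions chosen only to make each pseudocode step concrete.

```python
import math


def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))


# Hypothetical scene: (center, radius, diffuse, reflectivity) per sphere.
SPHERES = [((0.0, 0.0, -3.0), 1.0, 0.8, 0.3),
           ((0.0, -101.0, -3.0), 100.0, 0.6, 0.0)]
LIGHTS = [(5.0, 5.0, 0.0)]  # point light positions


def hit_sphere(origin, direction, center, radius):
    """Nearest ray parameter t in front of the origin, or None.

    Assumes `direction` is a unit vector.
    """
    oc = sub(origin, center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 1e-4 else None


def trace(origin, direction, depth=0, max_depth=2):
    # Find the closest intersection point of the ray with the scene.
    closest = None
    for center, radius, diffuse, reflectivity in SPHERES:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (closest is None or t < closest[0]):
            closest = (t, center, diffuse, reflectivity)
    if closest is None:
        return 0.0  # background
    t, center, diffuse, reflectivity = closest
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, center))
    # Contribution of all lights at the intersection point (diffuse only).
    brightness = 0.0
    for light in LIGHTS:
        to_light = normalize(sub(light, point))
        brightness += diffuse * max(0.0, dot(normal, to_light))
    # Recursively trace the specularly reflected ray.
    if reflectivity > 0.0 and depth < max_depth:
        reflected = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
        brightness += reflectivity * trace(point, reflected, depth + 1, max_depth)
    return min(brightness, 1.0)


def render(width, height):
    """For every pixel, shoot one ray from a pinhole camera at the origin."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Trajectory of the ray striking this pixel.
            px = (2.0 * (x + 0.5) / width - 1.0) * width / height
            py = 1.0 - 2.0 * (y + 0.5) / height
            row.append(trace((0.0, 0.0, 0.0), normalize((px, py, -1.0))))
        image.append(row)
    return image
```

Each pixel is computed independently of the others, which is why the per-pixel loop is the natural place to split work across threads.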




Raytrace
Inputs
●   simsmall - 1 million polygons (480x270)
●   simmedium - 1 million polygons (960x540)
●   simlarge - 1 million polygons (1920x1080)
●   native - 10 million polygons (1920x1080)
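
Because the algorithm loops over every pixel, the resolution gives a rough lower bound on the work per frame. The sketch below just does that arithmetic on the figures listed above; treating one primary ray per pixel as the unit of work is an assumption, not something the slides state.

```python
# Input name -> (polygons, width, height), copied from the list above.
INPUTS = {
    "simsmall": (1_000_000, 480, 270),
    "simmedium": (1_000_000, 960, 540),
    "simlarge": (1_000_000, 1920, 1080),
    "native": (10_000_000, 1920, 1080),
}


def pixels(name: str) -> int:
    """Pixels per frame, i.e. primary rays if one ray is cast per pixel."""
    _, width, height = INPUTS[name]
    return width * height


base = pixels("simsmall")
for name, (polygons, width, height) in INPUTS.items():
    print(f"{name:10s} {pixels(name):>9,d} pixels "
          f"({pixels(name) // base}x simsmall), {polygons:,d} polygons")
```

Each step from simsmall to simlarge quadruples the pixel count, while native keeps the simlarge resolution and grows the scene instead.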




Raytrace
Trace (1/2)
Only 10% of the execution time is parallel!
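
A 10% parallel fraction puts a hard ceiling on whole-program speedup. The sketch below applies Amdahl's law to that figure; it assumes the 10% is measured on a single-threaded run and that the parallel part scales perfectly, neither of which the slides confirm.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Whole-program speedup when only `parallel_fraction` scales with workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


for workers in (1, 4, 12, 64):
    print(f"{workers:3d} workers: {amdahl_speedup(0.10, workers):.3f}x")

# Upper bound as the worker count grows without limit: 1 / (1 - 0.10).
print(f"limit: {1.0 / (1.0 - 0.10):.3f}x")
```

Under these assumptions the whole program can never run more than about 1.11 times faster, no matter how many threads are used.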




Legend: Not created, Running
Raytrace
Trace (2/2)
Render time is proportional to the number of frames!
Phases: Init and adding object, Build Context, Render




Raytrace
Load balancing (1/2)




Legend: Not created, Create Threads, Task, Barrier, Wait for all threads
Raytrace
Load balancing (2/2)
Good load balancing between the slave
threads.




Raytrace
Cache and instructions
Figure labels: high number of cache misses; very low number of cache misses

There were no significant differences in IPC between threads.




Raytrace
Execution time (1/3)




These are average times from multiple executions of the
parallel code only, without Extrae overhead.
There was a high average deviation of 0.3 seconds in the
experiments.
Timings for bigger inputs were more consistent.
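
The slides do not define "average deviation". The sketch below reads it as the mean absolute deviation from the mean, which is an assumption (it could also have been a standard deviation). The timings are made-up values for illustration only, not measurements from the experiments.

```python
from statistics import mean


def average_deviation(samples):
    """Mean absolute deviation of the samples from their mean."""
    centre = mean(samples)
    return mean(abs(x - centre) for x in samples)


# Hypothetical wall-clock times (seconds) of repeated runs.
timings = [2.0, 2.6, 2.3, 1.9, 2.7]

print(f"mean time:         {mean(timings):.2f} s")
print(f"average deviation: {average_deviation(timings):.2f} s")
```

Reporting the deviation next to the mean shows how much a single run can be trusted, which matters most for the short-running inputs.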

Raytrace
Execution time (2/3)




There was a smaller average deviation of 0.03 seconds.

With 64 threads it runs almost three times faster!
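
"Almost three times faster" is a speedup figure, and dividing it by the thread count gives the parallel efficiency. The times below are hypothetical placeholders chosen to give a speedup just under 3. The baseline is assumed to be the single-threaded run, which the slides do not state.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """How many times faster the parallel run is than the baseline."""
    return baseline_seconds / parallel_seconds


def efficiency(speedup_value: float, threads: int) -> float:
    """Fraction of ideal linear scaling achieved per thread."""
    return speedup_value / threads


# Hypothetical times for the parallel section: 1 thread vs 64 threads.
s = speedup(baseline_seconds=2.9, parallel_seconds=1.0)
print(f"speedup:    {s:.2f}x")
print(f"efficiency: {efficiency(s, 64):.1%}")
```

With only 12 physical cores on the server, 64 threads are heavily oversubscribed, so an efficiency of a few percent per thread is not surprising.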




Raytrace
Execution time (3/3)




There was an even smaller average deviation of 0.02 seconds.

With 64 threads it runs almost three times faster!




Raytrace
Configuration comparison




In the case of the limited configuration, although
performance doesn't seem to degrade, the execution
time seems to stabilize beyond 8 threads.



Raytrace
Extrae overhead




Conclusions
Conclusions (1/3)
● The system seemed to perform worse when the
  number of threads was a multiple of the
  total number of physical cores.

● The program has good load balancing.

● Fine-grained parallelism.


Conclusions (2/3)
● Although it wasn't possible to verify,
  increasing the input size should cause more
  cache misses, because the larger working
  sets won't fit in cache memory.

● Memory bandwidth should be the main
  obstacle to good speedups.

● Boada killed almost all the native input
  executions.
Conclusions (3/3)
● Paraver simplifies the process of analyzing
  an application's performance.

● Better knowledge of the system's
  architecture would be needed in order to
  further analyze the performance of the
  application.


Questions
