In this presentation I present the performance metrics and results of running the PARSEC benchmark's raytrace application on UPC's boada server.
8. Paraver
● Detailed quantitative analysis of a program's performance.
● Concurrent comparative analysis of several traces.
● Support for mixed message passing and shared memory.
● Building of derived metrics.
9. Configuration (1/4)
Boada server:
● Dual CPU, six cores per CPU, with Hyper-Threading.
● 24 GB of RAM.
● Kills applications after a few minutes.
Limited configuration:
● Used cpulimit to cap CPU usage at four cores.
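A hypothetical sketch of how the four-core cap could be applied with cpulimit (the binary name and input are placeholders, not the actual run commands from the experiments):

```shell
# cpulimit expresses the limit as a percentage of one core,
# so four cores' worth of CPU time is roughly 400%.
./raytrace input.obj &     # launch the benchmark (placeholder command)
cpulimit -p $! -l 400      # -p: target PID, -l: CPU percentage cap
```

Note that cpulimit throttles total CPU time rather than pinning threads to specific cores; pinning (e.g. `taskset -c 0-3`) would be the stricter alternative.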
10. Configuration (2/4)
Installed and/or configured:
● Parsec 2.1 with the raytrace package only.
● Extrae 2.2.1.
● Paraver 4.3.0 (on my laptop).
● cpulimit.
● Minor configurations in .bashrc.
● Multiple scripts to clean, build and run.
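A hypothetical sketch of how a run could be traced with Extrae (all paths and the binary name are assumptions, not the actual setup on boada):

```shell
# Point Extrae at its XML configuration (selects counters/events to record).
export EXTRAE_HOME=/opt/extrae-2.2.1
export EXTRAE_CONFIG_FILE=extrae.xml
# Preload the pthreads tracing library so the unmodified binary emits a trace:
LD_PRELOAD=$EXTRAE_HOME/lib/libpttrace.so ./raytrace input.obj
```

The intermediate trace files are then merged into a .prv trace and opened with Paraver on the laptop.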
15. Raytrace
Code
for every pixel in the image
    calculate trajectory of ray striking pixel
    find closest intersection point of ray with scene geometry
    calculate contribution of all lights at intersection point
    recursively trace specularly reflected ray
end for
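The pseudocode above can be sketched as a minimal sphere-only ray tracer; every name here is illustrative and unrelated to the actual PARSEC raytrace sources:

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def norm(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

def intersect_sphere(o, d, center, radius):
    """Smallest positive t with |o + t*d - center| = radius, or None.
    Assumes d is unit length (so the quadratic's 'a' term is 1)."""
    oc = sub(o, center)
    b = 2.0 * dot(oc, d)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-6 else None

def trace(o, d, spheres, lights, depth=0, max_depth=2):
    # find closest intersection point of ray with scene geometry
    hits = [(t, s) for s in spheres
            if (t := intersect_sphere(o, d, s[0], s[1])) is not None]
    if not hits:
        return 0.0                       # ray missed the scene
    t, (center, radius) = min(hits)
    p = add(o, scale(d, t))
    n = norm(sub(p, center))
    # calculate contribution of all lights (simple Lambertian term)
    color = sum(max(0.0, dot(n, norm(sub(l, p)))) for l in lights)
    # recursively trace the specularly reflected ray
    if depth < max_depth:
        r = sub(d, scale(n, 2.0 * dot(d, n)))
        color += 0.5 * trace(add(p, scale(n, 1e-4)), r,
                             spheres, lights, depth + 1, max_depth)
    return color

def render(width, height, spheres, lights):
    img = []
    for y in range(height):              # for every pixel in the image
        for x in range(width):
            # calculate trajectory of ray striking pixel (pinhole camera)
            d = norm((x / width - 0.5, y / height - 0.5, 1.0))
            img.append(trace((0.0, 0.0, 0.0), d, spheres, lights))
    return img
```

In the PARSEC version, iterations of the outer pixel loop are what gets distributed across threads, which is why the workload balances so evenly.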
16. Raytrace
Inputs
● simsmall - 1 million polygons (480x270)
● simmedium - 1 million polygons (960x540)
● simlarge - 1 million polygons (1920x1080)
● native - 10 million polygons (1920x1080)
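A hypothetical sketch of how builds and runs with these inputs could be launched through PARSEC's management script (exact paths and thread counts are assumptions):

```shell
# parsecmgmt drives PARSEC packages; -i selects one of the inputs above.
./bin/parsecmgmt -a build -p raytrace                  # build the package
./bin/parsecmgmt -a run -p raytrace -i simlarge -n 8   # run with 8 threads
```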
21. Raytrace
Cache and instructions
High number of cache misses vs. very low number of cache misses (trace comparison).
There were no significant differences in IPC between threads.
22. Raytrace
Execution time (1/3)
These are average times from multiple executions of the parallel code only, without Extrae overhead.
There was a high average deviation of 0.3 seconds in the experiments.
Measurements with bigger inputs were more accurate.
23. Raytrace
Execution time (2/3)
There was a smaller average deviation of 0.03 seconds.
With 64 threads it runs almost three times faster!
24. Raytrace
Execution time (3/3)
There was an even smaller average deviation of 0.02 seconds.
With 64 threads it runs almost three times faster!
25. Raytrace
Configuration comparison
In the case of the limited configuration, although performance doesn't seem to degrade, the execution time seems to stabilize for more than 8 threads.
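One way to reason about why the execution time stabilizes beyond a certain thread count is Amdahl's law. A minimal sketch, where the 30% serial fraction is a purely hypothetical value chosen for illustration, not a measured one:

```python
def amdahl_speedup(serial_fraction, n_threads):
    """Amdahl's law: speedup = 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)

# With a hypothetical 30% serial fraction, the speedup saturates near
# 1 / 0.3 ~= 3.3x no matter how many threads are added, which would be
# consistent with "almost three times faster" at 64 threads:
for n in (1, 2, 4, 8, 16, 64):
    print(n, round(amdahl_speedup(0.3, n), 2))
```

Memory-bandwidth contention (mentioned in the conclusions) would have a similar flattening effect once the bus is saturated.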
28. Conclusions (1/3)
● The system seemed to perform worse when the number of threads was a multiple of the total number of physical cores.
● The program has good load balancing.
● Fine-grained parallelism.
29. Conclusions (2/3)
● Although it wasn't possible to verify, increasing the input should cause more cache misses, because of the big working sets that won't fit in memory.
● Memory bandwidth should be the main limiting factor for good speedups.
● Boada killed almost all the native input executions.
30. Conclusions (3/3)
● Paraver simplifies the process of analyzing an application's performance.
● Better knowledge of the system's architecture would be needed in order to further analyse the performance of the application.