CNIC Information System with Pakdata Cf In Pakistan
Use of a Levy Distribution for Modeling Best Case Execution Time Variation
1. Use of a Levy Distribution for Modeling
Best Case Execution Time Variation
Jonathan Beard, Roger Chamberlain
SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Work also supported by:
1
6. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Streaming Languages
StreamIt, Auto-Pipe, Brook, Cg, S-Net,
Scala-Pipe, Streams-C and
many others
5
7. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Optimization
Slow
Fast Kernel
Super Fast
Medium
6
8. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Optimization
Kernel 1
Kernel 2
Kernel 3
Kernel 2
multi-core A
1 2
3 4
multi-core B
1 2
3 4
More allocation choices,
NUMA node A or B to
allocate stream.
7
9. 1 2
SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Optimization
Kernel 1
Kernel 2
Kernel 3
Kernel 2
multi-core A
1 2
3 4
multi-core B
1 2
3 4
More allocation choices,
NUMA node A or B to
allocate stream.
7
10. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Optimization
Kernel 1
Kernel 2
Kernel 3
Kernel 2
multi-core A
1 2
3 4
multi-core B
1 2
3 4
More allocation choices,
NUMA node A or B to
allocate stream.
1 2
7
11. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Optimization
A B C
“Stream” is modeled as a Queue
A Q1 B Q2 C
8
12. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Optimization
A B C
“Stream” is modeled as a Queue
A Q1 B Q2 C
8
13. We want good models for streaming systems
on shared multi-core systems (i.e., a cluster)
Problem: Accurate measurement is very difficult. Is there
a way to decide on a model without it.
• Commodity multi-core timer availability and latency
• Frequency scaling and core migration
• Measuring modifies the application behavior
SBS
Streaming on Multi-core Systems
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
9
14. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Derived Information
Expected Observed
10
15. SBS
Is there a pattern of minimal variation within the
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Derived Information
Expected Observed
systems we’re running on?
Avg. Service Time = E[ X ] + Error
10
16. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Goal
Find a distribution that characterizes
the minimum expected variation of a
hardware and software system
Use this characterization to
accept or reject models
11
17. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Process
12
• Measurement!
• Workload definition!
• Find a distribution!
• Utilize the distribution to aid model selection
18. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Timer Mechanism
Timer Thread Code
13
Ask for Time
Receive Time
19. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Timer Mechanism
Timer Thread
rdtsc clock_gettime
14
• x86 assembly
• varying methods
to serialize
• relatively fast
• multiple drift
issues
• POSIX standard
• relatively accurate
• portable
• slower than rdtsc
20. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Two Timing Choices
15
21. SBS
NUMA Node Variations
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
16
22. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Minimize Variation
• Restricting timer to single core
!
• Use the x86 rdtsc instruction with processor
recommended serializers for each processor
type
!
• Keeping processes under test on the same
NUMA node as timer
!
• Run timer thread with altered priority to
minimize core context swaps
17
23. SBS
Best Case Execution Time Variation
• no-op instruction implemented in most processors
!
• usually takes exactly 1 cycle
!
• no real functional units are involved, so least
taxing
!
• variation observed in execution time should be
external to process
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
18
24. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Data Collection
• no-op loops calibrated for various nominal
times, tied to a single core and run
thousands of times
!
• Execution time measured end to end for
each run, environment collected
!
• Parameters include:
Number of processes executing on core
Number of context swaps (voluntary,
involuntary)
Many others
19
25. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Levy Distribution
20
Execution Time Error
( obs - mean )
26. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Levy Distribution
21
Normal Distribution
27. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Levy Distribution
22
Gumbel Distribution
28. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Levy Distribution
23
Levy Distribution
29. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Levy Distribution
23
Levy Distribution
30. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Levy Distribution
• Truncation enables mean calculation, but
requires fitting to each dataset to find where
to truncate
!
• The truncation parameters are correlated to
both the number of processes per core and
the expected execution time
!
• Roughly linear relationship gives an
approximate solution to truncation
parameters without refitting
24
32. Hypothesis: Lower Kullback-Leibler (KL) divergence
SBS
Question: Can we use an M/M/1 queueing model to
estimate the mean queue occupancy of this system?
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Test Setup
A Q1 B
!
between expected and realized distribution is
associated with higher model accuracy.
26
33. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Test Setup
A Q1 B
1. Dedicated thread of execution monitors
27
queue occupancy
2. Calculate the estimated mean queue
occupancy using the M/M/1 model
3. Calculate KL Divergence for the arrival
process distribution using the truncated
Levy distribution noise model
34. SBS
Convolution with Exponential
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
28
35. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Conclusions
• The truncated Levy distribution can be used to
approximate BCETV
!
• The distribution of BCETV can be used as a tool
to accept or reject a stochastic queueing model
based on distributional assumptions
!
• KL divergence between the expected and
convolved distribution highly correlates with
queue model accuracy
29
36. SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Parting Notes
Slides available here:
sbs.wust.edu
!
Timer C++ template code:
http://goo.gl/ItJ3jP
!
Test harness used to collect data:
http://goo.gl/U1VG6N
30