Task graphs or dependence graphs are used in runtime systems to schedule tasks for parallel execution. In problem domains such as dense linear algebra and signal processing, dependence graphs can be generated from a program by static analysis. However, in emerging problem domains such as graph analytics, the set of tasks and dependences between tasks in a program are complex functions of runtime values and cannot be determined statically. In this paper, we introduce a novel approach for exploiting parallelism in such programs. This approach is based on a data structure called the kinetic dependence graph (KDG), which consists of a dependence graph together with update rules that incrementally update the graph to reflect changes in the dependence structure whenever a task is completed.
We have implemented a simple programming model that allows programmers to write these applications at a high level of abstraction, and a runtime within the Galois system that builds the KDG automatically and executes the program in parallel. On a suite of programs that are difficult to parallelize otherwise, we have obtained speedups of up to 33× on 40 cores, outperforming third-party implementations in many cases.
16. Dependence Graph Scheduling
[Figure labels: Application Programmer; Compiler, Parallel Task Library, Parallel Programming Model, …; Runtime Scheduler]
A[N,N] = ...
for i in 1:N
  for j in 1:N
    A[i,j] = A[i-1,j] + A[i,j-1]
[Figure: iteration space with axes i and j, ticks 1 to 3]
Scheduler world: G0 →(w0) G1 →(w1) G2 →(w2) … →(wn-1) Gn (execute in parallel)
Program world: σ0 →(w0) σ1 →(w1) σ2 →(w2) … →(wn-1) σn
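The scheduling scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the runtime from the talk: it builds the dependence graph for the loop nest A[i,j] = A[i-1,j] + A[i,j-1], then repeatedly executes all sources and removes them from the graph. The function names (`build_dag`, `run_in_rounds`) and the boundary values (row 0 and column 0 set to 1) are assumptions made for the example.

```python
from collections import defaultdict

def build_dag(n):
    """Dependence graph for A[i,j] = A[i-1,j] + A[i,j-1], with 1 <= i, j <= n."""
    succs = defaultdict(list)   # task -> tasks that depend on it
    indeg = {}                  # task -> number of unfinished predecessors
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            preds = [(pi, pj) for pi, pj in ((i - 1, j), (i, j - 1))
                     if pi >= 1 and pj >= 1]
            indeg[(i, j)] = len(preds)
            for p in preds:
                succs[p].append((i, j))
    return succs, indeg

def run_in_rounds(n):
    # Assumption: row 0 and column 0 hold boundary values, all equal to 1.
    A = [[1] * (n + 1) for _ in range(n + 1)]
    succs, indeg = build_dag(n)
    sources = [t for t, d in indeg.items() if d == 0]
    rounds = []
    while sources:
        rounds.append(sorted(sources))
        # Sources are mutually independent; a real runtime would run them in parallel.
        for i, j in sources:
            A[i][j] = A[i - 1][j] + A[i][j - 1]
        # Update rule: remove executed sources and collect the new sources.
        next_sources = []
        for t in sources:
            for s in succs[t]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    next_sources.append(s)
        sources = next_sources
    return A, rounds
```

For n = 3 the rounds are the anti-diagonals of the iteration space, so nine tasks finish in five rounds.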
17. Limitations of Dependence Graphs
• Not always sufficient
– Discrete event simulation
– Billiard ball simulation
– Kruskal’s algorithm for MSTs
– Asynchronous Variational Integrators
– …
• Ordered algorithms
– Have tasks with algorithm-specific order in which tasks must appear to execute
• Can use speculation
– But overheads can be high for ordered algorithms
32. Discrete Event Simulation
• Inadequacy of dependence graphs
– Tasks are created dynamically
– Not all sources are safe to execute
– Task execution may require updating DAG
[Figure: network of nodes A, B, C, D with numbered events, shown over rounds 0, 1, and 2]
33. Dependence Graph
• A dependence graph is:
– A DAG G for the program state σ
– An update rule U to produce the next G after executing task w
• Remove source
[Diagram: program states σ0 →(w0) σ1 →(w1) σ2 →(w2) … →(wn-1) σn, with graphs G0, G1, G2, …, Gn; annotation: execute sources]
36. Kinetic Dependence Graph
• A kinetic dependence graph is:
– A DAG G for the program state σ
– A safe-source test P
– An update rule U to produce the next G after executing task w
• Remove source, update other tasks’ dependencies, …
[Diagram: program states σ0 →(w0) σ1 →(w1) σ2 →(w2) … →(wn-1) σn, paired with graphs as <σ0, G0> →(w0) <σ1, G1> →(w1) <σ2, G2> →(w2) … →(wn-1) <σn, Gn>; annotation: execute safe sources]
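The three ingredients of a KDG (graph G, safe-source test P, update rule U) can be made concrete with a toy discrete event simulation. The sketch below is an illustration under stated assumptions, not the Galois implementation: the network `NET`, the event format `(time, node, ttl)`, and the lookahead-based safety test are all hypothetical choices made for this example. The graph is kept implicit: events on the same node are ordered by time, so the sources are the earliest pending event of each node.

```python
import heapq
from collections import defaultdict

# Hypothetical network: node -> list of (neighbor, delay). Delays must be positive.
NET = {"A": [("B", 5), ("C", 5)], "B": [("D", 5)], "C": [("D", 5)], "D": []}
MIN_DELAY = min(d for outs in NET.values() for _, d in outs)

def execute(event, log):
    """Run one task: record it at its node and return the events it creates."""
    t, node, ttl = event
    log[node].append(t)
    if ttl == 0:
        return []
    return [(t + d, nbr, ttl - 1) for nbr, d in NET[node]]

def sources(pending):
    """Implicit DAG G: the earliest pending event of each node is a source."""
    first = {}
    for ev in pending:
        if ev[1] not in first or ev < first[ev[1]]:
            first[ev[1]] = ev
    return list(first.values())

def is_safe(event, pending):
    """Safe-source test P (assumed lookahead rule): no event created later
    can carry a timestamp below t_min + MIN_DELAY."""
    t_min = min(ev[0] for ev in pending)
    return event[0] < t_min + MIN_DELAY

def run_kdg(initial):
    pending, log, rounds = list(initial), defaultdict(list), 0
    while pending:
        safe = [ev for ev in sources(pending) if is_safe(ev, pending)]
        created = []
        # Safe sources touch distinct nodes; a real runtime would run them in parallel.
        for ev in safe:
            created.extend(execute(ev, log))
        # Update rule U: drop executed tasks, add the tasks they created.
        for ev in safe:
            pending.remove(ev)
        pending.extend(created)
        rounds += 1
    return dict(log), rounds

def run_sequential(initial):
    """Reference: process events one at a time in timestamp order."""
    heap, log = list(initial), defaultdict(list)
    heapq.heapify(heap)
    while heap:
        for new in execute(heapq.heappop(heap), log):
            heapq.heappush(heap, new)
    return dict(log)
```

With initial events `[(0, "A", 2), (7, "B", 0)]`, the event at B is a source but is not safe in the first round, because the event at A will create an earlier event at B (time 5). A plain dependence graph executor would run it too early.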
61. KDG Specializations
Opportunities (KDG runtime)
• Each step needs a barrier
• Task’s RW-set potentially updated several times prior to execution
Program Properties (application programmer)
• Sources
– In general: safe source test could inspect entire state
– Local safe source test
– Stable source (sources are safe)
• RW-sets
– In general: task could change other tasks’ RW-sets
– Non-increasing RW-set
– Structure-based RW-set
• New tasks
– In general: task could create new tasks at any priority
– Monotonic
– No new tasks
Specializations
• Remove unnecessary barriers given program properties
• Construct KDG for prefix of tasks
• Use different DAG representations
62. KDGs in Action
• Application programmer
– Writes programs with ordered loops
• Loop body must use provided data structure library
• Loop properties provided via annotations
• Loop body must read all elements before modifying any (cautious)
• KDG runtime (in Galois system)
– Uses library API to track RW-sets at runtime
– Selects executor based on annotations
Graph g = ...
Set<Event> E = ...
@hasStructuredRWSets
@monotonic
foreach Event e in E orderedby e.t
  process(e, g)
  if *
    E.push(newE)
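As a rough illustration of the programming model on this slide, the sketch below gives the sequential meaning of an ordered loop and shows how a data structure library could record a task's RW-set and check the cautious property at runtime. Everything here is hypothetical: `TrackedGraph`, `ordered_foreach`, and the event format `(t, node, nbr)` are names invented for the example and are not the Galois API.

```python
import heapq

class TrackedGraph:
    """Hypothetical stand-in for the provided data structure library."""
    def __init__(self, values):
        self._values = dict(values)
        self.rw_set = set()
        self._wrote = False

    def begin_task(self):
        # Reset the tracking state before each loop iteration.
        self.rw_set = set()
        self._wrote = False

    def read(self, node):
        if self._wrote:
            raise RuntimeError("not cautious: read after write")
        self.rw_set.add(node)
        return self._values[node]

    def write(self, node, value):
        if node not in self.rw_set:
            raise RuntimeError("not cautious: element was not read first")
        self._wrote = True
        self._values[node] = value

def ordered_foreach(events, g, body):
    """Sequential meaning of `foreach e in E orderedby e.t`."""
    heap = list(events)
    heapq.heapify(heap)
    trace = []
    while heap:
        e = heapq.heappop(heap)          # smallest timestamp first
        g.begin_task()
        for new_e in body(e, g):         # corresponds to E.push(newE)
            heapq.heappush(heap, new_e)
        trace.append((e, frozenset(g.rw_set)))
    return trace

def process(e, g):
    """Cautious loop body: all reads happen before the first write."""
    t, node, nbr = e
    a, b = g.read(node), g.read(nbr)
    g.write(nbr, a + b)
    # Monotonic: a new event never has a smaller timestamp than its creator.
    return [(t + 1, nbr, node)] if t < 2 else []
```

A runtime that records RW-sets this way can compare the sets of pending tasks to decide which ones conflict. The `trace` returned here only exposes what was recorded for each iteration.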
63. Evaluation
• 7 applications
– From billiards simulation (hard to parallelize) to tree traversal (well supported by prior work)
• 2 types of programs
– Other
• 3rd-party handwritten, application-specific, OpenMP tasks, Cilk
– KDG-Auto
• Ordered loops + program property annotations
• 40-core, shared-memory machine
67. Summary
• Problem
– Dependence graph scheduling is insufficient for many ordered programs
• Solution
– Develop general KDG executor
– Specialize executor according to small number of program properties
“It is a mistake to try to look too far ahead. The chain of destiny can only be grasped one link at a time.”
- Winston Churchill