An introduction to cloud programming models and the Skywriting project. Talk originally given at the University of Cambridge on 11th June 2010.
More information about the Skywriting project can be found here: http://www.cl.cam.ac.uk/netos/skywriting/
3. SMP programming
• Symmetric multiprocessing
– All cores share the same address space
– Usually assumes cache coherency
• Multi-threaded programs
– Threads created dynamically
– Shared, writable data structures
– Synchronisation (sketched below)
• Atomic compare-and-swap
• Mutexes, semaphores
• {Software, Hardware} Transactional Memory
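
As an illustration (my addition, not from the original slides), here is a minimal Python sketch of the SMP model: several threads mutate one shared data structure, and a mutex serialises the critical section.

    import threading

    counter = 0                      # shared, writable data structure
    lock = threading.Lock()          # mutex guarding the critical section

    def worker():
        global counter
        for _ in range(100_000):
            with lock:               # one thread at a time in here
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)                   # 400000; without the lock, usually less

Without the lock, the read-modify-write race loses updates; compare-and-swap, mutexes, and transactional memory are different tools for the same problem.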
4. Distributed programming
• Shared nothing
– Processors communicate by message passing
– Standard assumption for large supercomputers
– ...and data centres
– ...and recent manycore machines (e.g. Intel SCC)
• Explicit messages (sketched below)
– MPI, Pregel, Actor-based
• Implicit messages: task parallelism
– MapReduce, Dryad
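
To make "shared nothing" concrete, here is a small Python sketch (my addition, standing in for MPI-style send/receive): two processes with separate address spaces exchange data only by explicit messages.

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        msg = inbox.get()            # blocking receive
        outbox.put(msg * 2)          # send a reply

    if __name__ == "__main__":
        to_worker, from_worker = Queue(), Queue()
        p = Process(target=worker, args=(to_worker, from_worker))
        p.start()
        to_worker.put(21)            # explicit send; no shared memory
        print(from_worker.get())     # prints 42
        p.join()

MPI programs follow the same pattern with explicit send and receive calls; the task-parallel systems below hide the messaging behind a runtime instead.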
5. Task parallelism
• Master-worker architecture (sketched below)
– Master maintains task queue
– Workers execute independent tasks in parallel
• Fault tolerance
– Re-execute tasks on failed workers
– Speculatively re-execute “slow” tasks
• Load balancing
– Workers consume tasks at their own rate
– Task granularity must be optimised
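
A toy master-worker loop in Python (my sketch; fault tolerance and speculative re-execution are omitted): the master fills a queue and each worker pulls tasks at its own rate, which is where the load balancing comes from.

    import queue, threading

    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:         # sentinel: no more work
                break
            results.put(item * item) # run one independent task

    workers = [threading.Thread(target=worker) for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(20):              # master enqueues the tasks
        tasks.put(i)
    for _ in workers:                # one sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()
    print(sorted(results.get() for _ in range(20)))

Task granularity matters here too: too fine and queueing overhead dominates; too coarse and a single slow task leaves the other workers idle.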
6. Task graph
[Diagram: two vertices, A → B, i.e. A runs before B and B depends on A]
7. MapReduce
• Two kinds of task: map and reduce
• User-specified map and reduce functions
– map() takes a record and emits a list of key-value pairs
– reduce() takes a key and the list of values for that key
• Tasks apply functions to data partitions (see the word-count sketch below)
M" R"
M" R"
M" R"
8. Dryad
• Task graph is first class
– Vertices run arbitrary code
– Multiple inputs and outputs
– Channels specify data transport
• Graph must be acyclic and finite (see the sketch below)
– Permits topological sorting
– Prevents unbounded iteration
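
Acyclicity is what lets a scheduler order the vertices. A short Python sketch of Kahn's topological sort (my addition) shows both properties at once: an acyclic graph yields a valid execution order, while a cycle is detected and rejected.

    from collections import deque

    def topo_sort(vertices, edges):
        indegree = {v: 0 for v in vertices}
        successors = {v: [] for v in vertices}
        for u, v in edges:
            indegree[v] += 1
            successors[u].append(v)
        ready = deque(v for v in vertices if indegree[v] == 0)
        order = []
        while ready:                 # run any vertex whose inputs are done
            u = ready.popleft()
            order.append(u)
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
        if len(order) != len(vertices):
            raise ValueError("graph has a cycle")
        return order

    print(topo_sort(["M1", "M2", "R"], [("M1", "R"), ("M2", "R")]))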
13. Skywriting
[Diagram: a Skywriting script (“while (…) doStuff();”) is submitted to the cluster as code, and results flow back to the user]
• Turing-complete language for job specification
• Whole job executed on the cluster
14. Spawning a Skywriting task

    function f(arg1, arg2) {
        …
    }

    result = spawn(f, [arg1, arg2]);   // result is a “future”
    value_of_result = *result;
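
For readers more familiar with mainstream futures, the spawn-then-dereference pattern maps closely onto Python's concurrent.futures (an analogy of mine, not a Skywriting API): submit() returns a future, and result() blocks like the * dereference.

    from concurrent.futures import ThreadPoolExecutor

    def f(arg1, arg2):
        return arg1 + arg2

    with ThreadPoolExecutor() as pool:
        result = pool.submit(f, 1, 2)      # like spawn(f, [arg1, arg2])
        value_of_result = result.result()  # like *result: blocks until ready
        print(value_of_result)             # prints 3

The difference is where the blocking happens: here it occupies a local thread, whereas in Skywriting the dereference blocks the script until the cluster has produced the value.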
15. Building a task graph

    function f(x, y) {
        …
    }
    function g(x, y) {
        …
    }
    function h(x, y) {
        …
    }

    a = spawn(f, [7, 8]);
    b = spawn(g, [a, 0]);
    c = spawn(g, [a, 1]);
    d = spawn(h, [b, c]);
    return d;

[Diagram: the resulting task graph; f produces a, two g tasks consume a to produce b and c, and h combines b and c into d]
16. Iterative algorithm

    current = …;
    do {
        prev = current;
        a = spawn(f, [prev, 0]);
        b = spawn(f, [prev, 1]);
        c = spawn(f, [prev, 2]);
        current = spawn(g, [a, b, c]);
        done = spawn(h, [current]);
    } while (!*done);
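
The point of this slide is that the loop bound is data-dependent, something a fixed, finite DAG (as in Dryad) cannot express. A Python analogue (my sketch, with hypothetical stand-ins for f, g, and h) has the same shape: each iteration spawns tasks, combines their results, and dereferences a convergence test to decide whether to continue.

    from concurrent.futures import ThreadPoolExecutor

    def f(prev, i):                  # hypothetical per-partition step
        return prev + i

    def g(a, b, c):                  # hypothetical combiner
        return (a + b + c) / 3

    def h(current):                  # hypothetical convergence test
        return current > 10

    with ThreadPoolExecutor() as pool:
        current = 0
        while True:
            prev = current
            parts = [pool.submit(f, prev, i) for i in range(3)]
            current = g(*(p.result() for p in parts))
            if h(current):           # like (!*done) ending the do-while
                break
        print(current)               # prints 11.0 after a data-dependent
                                     # number of iterations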
18. Aside: recursive algorithm

    function f(x) {
        if (/* x is small enough */) {
            return /* do something with x */;
        } else {
            x_lo = /* bottom half of x */;
            x_hi = /* top half of x */;
            return [spawn(f, [x_lo]), spawn(f, [x_hi])];
        }
    }
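
The same divide-and-conquer shape in Python (my sketch; a parallel sum stands in for the elided body). The explicit worker count matters: with a thread pool, parents that block on their children need enough threads for the whole recursion tree.

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=16)  # room for blocked parents

    def f(x):
        if len(x) <= 2:                        # x is small enough
            return sum(x)                      # do something with x
        mid = len(x) // 2
        lo = pool.submit(f, x[:mid])           # spawn on the bottom half
        hi = pool.submit(f, x[mid:])           # spawn on the top half
        return lo.result() + hi.result()

    print(f(list(range(16))))                  # prints 120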
19. Performance case studies
• All experiments used Amazon EC2
– m1.small instances, running Ubuntu 8.10
• Microbenchmark
• Smith-Waterman
23. Parallel Smith-Waterman
[Plot: job time in seconds (0–350) against number of tasks (1 to 10,000, log scale)]
24. Future work: manycore
• Trade-offs are different
– Centralised master may become a bottleneck
– Switch to local work-queues and work-stealing (sketched below)
– Distributed scoreboards for futures
– Optimised interpreter/compilation?
• Multi-scale hybrid approaches
– Multiple cores
– Multiple machines
– Multiple clouds...
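
For the work-stealing idea, a toy Python sketch (my addition; real schedulers use lock-free Chase-Lev deques rather than one coarse lock): each worker services its own deque and, when it runs dry, steals from the head of a non-empty victim.

    import collections, random, threading

    NUM_WORKERS = 4
    queues = [collections.deque() for _ in range(NUM_WORKERS)]
    lock = threading.Lock()          # coarse lock, for simplicity only

    def run_worker(wid, results):
        while True:
            with lock:
                if queues[wid]:
                    task = queues[wid].pop()       # own tail: LIFO
                elif any(queues):
                    victim = random.choice([q for q in queues if q])
                    task = victim.popleft()        # steal from a head
                else:
                    return                         # no work anywhere
            results.append(task())

    # all work starts on worker 0; stealing spreads it out
    for i in range(20):
        queues[0].append(lambda i=i: i * i)

    results = []
    threads = [threading.Thread(target=run_worker, args=(w, results))
               for w in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(results))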
25. Future work: message-passing
• Language designed for batch processing
– However, batches may be infinitely long!
• Can we apply it to streaming data?
– Log files
– Financial reports
– Sensor data
• Can we include data dependencies?
– High- and low-frequency streams