Paraforming: Forming Parallel (Functional) Programs from High-Level Patterns using Advanced Refactoring

Kevin Hammond, Chris Brown, Vladimir Janjic
University of St Andrews, Scotland

Build Stuff, Vilnius, Lithuania, December 10 2013

T: @paraphrase_fp7, @khstandrews
E: kh@cs.st-andrews.ac.uk
W: http://www.paraphrase-ict.eu
The Present

Pound versus Dollar

3
  
The Future: "megacore" computers?

§  Hundreds of thousands, or millions, of (small) cores
  
[Diagram: a large grid of many identical "Core" boxes]

5
  
What will "megacore" computers look like?

§  Probably not just scaled versions of today's multicore
   §  Perhaps hundreds of dedicated lightweight integer units
   §  Hundreds of floating point units (enhanced GPU designs)
   §  A few heavyweight general-purpose cores
   §  Some specialised units for graphics, authentication, network etc
   §  possibly soft cores (FPGAs etc)
   §  Highly heterogeneous

6
  
What will "megacore" computers look like?

§  Probably not uniform shared memory
§  NUMA is likely, even hardware distributed shared memory
   §  or even message-passing systems on a chip
   §  shared-memory will not be a good abstraction

int arr [x][y];

7
  
Laki (NEC Nehalem Cluster) and hermit (XE6)

Laki:
- 700 dual-socket Xeon 5560 2.8GHz ("Gainestown") nodes (4/6 cores per socket)
- 12 GB DDR3 RAM / node
- 32 nodes with additional Nvidia Tesla S1070
- Infiniband (QDR)
- Scientific Linux 6.0

hermit (phase 1 step 1):
- 96 service nodes and 3552 compute nodes
- Each compute node will have 2 sockets, AMD Interlagos @ 2.3GHz, 16 cores each, leading to 113,664 cores
- Nodes with 32GB and 64GB memory, reflecting different user needs
- 38 racks with 96 nodes each
- 2.7PB storage capacity @ 150GB/s IO bandwidth
- External Access Nodes, Pre-/Postprocessing Nodes, Remote Visualization Nodes

:: HLRS in ParaPhrase :: Turin, 4th/5th October 2011 ::

8
  
The Biggest Computer in the World

Tianhe-2, Chinese National University of Defence Technology

33.86 petaflop/s (June 17, 2013)
16,000 nodes, each with 2 Ivy Bridge multicores and 3 Xeon Phis
3,120,000 x86 cores in total!!!

9
  
It's not just about large systems

•  Even mobile phones are multicore
   §  Samsung Exynos 5 Octa has 8 cores, 4 of which are "dark"

•  Performance/energy tradeoffs mean systems will be increasingly parallel

•  If we don't solve the multicore challenge, then no other advances will matter!

ALL Future Programming will be Parallel!

10
  
The Manycore Challenge

"Ultimately, developers should start thinking about tens, hundreds, and thousands of cores now in their algorithmic development and deployment pipeline."

Anwar Ghuloum, Principal Engineer, Intel Microprocessor Technology Lab

The ONLY important challenge in Computer Science (Intel)
  

"The dilemma is that a large percentage of mission-critical enterprise applications will not ``automagically'' run faster on multi-core servers. In fact, many will actually run slower. We must make it as easy as possible for applications programmers to exploit the latest developments in multi-core/many-core architectures, while still making it easy to target future (and perhaps unanticipated) hardware developments."

Patrick Leonard, Vice President for Product Development, Rogue Wave Software

Also recognised as thematic priorities by EU and national funding bodies
But doesn't that mean millions of threads on a megacore machine??

13
  
How to build a wall
(with apologies to Ian Watson, Univ. Manchester)

How to build a wall faster

How NOT to build a wall
  

Typical CONCURRENCY approaches require the programmer to solve these

Task identification is not the only problem…
Must also consider coordination, communication, placement, scheduling, …
  
We need structure
We need abstraction

We don't need another brick in the wall

17
  
Thinking Parallel

§  Fundamentally, programmers must learn to "think parallel"
   §  this requires new high-level programming constructs
   §  perhaps dealing with hundreds of millions of threads

§  You cannot program effectively while worrying about deadlocks etc.
   §  they must be eliminated from the design!

§  You cannot program effectively while fiddling with communication etc.
   §  this needs to be packaged/abstracted!

§  You cannot program effectively without performance information
   §  this needs to be included as part of the design!

18
  
A Solution?

"The only thing that works for parallelism is functional programming"

Bob Harper, Carnegie Mellon University
  
Parallel Functional Programming

§  No explicit ordering of expressions
§  Purity means no side-effects
   §  Impossible for parallel processes to interfere with each other
   §  Can debug sequentially but run in parallel
   §  Enormous saving in effort

§  Programmers concentrate on solving the problem
   §  Not porting a sequential algorithm into an (ill-defined) parallel domain

§  No locks, deadlocks or race conditions!!
§  Huge productivity gains!

λ λ λ
  
ParaPhrase Project: Parallel Patterns for Heterogeneous Multicore Systems
(ICT-288570), 2011-2014, €4.2M budget

13 Partners, 8 European countries:
UK, Italy, Germany, Austria, Ireland, Hungary, Poland, Israel

Coordinated by Kevin Hammond, St Andrews
  
The ParaPhrase Approach

§  Start bottom-up
   §  identify (strongly hygienic) COMPONENTS
   §  using semi-automated refactoring
   (applies to both legacy and new programs)

§  Think about the PATTERN of parallelism
   §  e.g. map(reduce), task farm, parallel search, parallel completion, ...

§  STRUCTURE the components into a parallel program
   §  turn the patterns into concrete (skeleton) code
   §  take performance, energy etc. into account (multi-objective optimisation)
   §  also using refactoring

§  RESTRUCTURE if necessary! (also using refactoring)

25
  
Some Common Patterns

§  High-level abstract patterns of common parallel algorithms

Google map-reduce combines two of these!

Generally, we need to nest/combine patterns in arbitrary ways

35
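As a worked illustration of combining two patterns (my own sketch, not taken from the slides), the fragment below expresses a "map" phase as a farm of sequential workers followed by a "reduce" stage, using only the {seq, ...}, {farm, ...} and {pipe, ...} forms introduced later in this talk; the worker functions, worker count and inputs are invented, and skel:run follows the calling convention shown on these slides.

  %% Each stream item is a chunk (a list) of numbers.
  SquareChunk = {seq, fun(Chunk) -> [X * X || X <- Chunk] end},
  SumChunk    = {seq, fun(Chunk) -> lists:sum(Chunk) end},
  %% Farm the squaring over the stream of chunks, then sum each chunk:
  skel:run({pipe, [{farm, SquareChunk, 4}, SumChunk]},
           [[1,2,3], [4,5,6]]).
  %% Expected results: [14, 77] (ordering may depend on the farm's collector).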
  
The Skel Library for Erlang

§  Skeletons implement specific parallel patterns
   §  Pluggable templates

§  Skel is a new (AND ONLY!) skeleton library in Erlang
   §  map, farm, reduce, pipeline, feedback
   §  instantiated using skel:run

§  Fully nestable

§  A DSL for parallelism

chrisb.host.cs.st-andrews.ac.uk/skel.html
https://github.com/ParaPhrase/skel

OutputItems = skel:run(Skeleton, InputItems).

36
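Because skeletons are just nested Erlang terms, they compose directly. A minimal sketch of my own, assuming only the {seq, ...}, {pipe, ...} and {farm, ...} forms shown on the neighbouring slides (worker functions and worker count are invented):

  %% A farm whose worker is itself a two-stage pipeline.
  Inc    = {seq, fun(X) -> X + 1 end},
  Double = {seq, fun(X) -> X * 2 end},
  skel:run({farm, {pipe, [Inc, Double]}, 4}, lists:seq(1, 100)).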
  
Parallel Pipeline Skeleton

§  Each stage of the pipeline can be executed in parallel
§  The input and output are streams

{pipe, [Skel1, Skel2, ..., SkelN]}

Tn ... T1  ->  Skel1 -> Skel2 -> ... -> SkelN  ->  Tn ... T1

skel:run([{pipe, [Skel1, Skel2, .., SkelN]}], Inputs).

Inc    = {seq, fun(X) -> X + 1 end},
Double = {seq, fun(X) -> X * 2 end},
skel:run({pipe, [Inc, Double]}, [1,2,3,4,5,6]).
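Tracing the example: each input passes through Inc and then Double, so, assuming the sequential stages preserve stream order, the expected output is

  %% (X + 1) * 2 for each X in [1,2,3,4,5,6]:
  [4, 6, 8, 10, 12, 14]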

37	
  
Farm Skeleton

§  Each worker is executed in parallel
§  A bit like a 1-stage pipeline

{farm, Skel, M}

Tn ... T1  ->  [ Skel1 | Skel2 | ... | SkelM ]  ->  Tn ... T1

skel:do([{farm, Skel, M}], Inputs).

Inc = {seq, fun(X) -> X + 1 end},
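Completing that fragment into a runnable call is a hedged sketch of my own (the worker count and inputs are arbitrary):

  %% Farm the Inc worker over a stream of inputs with 8 parallel workers.
  Inc = {seq, fun(X) -> X + 1 end},
  skel:do([{farm, Inc, 8}], lists:seq(1, 1000)).
  %% Expected: the numbers 2..1001, possibly reordered by the farm.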

38	
  
Using The Right Pattern Matters

[Chart: Speedups for Matrix Multiplication, speedup against number of cores (up to 24),
comparing Naive Parallel, Farm, and Farm with Chunk 16]

39
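The "Farm with Chunk 16" line packages 16 tasks into each stream item, so per-item skeleton overheads are amortised over more work. One simple way to get that effect, chunking outside the skeleton itself, is sketched below; the module, chunk size, worker count and helper names are all my own invention, not part of Skel:

  -module(chunked_farm).
  -export([run/2]).

  %% Split a task list into chunks of at most N elements.
  chunk(_, []) -> [];
  chunk(N, L) when length(L) =< N -> [L];
  chunk(N, L) ->
      {H, T} = lists:split(N, L),
      [H | chunk(N, T)].

  %% Farm a per-chunk worker over the chunked stream, then flatten the results.
  run(Worker, Tasks) ->
      WorkChunk = {seq, fun(Chunk) -> [Worker(T) || T <- Chunk] end},
      Results   = skel:do([{farm, WorkChunk, 8}], chunk(16, Tasks)),
      lists:append(Results).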
  
The ParaPhrase Approach

[Diagram: sequential code (Erlang, C/C++, Java, Haskell, ...) is transformed into parallel
code (Erlang, C/C++, Java, Haskell, ...) by refactoring, guided by a generic pattern library
and by costing/profiling, and is mapped onto heterogeneous hardware: Mellanox Infiniband,
Nvidia Tesla and other Nvidia GPUs, AMD Opteron, Intel Core, Intel GPUs, Intel Xeon Phi]
  
Refactoring

§  Refactoring changes the structure of the source code
   §  using well-defined rules
   §  semi-automatically under programmer guidance
Refactoring: Farm Introduction

Some standard skeleton equivalences (Figure 3.3):

  S1 S2             ≡  Pipe(S1, S2)                           (pipe seq)
  Map(S1 S2, d, r)  ≡  Map(S1, d, r) Map(S2, d, r)            (map fission/fusion)
  S                 ≡  Farm(S)                                (farm intro/elim)
  Map(F, d, r)      ≡  Pipe(Decomp(d), Farm(F), Recomp(r))    (data2stream)
  S1'               ≡  Map(S1, d, r)                          (map intro/elim)

The following describes each of the patterns in turn:
• a MAP is made up of three OPERATIONs: a worker, a partitioner, and a combiner, followed by an INPUT;
• a SEQ is made up of a single OPERATION denoting the sequential computation to be performed, followed by an INPUT;
• a FARM is made up of a single OPERATION denoting the worker, an INPUT

44
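In Skel terms, farm intro/elim is a purely structural rewrite of the skeleton expression. A hedged before/after sketch (worker function, worker count and inputs invented for illustration):

  %% Before: a single sequential stage processes the stream of inputs.
  Worker = {seq, fun(X) -> math:sqrt(X) end},
  Inputs = lists:seq(1, 1000),
  skel:run({pipe, [Worker]}, Inputs),

  %% After farm introduction (S ≡ Farm(S)): the same worker, replicated 8 times.
  skel:run({pipe, [{farm, Worker, 8}]}, Inputs).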
  
Image Processing Example

[Dataflow diagram: Read Image 1 and Read Image 2 feed a "White screening" stage, then a
Merge Images stage, then Write Image, with "{ Build Stuff }" annotations on the stages]

45
  
Basic Erlang Structure

[ writeImage(convertImage(readImage(X))) || X <- images() ]

readImage({In1, In2, Out}) ->
    …
    {Image1, Image2, Out}.

convertImage({Image1, Image2, Out}) ->
    Image1P = whiteScreen(Image1),
    Image2P = mergeImages(Image1P, Image2),
    {Image2P, Out}.

writeImage({Image, Out}) -> …

46
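A plausible result of applying the farm-introduction refactoring to this structure, written here as a hedged sketch rather than the demo's actual output, replaces the list comprehension with a Skel farm over the same worker code (the worker count is arbitrary):

  %% Farm the whole read/convert/write stage over the stream of image descriptors.
  ProcessImage = {seq, fun(X) -> writeImage(convertImage(readImage(X))) end},
  skel:do([{farm, ProcessImage, 8}], images()).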
  
Refactoring Demo

47

Refactoring Demo

48
  
Speedup Results (Image Processing)

[Chart: Speedups for Haar Transform (Skel Task Farm), speedup against number of farm
workers (up to 24), comparing a 1D Skel Task Farm, a 1D Skel Task Farm with Chunk Size = 4,
and a 2D Skel Task Farm]

50
  
Large-Scale Demonstrator Applications

§  ParaPhrase tools are being used by commercial/end-user partners
   §  SCCH (SME, Austria)
   §  Erlang Solutions Ltd (SME, UK)
   §  Mellanox (Israel)
   §  ELTESoft, Hungary (SME)
   §  AGH (University, Poland)
   §  HLRS (High Performance Computing Centre, Germany)
  
Speedup Results (demonstrators)

[Chart: Speedups for Ant Colony, BasicN2 and Graphical Lasso, speedup against number of
workers (1-24), comparing refactored and manually parallelised versions of each]

Speedup close to or better than manual optimisation

55
  
Bowtie2: most widely used DNA alignment tool

[Charts: speedup of the ParaPhrase version (Bt2FF-pin+int) against the original (Bt2),
plotted against read length (20-110) and against quality (28-40)]

C. Misale. Accelerating Bowtie2 with a lock-less concurrency approach and memory affinity.
IEEE PDP 2014. To appear.

56
  
Comparison of Development Times

                   Man. Time   Refac. Time   LOC Intro.
  Convolution      3 days      3 hours       58
  Ant Colony       1 day       1 hour        32
  BasicN2          5 days      5 hours       40
  Graphical Lasso  15 hours    2 hours       53

Figure 3. Approximate manual implementation time of use-cases vs. refactoring time,
with lines of code introduced by the refactoring tool

58
Heterogeneous Parallel Programming                                   [RGU / USTAN]

[Diagram of the tool workflow:
 1. Identify the initial structure of the application (using profile information)
 2. Enumerate skeleton configurations (e.g. Farm, Pipeline)
 3. Filter using a cost model
 4. Apply MCTS
 5. Choose the optimal mapping/configuration for the heterogeneous machine (CPU and GPU components)
 6. Refactor the application with the chosen mappings,
    e.g.  Farm1 = Farm(f, 8, 2);  Pipe(farm1, GPU(g));
 7. Execute]
  
Example: Enumerate Skeleton Configurations for Image Convolution

Candidate configurations:

  r p        r || p       Δ(r) p        r Δ(p)
  Δ(r p)     r || Δ(p)    Δ(r) Δ(p)

  r : read image file
  p : process image file
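To make the enumerate-and-filter idea concrete, here is a small hedged sketch (entirely my own, not the project's tool): candidate configurations are plain skeleton terms, and only those whose estimated cost beats a sequential baseline are kept. The cost model is a stand-in supplied by the caller.

  -module(config_filter).
  -export([filter/2]).

  %% Keep the candidate skeleton configurations whose predicted runtime,
  %% according to the supplied cost model, improves on the sequential baseline.
  %% CostModel is a fun from a skeleton term to an estimated runtime.
  filter(CostModel, Candidates) ->
      Baseline = CostModel({seq, baseline}),
      [C || C <- Candidates, CostModel(C) < Baseline].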
Results on Benchmark: Image Convolution

MCTS Mapping (C, G):  (6, 0) || (0, 3)
Speedup: 39.12

Best Speedup: 40.91
  
Conclusions

§  The manycore revolution is upon us
   §  Computer hardware is changing very rapidly (more than in the last 50 years)
   §  The megacore era is here (aka exascale, BIG data)
   §  Heterogeneity and energy are both important

§  Most programming models are too low-level
   §  concurrency based
   §  need to expose mass parallelism

§  Patterns and functional programming help with abstraction
   §  millions of threads, easily controlled
  
Conclusions (2)

§  Functional programming makes it easy to introduce parallelism
   §  No side effects means any computation could be parallel
   §  Matches pattern-based parallelism
   §  Much detail can be abstracted

§  Lots of problems can be avoided
   §  e.g. freedom from deadlock
   §  Parallel programs give the same results as sequential ones!

§  Automation is very important
   §  Refactoring dramatically reduces development time (while keeping the programmer in the loop)
   §  Machine learning is very promising for determining complex performance settings
  

	
  
	
  
But isn't this all just wishful thinking?

Rampant-Lambda-Men in St Andrews

66
  
NO!

§  C++11 has lambda functions (and some other nice functional-inspired features)
§  Java 8 will have lambda (closures)
§  Apple uses closures in Grand Central Dispatch

67
  
ParaPhrase Parallel C++ Refactoring

§  Integrated into Eclipse
§  Supports full C++(11) standard
§  Uses strongly hygienic components
   §  functional encapsulation (closures)

68
  
Image Convolution

Step 1: Introduce Components

Component<ff_im> genStage(generate);
Component<ff_im> filterStage(filter);
for (int i = 0; i < NIMGS; i++) {
  r1 = genStage.callWorker(new ff_im(images[i]));
  results[i] = filterStage.callWorker(new ff_im(r1));
}

Step 2: Introduce Pipeline

ff_pipeline pipe;
StreamGen streamgen(NIMGS, images);
pipe.add_stage(streamgen);
pipe.add_stage(new genStage);
pipe.add_stage(new filterStage);
pipe.run_and_wait_end();

Step 3: Introduce Farm

ff_farm<> gen_farm;
gen_farm.add_collector(NULL);
std::vector<ff_node*> gw;
for (int i = 0; i < nworkers; i++)
  gw.push_back(new gen_stage);
gen_farm.add_workers(gw);
ff_pipeline pipe;
StreamGen streamgen(NIMGS, images);
pipe.add_stage(streamgen);
pipe.add_stage(gen_farm);
pipe.add_stage(new filterStage);
pipe.run_and_wait_end();

Step 4: Introduce Farm

ff_farm<> gen_farm;
gen_farm.add_collector(NULL);
std::vector<ff_node*> gw;
for (int i = 0; i < nworkers; i++)
  gw.push_back(new gen_stage);
gen_farm.add_workers(gw);
ff_farm<> filter_farm;
filter_farm.add_collector(NULL);
std::vector<ff_node*> gw2;
for (int i = 0; i < nworkers2; i++)
  gw2.push_back(new CPU_Stage);
filter_farm.add_workers(gw2);
StreamGen streamgen(NIMGS, images);
ff_pipeline pipe;
pipe.add_stage(streamgen);
pipe.add_stage(gen_farm);
pipe.add_stage(filter_farm);
pipe.run_and_wait_end();

69
  
Refactoring C++ in Eclipse

70
  
Funded by

•  ParaPhrase (EU FP7), Patterns for heterogeneous multicore
   •  €4.2M, 2011-2014
•  SCIEnce (EU FP6), Grid/Cloud/Multicore coordination
   •  €3.2M, 2005-2012
•  Advance (EU FP7), Multicore streaming
   •  €2.7M, 2010-2013
•  HPC-GAP (EPSRC), Legacy system on thousands of cores
   •  £1.6M, 2010-2014
•  Islay (EPSRC), Real-time FPGA streaming implementation
   •  £1.4M, 2008-2011
•  TACLE: European Cost Action on Timing Analysis
   •  €300K, 2012-2015

74
  
Some of our Industrial Connections

Mellanox Inc.
Erlang Solutions Ltd
SAP GmbH, Karlsruhe
BAe Systems
Selex Galileo
BioId GmbH, Stuttgart
Philips Healthcare
Software Competence Centre, Hagenberg
Microsoft Research
Well-Typed LLC

75
  
ParaPhrase Needs You!

•  Please join our mailing list and help grow our user community
   §  news items
   §  access to free development software
   §  chat to the developers
   §  free developer workshops
   §  bug tracking and fixing
   §  tools for both Erlang and C++

•  Subscribe at
   https://mailman.cs.st-andrews.ac.uk/mailman/listinfo/paraphrase-news

•  We're also looking for open source developers...
•  We also have 8 PhD studentships...

76
  
Further Reading

Chris Brown, Vladimir Janjic, Kevin Hammond, Mehdi Goli and John McCall
"Bridging the Divide: Intelligent Mapping for the Heterogeneous Parallel Programmer"
Submitted to IPDPS 2014

Chris Brown, Marco Danelutto, Kevin Hammond, Peter Kilpatrick and Sam Elliot
"Cost-Directed Refactoring for Parallel Erlang Programs"
To appear in International Journal of Parallel Programming, 2013

Vladimir Janjic, Chris Brown, Max Neunhoffer, Kevin Hammond, Steve Linton and Hans-Wolfgang Loidl
"Space Exploration using Parallel Orbits"
Proc. PARCO 2013: International Conf. on Parallel Computing, Munich, Sept. 2013

Chris Brown, Hans-Wolfgang Loidl and Kevin Hammond
"ParaForming: Forming Parallel Haskell Programs using Novel Refactoring Techniques"
Proc. 2011 Trends in Functional Programming (TFP), Madrid, Spain, May 2011

Henrique Ferreiro, David Castro, Vladimir Janjic and Kevin Hammond
"Repeating History: Execution Replay for Parallel Haskell Programs"
Proc. 2012 Trends in Functional Programming (TFP), St Andrews, UK, June 2012

Ask me for copies!
Many technical results also on the project web site: free for download!
THANK YOU!

http://www.paraphrase-ict.eu
http://www.project-advance.eu

@paraphrase_fp7

80
  
