CHARACTERIZING APU PERFORMANCE IN HADOOPCL ON HETEROGENEOUS DISTRIBUTED PLATFORMS
MAX GROSSMAN, MAURICIO BRETERNITZ, AND VIVEK SARKAR
RICE UNIVERSITY & AMD
MOTIVATION
• Cloud offers elasticity, lower startup costs, and a unified platform for all
• But users generally see worse and less predictable performance
‒ Noisy neighbor
• Economies of scale => cloud is here to stay
“I don’t care where my code runs, as long as it finishes… someday” – Bob the Cloud User

2 | PRESENTATION TITLE | NOVEMBER 21, 2013 | CONFIDENTIAL
STATE-OF-THE-ART
• Hadoop
‒ Java programming language
‒ JDK libraries
‒ Arbitrary data types
‒ Reliability
‒ Simple MapReduce distributed programming model
• Abstractions built on Hadoop
‒ H2O from 0xdata
‒ Mahout machine learning framework

  
PROBLEMS
1. Poor computational performance
‒ JVM execution: short-lived tasks give the JIT little time to optimize, and creating child processes has a high startup cost
2. Poor I/O performance
‒ Serialization and deserialization of arbitrary data types
3. Manual tweaking of intertwined tunables
‒ In an unstable cloud environment, you never get them right
4. Scheduling execution & communication with a holistic view of the platform
(Figure: a small sampling of Hadoop tunables…)
  
A POTENTIAL SOLUTION
• OpenCL
‒ SIMD programming model
‒ Multi-architecture and multi-vendor support
‒ APIs for launching compute and copy tasks
• An expert programmer could:
1. Translate all application code to OpenCL kernels
2. Compile the OpenCL kernels and API calls into a native library
3. Call the native library from Java via JNI
4. Spend a lot of time debugging performance and correctness
• Still not good enough!
  

  

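(Figure: the HadoopCL software stack. Hadoop provides reliability and the distributed platform; APARAPI translates bytecode to OpenCL kernels; OpenCL provides multi-architecture execution in native threads; the host application launches kernels on the device with clEnqueueNDRangeKernel().)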
  
  

• Hardware-aware platform manager
• Machine-learning, multi-device scheduler based on device occupancy and past kernel performance
• Architecture-aware optimizing compiler
• Hadoop-like API
HADOOPCL ARCHITECTURE
  
	
  
class PiMapper extends DoubleDoubleBoolIntHadoopCLMapper {
    public void map(double x, double y) {
        // x*x + y*y > 0.25 means the point lies outside the circle of radius 0.5
        if (x * x + y * y > 0.25) {
            write(false, 1);
        } else {
            write(true, 1);
        }
    }
}

job.waitForCompletion(true);
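The mapper's logic can be checked without a cluster. The sketch below is a plain-Java emulation (no Hadoop or HadoopCL classes) of the Monte Carlo π estimate that PiMapper implements: each (x, y) sample is classified as inside or outside the circle of radius 0.5, and a reduce step counts the true keys. It assumes samples are drawn uniformly from the square [-0.5, 0.5) × [-0.5, 0.5), so the inside fraction approximates π/4; PiSketch, map, and estimatePi are hypothetical names.

```java
import java.util.Random;

// Hypothetical stand-alone emulation of PiMapper plus a counting reducer.
// No Hadoop or HadoopCL classes are used; samples are assumed to be
// uniform over the square [-0.5, 0.5) x [-0.5, 0.5).
class PiSketch {
    // Map step: the same predicate as PiMapper; true means "inside the circle".
    static boolean map(double x, double y) {
        return !(x * x + y * y > 0.25);
    }

    // Generates nSamples points, "maps" each one, and "reduces" by counting
    // the true keys. inside / nSamples approximates (pi * 0.5^2) / 1 = pi / 4.
    static double estimatePi(long nSamples, long seed) {
        Random rng = new Random(seed);
        long inside = 0;
        for (long n = 0; n < nSamples; n++) {
            double x = rng.nextDouble() - 0.5;
            double y = rng.nextDouble() - 0.5;
            if (map(x, y)) {
                inside++;
            }
        }
        return 4.0 * inside / nSamples;
    }
}
```

With a fixed seed the run is reproducible, and the estimate tightens as nSamples grows.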
  

  

(Figure: javac compiles the mapper into a .class file)
• The HadoopCL programming model supports
‒ Java syntax
‒ MapReduce abstractions
‒ Dynamic memory allocation
‒ A variety of data types (primitives, sparse vectors, tuples, etc.), and can be extended to more
‒ Constant globals accessible from anywhere
• HadoopCL does not support
‒ Arbitrary inputs and outputs
‒ Massive data elements (e.g. sparse vectors larger than device memory)
‒ Object references
HADOOPCL ARCHITECTURE
$ hadoop jar Pi.jar input output
(Figure: the job is submitted to the NameNode + JobTracker master, which distributes work to two DataNodes)

  

(Figure: a Hadoop DataNode whose TaskTracker hosts the HadoopCL ML Device Scheduler and runs Map or Reduce Tasks in several HadoopCL Child processes)
HADOOPCL ARCHITECTURE
(Figure: a Map or Reduce Task)
‒ Data is buffered in chunks for processing on the OpenCL device
• HadoopCL explicitly manages buffers to prevent large GC overheads
• The Kernel Executor handles
‒ Auto-generation and optimization of OpenCL kernels from JVM bytecode
‒ Transfer of inputs and outputs to the device
‒ Asynchronous launch of OpenCL kernels

  

(Figure: the pipeline inside a HadoopCL Child: Input Collector, Input Buffer Queue, Kernel Executor, OpenCL Device, and Output Buffer Queue, with Launch, Retry, and Release transitions)
• Each Child JVM encloses a data-driven pipeline of communication and computation tasks
(Figure: a HadoopCL Child with its Input Buffer Manager and Output Buffer Manager)
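The slides describe each Child as a data-driven pipeline in which input is buffered in chunks before it reaches the device. The sketch below shows that shape in plain Java, assuming a single executor thread and a bounded queue between the two stages; the class name, the queue capacity of 2, and the squaring "kernel" are hypothetical stand-ins, not HadoopCL's actual buffer managers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical two-stage pipeline: an input collector packs records into
// fixed-size primitive chunks (one double[] per chunk rather than one object
// per record) and hands them to an executor thread through a bounded queue.
// Squaring each value stands in for launching an OpenCL kernel on the chunk.
class ChunkPipeline {
    // Sentinel chunk that tells the executor no more input will arrive.
    private static final double[] END = new double[0];

    static List<double[]> run(double[] records, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        BlockingQueue<double[]> inputQueue = new ArrayBlockingQueue<>(2);
        List<double[]> outputs = new ArrayList<>();

        // Executor stage: takes full chunks and produces output chunks.
        Thread executor = new Thread(() -> {
            try {
                while (true) {
                    double[] chunk = inputQueue.take();
                    if (chunk == END) {
                        break;
                    }
                    double[] out = new double[chunk.length];
                    for (int i = 0; i < chunk.length; i++) {
                        out[i] = chunk[i] * chunk[i];
                    }
                    outputs.add(out); // only this thread touches outputs until join()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        executor.start();

        // Input-collector stage: fills chunks and enqueues each one when full.
        try {
            for (int start = 0; start < records.length; start += chunkSize) {
                int len = Math.min(chunkSize, records.length - start);
                double[] chunk = new double[len];
                System.arraycopy(records, start, chunk, 0, len);
                inputQueue.put(chunk); // blocks while the executor is behind
            }
            inputQueue.put(END);
            executor.join(); // join() also makes the executor's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return outputs;
    }
}
```

The bounded queue provides back-pressure: the collector blocks when the executor falls behind, which caps the number of chunks in flight.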
  
TOPICS IN HADOOPCL
• Extending APARAPI with architecture- and data-aware compiler optimizations
1. A number of HadoopCL-specific functions are auto-generated from APARAPI at runtime
2. When GPU execution is detected and a vector data type is in use, the HadoopCL runtime auto-strides input vectors before copying them to the device
‒ APARAPI must emit strided code to match the data layout, which fails in certain cases

double MahoutKMeansMapper__dot(...){
    double agg = 0.0;
    for (int i = 0; i < length1; i++){
        int currentIndex = index1[(i) * this->nPairs];
        int j = 0;
        for (; j<length2 && currentIndex!=index2[j]; j++) ;
        if (j != length2)
            agg = agg + (val1[(i) * this->nPairs] * val2[j]);
    }
    return(agg);
}
  

  

double MahoutKMeansMapper__dot(...){
    double agg = 0.0;
    for (int i = 0; i < length1; i++){
        int currentIndex = index1[i];
        int j = 0;
        for (; j<length2 && currentIndex!=index2[j]; j++) ;
        if (j!=length2)
            agg = agg + (val1[i] * val2[j]);
    }
    return(agg);
}
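The two generated kernels differ only in how the first vector is addressed. A plain-Java sketch can check that both layouts give the same dot product. It assumes vectors are interleaved element by element (element i of vector v at i * nPairs + v) and that all vectors have equal length; the offset parameter and the interleave helpers are hypothetical. A likely motivation for the interleaved layout on a GPU is that neighbouring work items then read neighbouring addresses, which favours coalesced memory accesses.

```java
// Hypothetical Java rendering of the two MahoutKMeansMapper__dot variants.
// The i-th (index, value) pair of vector 1 is read from position
// offset + i * stride: stride == 1 is the contiguous layout, and
// stride == nPairs is the interleaved layout produced by auto-striding.
class SparseDot {
    static double dot(int[] index1, double[] val1, int offset, int stride, int length1,
                      int[] index2, double[] val2, int length2) {
        double agg = 0.0;
        for (int i = 0; i < length1; i++) {
            int currentIndex = index1[offset + i * stride];
            int j = 0;
            // Linear search for the same index in vector 2, as in the kernel.
            while (j < length2 && currentIndex != index2[j]) {
                j++;
            }
            if (j != length2) {
                agg += val1[offset + i * stride] * val2[j];
            }
        }
        return agg;
    }

    // Interleaves equal-length vectors so that element i of vector v lands at
    // position i * vectors.length + v (assumes every vector has the same length).
    static int[] interleave(int[][] vectors) {
        int n = vectors.length;
        int len = vectors[0].length;
        int[] out = new int[n * len];
        for (int v = 0; v < n; v++) {
            for (int i = 0; i < len; i++) {
                out[i * n + v] = vectors[v][i];
            }
        }
        return out;
    }

    // Same interleaving for the value arrays.
    static double[] interleave(double[][] vectors) {
        int n = vectors.length;
        int len = vectors[0].length;
        double[] out = new double[n * len];
        for (int v = 0; v < n; v++) {
            for (int i = 0; i < len; i++) {
                out[i * n + v] = vectors[v][i];
            }
        }
        return out;
    }
}
```

Calling dot with offset v and stride nPairs on the interleaved arrays should return the same result as calling it with offset 0 and stride 1 on vector v's own arrays.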
  
TOPICS IN HADOOPCL
• Enabling OpenCL dynamic memory allocation through restartable kernels
‒ Note: mappers and reducers have no side effects until they commit (i.e. call write()), so a kernel that runs out of heap can safely be restarted
  

(Figure: OpenCL Device Heap)
Mapper.java:
public void map(int key, double val) {
    int[] outputVec = new int[10];
    ...
    write(key, outputVec);
}
(Figure: device-side bookkeeping: free, nWrites, nInputs, writeOffsetLookup)
  

  

Mapper.cl:
__kernel void map(int key, double val) {
    // Reserve 10 ints on the device heap
    int oldOffset = atomic_add(free, 10);
    if (oldOffset + 10 >= heapSize) {
        // Heap exhausted: flag this input so it is re-run in a later launch
        nWrites[inputIndex] = -1;
        return;
    }
    ...
    writeOffsetLookup[inputIndex] = oldOffset;
    nWrites[inputIndex] = nWrites[inputIndex] + 1;
}
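The restart protocol can be emulated on the host. The sketch below assumes one allocation of ALLOC ints per input, a heap that the host empties between launches, and sequential iteration in place of parallel work items, with AtomicInteger.getAndAdd standing in for atomic_add. It uses an exact-fit bounds check. The array names mirror the kernel (free, nWrites, writeOffsetLookup), but launch() and runAll() are hypothetical. Because no input has side effects before it commits, an input that finds the heap full can simply be flagged with -1 and re-run in a later launch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical host-side emulation of a restartable kernel that allocates
// from a bump-pointer device heap. Each loop iteration in launch() plays one
// work item; AtomicInteger.getAndAdd stands in for OpenCL's atomic_add.
class RestartableKernel {
    static final int ALLOC = 10; // ints per map() call, as in Mapper.java

    // One "launch" over the pending inputs; returns the inputs to retry.
    static List<Integer> launch(List<Integer> pending, int heapSize, AtomicInteger free,
                                int[] nWrites, int[] writeOffsetLookup) {
        List<Integer> retry = new ArrayList<>();
        for (int inputIndex : pending) {
            nWrites[inputIndex] = 0;
            int oldOffset = free.getAndAdd(ALLOC);
            if (oldOffset + ALLOC > heapSize) {
                // Out of heap: flag the input and stop before any write() commits.
                nWrites[inputIndex] = -1;
                retry.add(inputIndex);
                continue;
            }
            writeOffsetLookup[inputIndex] = oldOffset;
            nWrites[inputIndex] = nWrites[inputIndex] + 1; // commit one write()
        }
        return retry;
    }

    // Relaunches until every input commits; returns the number of launches.
    static int runAll(int nInputs, int heapSize) {
        if (heapSize < ALLOC) {
            throw new IllegalArgumentException("heap cannot hold a single allocation");
        }
        int[] nWrites = new int[nInputs];
        int[] writeOffsetLookup = new int[nInputs];
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < nInputs; i++) {
            pending.add(i);
        }
        int launches = 0;
        while (!pending.isEmpty()) {
            // Between launches the host is assumed to drain committed output,
            // so the heap is empty again and the bump pointer restarts at 0.
            AtomicInteger free = new AtomicInteger(0);
            pending = launch(pending, heapSize, free, nWrites, writeOffsetLookup);
            launches++;
        }
        return launches;
    }
}
```

For example, 25 inputs against a 100-int heap fit ten per launch, so runAll(25, 100) needs three launches.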
  
TOPICS IN HADOOPCL
• Auto-scheduling OpenCL kernels across execution platforms through machine learning
‒ The HadoopCL TaskTracker is responsible for
1. Assigning each Task an execution platform (GPU, CPU, or JVM)
2. Recording the execution time of each task, along with the kernel executed and the average device occupancy during that task's execution
• Device assignment is based on programmer hints and/or data recorded from previous runs
‒ Data is recorded in files so that it can be used across Jobs
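A much simplified version of history-based device selection can be sketched as follows. It is not HadoopCL's machine-learning model: it assumes a programmer hint always wins, that every device is tried once per kernel before history is trusted, and that the device with the lowest recorded average task time is chosen after that. Device occupancy and the file-based persistence across Jobs are omitted, and all names are hypothetical.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Execution platforms named on the slide.
enum Device { GPU, CPU, JVM }

// Hypothetical greedy selector driven by recorded task times.
class DeviceScheduler {
    // kernel name -> device -> {total seconds, number of tasks}
    private final Map<String, Map<Device, double[]>> history = new HashMap<>();

    // Records one finished task.
    void record(String kernel, Device device, double seconds) {
        double[] stats = history
                .computeIfAbsent(kernel, k -> new EnumMap<>(Device.class))
                .computeIfAbsent(device, d -> new double[2]);
        stats[0] += seconds;
        stats[1] += 1;
    }

    // Picks a device for the next task; a non-null hint always wins.
    Device choose(String kernel, Device hint) {
        if (hint != null) {
            return hint;
        }
        Map<Device, double[]> perDevice = history.getOrDefault(kernel, Map.of());
        Device best = null;
        double bestAvg = Double.POSITIVE_INFINITY;
        for (Device d : Device.values()) {
            double[] stats = perDevice.get(d);
            if (stats == null) {
                return d; // no data for this device yet: try it
            }
            double avg = stats[0] / stats[1];
            if (avg < bestAvg) {
                bestAvg = avg;
                best = d;
            }
        }
        return best;
    }
}
```

Trying each unexplored device first is the simplest way to make sure every platform gets a recorded time before the averages are compared.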
  

  
EVALUATION
• Mahout KMeans
‒ Mahout provides Hadoop MapReduce implementations of a variety of ML algorithms
‒ KMeans iteratively searches for K clusters
• HadoopCL KMeans port
‒ The Mapper is trivial: for each point, it iterates through all clusters and outputs the closest one
‒ The Reducer is more complex
‒ Both OpenCL and Java versions are implemented, as HadoopCL allows the programmer to force JVM execution
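The mapper described above can be sketched in a few lines. This version assumes dense vectors and squared Euclidean distance for simplicity; Mahout's implementation works on sparse vectors with a configurable distance measure, and the class and method names here are hypothetical.

```java
// Hypothetical dense-vector sketch of the KMeans map step.
class KMeansMapSketch {
    // Returns the index of the centroid with the smallest squared Euclidean
    // distance to the point (ties go to the lower index).
    static int closest(double[] point, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int k = 0; k < point.length; k++) {
                double diff = point[k] - centroids[c][k];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // "Emits" one cluster id per point; a real mapper would write (clusterId, point).
    static int[] mapAll(double[][] points, double[][] centroids) {
        int[] assignment = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            assignment[p] = closest(points[p], centroids);
        }
        return assignment;
    }
}
```

Each point is handled independently, which is what makes the map step a natural fit for one OpenCL work item per point.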
  

  
EVALUATION
• Evaluated on a 10-node AMD APU cluster
• Two datasets tested with varying parameters
‒ Wiki data set
‒ ASF e-mail archives data set
‒ Varied K, the number of clusters
‒ Varied the type of pruning done on the input data (prune all but the N most frequent tokens vs. prune each vector to at most length M)
‒ Varied the amount of pruning done (i.e. the values of N and M)
‒ Enabled and disabled HadoopCL features to observe their impact on performance
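The two pruning strategies compared in the evaluation can be sketched on sparse term vectors (token id to weight). Two details are assumptions, since the slide does not specify them: "most frequent" is measured as total weight across the corpus, and truncating a vector to length M keeps its M largest weights. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the two pruning strategies on sparse term vectors,
// each represented as token id -> weight.
class Pruning {
    // Strategy 1: keep only the n tokens with the largest total weight across
    // all documents (ties broken by lower token id).
    static List<Map<Integer, Double>> pruneToTopN(List<Map<Integer, Double>> docs, int n) {
        Map<Integer, Double> totals = new HashMap<>();
        for (Map<Integer, Double> doc : docs) {
            for (Map.Entry<Integer, Double> e : doc.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        List<Integer> tokens = new ArrayList<>(totals.keySet());
        tokens.sort((a, b) -> {
            int byTotal = Double.compare(totals.get(b), totals.get(a)); // descending
            return byTotal != 0 ? byTotal : Integer.compare(a, b);
        });
        Set<Integer> keep = new HashSet<>(tokens.subList(0, Math.min(n, tokens.size())));

        List<Map<Integer, Double>> pruned = new ArrayList<>();
        for (Map<Integer, Double> doc : docs) {
            Map<Integer, Double> kept = new TreeMap<>();
            for (Map.Entry<Integer, Double> e : doc.entrySet()) {
                if (keep.contains(e.getKey())) {
                    kept.put(e.getKey(), e.getValue());
                }
            }
            pruned.add(kept);
        }
        return pruned;
    }

    // Strategy 2: cap a single vector at m entries, keeping its m largest weights.
    static Map<Integer, Double> truncateToM(Map<Integer, Double> doc, int m) {
        List<Integer> tokens = new ArrayList<>(doc.keySet());
        tokens.sort((a, b) -> {
            int byWeight = Double.compare(doc.get(b), doc.get(a)); // descending
            return byWeight != 0 ? byWeight : Integer.compare(a, b);
        });
        Map<Integer, Double> kept = new TreeMap<>();
        for (int token : tokens.subList(0, Math.min(m, tokens.size()))) {
            kept.put(token, doc.get(token));
        }
        return kept;
    }
}
```

The first strategy shrinks the vocabulary globally, so vector lengths still vary; the second bounds every vector's length, which bounds the per-element work on the device.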
  

  
EVALUATION
• Graphs here
  

  
CONCLUSION
• HadoopCL offers the flexibility, reliability, and programmability of Hadoop, accelerated by native, heterogeneous OpenCL threads
• Using HadoopCL is a tradeoff: you lose parts of the Java language but gain improved performance
• Evaluation of KMeans with real-world data sets shows that HadoopCL is flexible and efficient enough to improve the performance of real-world applications
• Future work: targeting HSA instead of OpenCL
Max Grossman, jmg3@rice.edu
  
  
DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
  
  
  

  
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distributed Platforms, by Max Grossman
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 

Plus de AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
 

Plus de AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 

Dernier

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Dernier (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distributed Platforms, by Max Grossman

  • 1. CHARACTERIZING APU PERFORMANCE IN HADOOPCL ON HETEROGENEOUS DISTRIBUTED PLATFORMS
    MAX GROSSMAN, MAURICIO BRETERNITZ, AND VIVEK SARKAR
    RICE UNIVERSITY & AMD
  • 2. MOTIVATION
    - Cloud offers elasticity, lowered startup costs, and a unified platform for all
    - Performance is generally worse and less predictable
      - Noisy neighbor
    - Economies of scale => cloud is here to stay
    "I don't care where my code runs, as long as it finishes… someday" – Bob the Cloud User
    2 | PRESENTATION TITLE | NOVEMBER 21, 2013 | CONFIDENTIAL
  • 3. STATE-OF-THE-ART
    - Hadoop
      - Java programming language
      - JDK libraries
      - Arbitrary data types
      - Reliability
      - Simple MapReduce distributed programming model
    - Abstractions built on Hadoop
      - H2O from 0xdata
      - Mahout machine learning framework
  • 4. PROBLEMS
    1. Poor computational performance
       - JVM execution: short-lived tasks imply poor JIT optimization and a high startup cost for creating child processes
    2. Poor I/O performance
       - Serialization and deserialization of arbitrary data types
    3. Manual tweaking of intertwined tunables
       - In an unstable cloud environment, you never have it right
    4. Scheduling execution and communication with a holistic view of the platform
    [Figure: A small sampling of Hadoop tunables…]
  • 5. A POTENTIAL SOLUTION
    - OpenCL
      - SIMD programming model
      - Multi-architecture and multi-vendor support
      - APIs for launching compute and copy tasks
    - An expert programmer could:
      1. Translate all application code to OpenCL kernels
      2. Compile the OpenCL kernels and API calls into a native library
      3. Call the native library from Java via JNI
      4. Spend a lot of time debugging performance and correctness
    - Still not good enough!
    [Figure: a host application on the Host launching work on a Device via clEnqueueNDRange()]
  • 6. Hadoop: reliability, distributed platform
    APARAPI: bytecode to OpenCL kernels
    OpenCL: multi-architecture execution in native threads
    - Hardware-aware platform manager
    - Machine-learning, multi-device scheduler based on device occupancy and past kernel performance
    - Architecture-aware optimizing compiler
    - Hadoop-like API
  • 7. HADOOPCL ARCHITECTURE
    class PiMapper extends DoubleDoubleBoolIntHadoopCLMapper {
        public void map(double x, double y) {
            if (x * x + y * y > 0.25) {
                write(false, 1);
            } else {
                write(true, 1);
            }
        }
    }

    job.waitForCompletion(true);

    [Figure: the mapper source is compiled by javac into a .class file]
    - The HadoopCL programming model supports
      - Java syntax
      - MapReduce abstractions
      - Dynamic memory allocation
      - A variety of data types (primitives, sparse vectors, tuples, etc.), and can be extended to more
      - Constant globals accessible from anywhere
    - HadoopCL does not support
      - Arbitrary inputs and outputs
      - Massive data elements (e.g. sparse vectors larger than device memory)
      - Object references
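PiMapper is the map half of a Monte Carlo estimate of π. As a hedged illustration, the plain-Java sketch below reproduces its inside/outside test outside Hadoop and adds a hypothetical counting step in place of the reducer. It assumes x and y are drawn uniformly from [-0.5, 0.5), which the slide does not state; under that assumption the circle x² + y² ≤ 0.25 covers π/4 of the unit square, so π ≈ 4 · inside / total. PiEstimateSketch, estimatePi, and the seed are illustrative names, not HadoopCL API.

```java
import java.util.Random;

// Plain-Java sketch (no HadoopCL dependency) of the PiMapper test plus a
// hypothetical counting "reducer". ASSUMPTION: x and y are uniform on
// [-0.5, 0.5), so the inside region is a circle of radius 0.5 with area pi/4.
class PiEstimateSketch {

    // Mirrors PiMapper.map: true (inside) unless x*x + y*y > 0.25.
    static boolean map(double x, double y) {
        return !(x * x + y * y > 0.25);
    }

    // Hypothetical reduce step: count "true" keys and scale, pi ~ 4 * inside / total.
    static double estimatePi(long samples, long seed) {
        Random rng = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble() - 0.5;   // shift [0, 1) to [-0.5, 0.5)
            double y = rng.nextDouble() - 0.5;
            if (map(x, y)) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000, 42L));
    }
}
```

With a million samples the estimate's standard error is roughly 0.002, so results near 3.14 are expected, but the exact digits depend on the seed.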
  • 8. HADOOPCL ARCHITECTURE
    $ hadoop jar Pi.jar input output
    [Figure: cluster diagram labeled NameNode + JobTracker and DataNodes; on each Hadoop DataNode, a TaskTracker with the HadoopCL ML Device Scheduler and several HadoopCL Child processes, each running a Map or Reduce Task]
  • 9. HADOOPCL ARCHITECTURE
    - Each Child JVM encloses a data-driven pipeline of communication and computation tasks
      - Data is buffered in chunks for processing on the OpenCL device
    - HadoopCL explicitly manages buffers to prevent large GC overheads
    - The Kernel Executor handles
      - Auto-generation and optimization of OpenCL kernels from JVM bytecode
      - Transfer of inputs and outputs to the device
      - Asynchronous launch of OpenCL kernels
    [Figure: HadoopCL Child pipeline for a Map or Reduce Task, with an Input Collector, Input Buffer Manager, Input Buffer Queue, Kernel Executor (launch, retry), OpenCL Device, Output Buffer Queue, Output Buffer Manager, and Output; buffers are released after use]
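To make the buffer-recycling idea concrete, here is a minimal sketch, assuming one collector thread and one executor thread. It preallocates a fixed pool of chunks and passes them between two bounded queues, so no arrays are allocated once the pipeline is running. The class, method, and queue names are hypothetical, and the "kernel" is a JVM-side sum standing in for an OpenCL launch; the sketch does not model HadoopCL's retry path or device transfers.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a two-stage pipeline that recycles a fixed pool of
// preallocated buffers, so steady-state processing allocates no new arrays.
class BufferPipelineSketch {

    // Reusable chunk: fixed-capacity storage plus the number of valid entries.
    static final class Chunk {
        final double[] data;
        int length;
        Chunk(int capacity) { data = new double[capacity]; }
    }

    // Sentinel telling the executor that no more input is coming.
    private static final Chunk END = new Chunk(0);

    static double run(double[] input, int chunkSize, int poolSize) {
        try {
            return runPipeline(input, chunkSize, poolSize);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
    }

    private static double runPipeline(double[] input, int chunkSize, int poolSize)
            throws InterruptedException {
        BlockingQueue<Chunk> free = new ArrayBlockingQueue<>(poolSize);
        BlockingQueue<Chunk> full = new ArrayBlockingQueue<>(poolSize + 1); // +1 for END
        for (int i = 0; i < poolSize; i++) {
            free.put(new Chunk(chunkSize));   // allocate every buffer up front
        }

        // Input collector: copies records into recycled buffers.
        Thread collector = new Thread(() -> {
            try {
                for (int start = 0; start < input.length; start += chunkSize) {
                    Chunk c = free.take();    // blocks while all buffers are in flight
                    c.length = Math.min(chunkSize, input.length - start);
                    System.arraycopy(input, start, c.data, 0, c.length);
                    full.put(c);
                }
                full.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        collector.start();

        // Executor stand-in: processes each full buffer, then releases it to the pool.
        double total = 0.0;
        while (true) {
            Chunk c = full.take();
            if (c == END) break;
            for (int i = 0; i < c.length; i++) total += c.data[i];
            free.put(c);
        }
        collector.join();
        return total;
    }

    public static void main(String[] args) {
        double[] input = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(run(input, 3, 2)); // 55.0
    }
}
```

The pool size bounds how far the collector can run ahead of the executor, which is the same back-pressure a fixed set of device buffers would impose.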
  • 10. TOPICS IN HADOOPCL
    - Extending APARAPI with architecture- and data-aware compiler optimizations
      1. A number of HadoopCL-specific functions are auto-generated from APARAPI at runtime
      2. When GPU execution is detected and a vector data type is in use, the HadoopCL runtime auto-strides input vectors before copying them to the device
         - APARAPI must emit strided code to match the data layout; this fails in certain cases

    Generated code with strided accesses:
    double MahoutKMeansMapper__dot(...){
        double agg = 0.0;
        for (int i = 0; i < length1; i++){
            int currentIndex = index1[(i) * this->nPairs];
            int j = 0;
            for (; j<length2 && currentIndex!=index2[j]; j++) ;
            if (j != length2)
                agg = agg + (val1[(i) * this->nPairs] * val2[j]);
        }
        return(agg);
    }

    Generated code without striding:
    double MahoutKMeansMapper__dot(...){
        double agg = 0.0;
        for (int i = 0; i < length1; i++){
            int currentIndex = index1[i];
            int j = 0;
            for (; j<length2 && currentIndex!=index2[j]; j++) ;
            if (j!=length2)
                agg = agg + (val1[i] * val2[j]);
        }
        return(agg);
    }
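The two generated functions differ only in how vector 1 is addressed. The sketch below assumes, as the strided code suggests, that nPairs input vectors are padded to a common length and interleaved so that element i of vector v sits at index i * nPairs + v; on a GPU this lets neighboring work-items read neighboring addresses. Calling the strided routine with base 0 and stride 1 recovers the contiguous version. All names are illustrative, and the padding step is an assumption.

```java
// Illustrative sketch (hypothetical names, not generated HadoopCL/APARAPI code)
// of the strided data layout and the sparse dot product that must match it.
class StridedLayoutSketch {

    // Interleave equal-length (padded) rows: element i of row v goes to i * nPairs + v.
    static int[] strideInts(int[][] rows) {
        int nPairs = rows.length, len = rows[0].length;
        int[] out = new int[nPairs * len];
        for (int v = 0; v < nPairs; v++)
            for (int i = 0; i < len; i++)
                out[i * nPairs + v] = rows[v][i];
        return out;
    }

    static double[] strideDoubles(double[][] rows) {
        int nPairs = rows.length, len = rows[0].length;
        double[] out = new double[nPairs * len];
        for (int v = 0; v < nPairs; v++)
            for (int i = 0; i < len; i++)
                out[i * nPairs + v] = rows[v][i];
        return out;
    }

    // Sparse dot product where vector 1 is read at base + i * nPairs.
    static double dotStrided(int[] index1, double[] val1, int base, int nPairs, int length1,
                             int[] index2, double[] val2, int length2) {
        double agg = 0.0;
        for (int i = 0; i < length1; i++) {
            int currentIndex = index1[base + i * nPairs];
            int j = 0;
            while (j < length2 && currentIndex != index2[j]) j++; // linear search in vector 2
            if (j != length2) agg += val1[base + i * nPairs] * val2[j];
        }
        return agg;
    }

    // Contiguous layout is the special case base = 0, stride = 1.
    static double dot(int[] index1, double[] val1, int length1,
                      int[] index2, double[] val2, int length2) {
        return dotStrided(index1, val1, 0, 1, length1, index2, val2, length2);
    }

    public static void main(String[] args) {
        int[][] idx = {{1, 4, 7}, {2, 4, 9}};
        double[][] val = {{1.0, 2.0, 3.0}, {0.5, 1.5, 2.5}};
        int[] sIdx = strideInts(idx);
        double[] sVal = strideDoubles(val);
        int[] centroidIdx = {4, 7, 9};
        double[] centroidVal = {10.0, 20.0, 30.0};
        for (int v = 0; v < idx.length; v++) {
            System.out.println(dotStrided(sIdx, sVal, v, idx.length, 3, centroidIdx, centroidVal, 3));
        }
    }
}
```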
  • 11. TOPICS IN HADOOPCL
    - Enabling OpenCL dynamic memory allocation through restartable kernels
      - Note: mappers and reducers have no side effects until they commit (i.e. write())

    Mapper.java:
    public void map(int key, double val) {
        int[] outputVec = new int[10];
        ...
        write(key, outputVec);
    }

    Mapper.cl:
    __kernel void map(int key, double val) {
        int oldOffset = atomic_add(free, 10);
        if (oldOffset + 10 > heapSize) {   // allocation [oldOffset, oldOffset + 10) does not fit
            nWrites[inputIndex] = -1;      // mark this input for a restart
            return;
        }
        ...
        writeOffsetLookup[inputIndex] = oldOffset;
        nWrites[inputIndex] = nWrites[inputIndex] + 1;
    }

    [Figure: OpenCL device heap with the free, nWrites, nInputs, and writeOffsetLookup buffers]
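The restart protocol can be simulated on the JVM. In this hedged sketch, each parallel task bump-allocates its output with an atomic add (AtomicInteger.getAndAdd standing in for OpenCL's atomic_add). A task whose allocation would overrun the heap records -1 and returns without writing; the host then commits the successful outputs, resets the heap, and relaunches only the failed inputs. Because mappers have no side effects before write(), a relaunch is safe. The method, its parameters, and the fixed output size are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical simulation of restartable kernels with a bump-allocated heap.
class RestartableHeapSketch {

    /** Processes every input, appending committed outputs; returns the number of launches. */
    static int processAll(int nInputs, int wordsPerOutput, int heapSize, List<int[]> committed) {
        if (wordsPerOutput > heapSize) {
            throw new IllegalArgumentException("a single output can never fit in the heap");
        }
        int[] heap = new int[heapSize];
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < nInputs; i++) pending.add(i);

        int launches = 0;
        while (!pending.isEmpty()) {
            launches++;
            AtomicInteger free = new AtomicInteger(0);         // heap is reset for each launch
            final List<Integer> batch = pending;
            int[] nWrites = new int[batch.size()];
            int[] writeOffset = new int[batch.size()];

            // "Kernel": each task allocates from the shared heap or bails out with -1.
            IntStream.range(0, batch.size()).parallel().forEach(t -> {
                int oldOffset = free.getAndAdd(wordsPerOutput); // like OpenCL atomic_add
                if (oldOffset + wordsPerOutput > heapSize) {
                    nWrites[t] = -1;                             // no side effects; retry later
                    return;
                }
                int input = batch.get(t);
                for (int k = 0; k < wordsPerOutput; k++) heap[oldOffset + k] = input;
                writeOffset[t] = oldOffset;
                nWrites[t] = 1;
            });

            // Host: commit successful outputs, collect failed inputs for the next launch.
            List<Integer> retry = new ArrayList<>();
            for (int t = 0; t < batch.size(); t++) {
                if (nWrites[t] < 0) {
                    retry.add(batch.get(t));
                } else {
                    committed.add(Arrays.copyOfRange(heap, writeOffset[t], writeOffset[t] + wordsPerOutput));
                }
            }
            pending = retry;
        }
        return launches;
    }

    public static void main(String[] args) {
        List<int[]> out = new ArrayList<>();
        System.out.println(processAll(5, 10, 25, out) + " launches, " + out.size() + " outputs");
    }
}
```

Each launch is guaranteed progress because the first allocation always starts at offset 0, so with a 25-word heap and 10-word outputs, five inputs finish in three launches.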
  • 12. TOPICS IN HADOOPCL
    - Auto-scheduling OpenCL kernels across execution platforms through machine learning
      - The HadoopCL TaskTracker is responsible for
        1. Assigning each Task an execution platform (GPU, CPU, or JVM)
        2. Recording the execution time of each task, along with the kernel executed and the average device occupancy during that task's execution
    - Device assignment is based on programmer hints and/or recorded data from previous runs
      - Data is recorded in files to be used across Jobs
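The slide does not give the scheduler's model, so the following is a hypothetical heuristic, not HadoopCL's algorithm. It keeps the mean recorded time per (kernel, device) pair, tries any device with no history first, otherwise picks the device whose mean time scaled by (1 + current occupancy) is lowest, and lets a programmer hint override everything. History is kept in memory here, whereas the slide says HadoopCL persists it to files across Jobs.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical history-based device selection (not HadoopCL's actual model).
class DeviceSchedulerSketch {
    enum Device { GPU, CPU, JVM }

    private static final class Stats {
        double totalMs;
        int count;
    }

    // kernel name -> device -> accumulated timings (in memory only)
    private final Map<String, Map<Device, Stats>> history = new HashMap<>();

    void record(String kernel, Device device, double elapsedMs) {
        Stats s = history.computeIfAbsent(kernel, k -> new EnumMap<Device, Stats>(Device.class))
                         .computeIfAbsent(device, d -> new Stats());
        s.totalMs += elapsedMs;
        s.count++;
    }

    Device choose(String kernel, Device hint, Map<Device, Double> occupancy) {
        if (hint != null) return hint;                 // programmer hint wins
        Map<Device, Stats> perDevice = history.getOrDefault(kernel, Map.of());
        Device best = null;
        double bestPredicted = Double.POSITIVE_INFINITY;
        for (Device d : Device.values()) {
            Stats s = perDevice.get(d);
            if (s == null) return d;                   // explore: no data for this device yet
            double meanMs = s.totalMs / s.count;
            double predicted = meanMs * (1.0 + occupancy.getOrDefault(d, 0.0));
            if (predicted < bestPredicted) {
                bestPredicted = predicted;
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        DeviceSchedulerSketch sched = new DeviceSchedulerSketch();
        sched.record("kmeans", Device.GPU, 100.0);
        sched.record("kmeans", Device.CPU, 300.0);
        sched.record("kmeans", Device.JVM, 900.0);
        System.out.println(sched.choose("kmeans", null, Map.of(Device.GPU, 4.0))); // CPU
    }
}
```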
  • 13. EVALUATION
    - Mahout KMeans
      - Mahout provides Hadoop MapReduce implementations of a variety of ML algorithms
      - KMeans iteratively searches for K clusters
    - HadoopCL KMeans port
      - The mapper is trivial: for each point, it iterates through all clusters and outputs the closest
      - The reducer is more complex
      - Both OpenCL and Java versions are implemented, as HadoopCL allows the programmer to force JVM execution
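A single KMeans iteration in MapReduce form can be sketched as below. It assumes dense double[] points and squared Euclidean distance, whereas the HadoopCL port works on Mahout's sparse vectors and Mahout lets the distance measure be configured. map() returns the closest centroid for one point, as the slide describes; reduce() is a simplified stand-in that just averages the points assigned to one centroid, and empty clusters keep their previous centroid, which is one choice among several. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of one KMeans iteration (dense vectors, squared Euclidean distance).
class KMeansIterationSketch {

    // Mapper: index of the centroid closest to the point.
    static int map(double[] point, double[][] centroids) {
        int closest = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int k = 0; k < point.length; k++) {
                double diff = point[k] - centroids[c][k];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                closest = c;
            }
        }
        return closest;
    }

    // Simplified reducer: component-wise mean of the points assigned to one centroid.
    static double[] reduce(List<double[]> points) {
        double[] mean = new double[points.get(0).length];
        for (double[] p : points)
            for (int k = 0; k < p.length; k++) mean[k] += p[k];
        for (int k = 0; k < mean.length; k++) mean[k] /= points.size();
        return mean;
    }

    // Shuffle stand-in: group points by their closest centroid, then recompute centroids.
    static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points)
            groups.computeIfAbsent(map(p, centroids), c -> new ArrayList<>()).add(p);
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> g = groups.get(c);
            next[c] = (g == null) ? centroids[c].clone() : reduce(g); // empty cluster stays put
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] centroids = {{1, 1}, {9, 9}};
        for (double[] c : iterate(points, centroids)) System.out.println(java.util.Arrays.toString(c));
    }
}
```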
  • 14. EVALUATION
    - Evaluated on a 10-node AMD APU cluster
    - Two data sets tested, with varying parameters
      - Wiki data set
      - ASF e-mail archives data set
      - Varied K, the number of clusters
      - Varied the type of pruning done on the input data (prune all but the N most frequent tokens vs. prune each vector to at most length M)
      - Varied the amount of pruning done (i.e. the values of N and M)
      - Enabled and disabled HadoopCL features to observe their impact on performance
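The two pruning strategies differ in scope: the first is global (a token survives only if it ranks among the N most frequent across the corpus), the second is per-vector (each vector keeps at most M entries). In this sketch, "frequent" is taken to mean document frequency, and the per-vector cut keeps the M largest-magnitude weights; both readings, the tie-breaking by token id, and all names are assumptions, since the slide does not specify them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the two pruning strategies on sparse token -> weight vectors.
class PruningSketch {

    // Strategy 1 (global): keep only the N tokens that appear in the most documents.
    static List<Map<Integer, Double>> pruneToTopTokens(List<Map<Integer, Double>> docs, int n) {
        Map<Integer, Integer> docFreq = new HashMap<>();
        for (Map<Integer, Double> d : docs)
            for (int token : d.keySet()) docFreq.merge(token, 1, Integer::sum);
        List<Integer> tokens = new ArrayList<>(docFreq.keySet());
        tokens.sort((a, b) -> {
            int byFreq = Integer.compare(docFreq.get(b), docFreq.get(a)); // descending frequency
            return byFreq != 0 ? byFreq : Integer.compare(a, b);          // tie-break on token id
        });
        Set<Integer> keep = new HashSet<>(tokens.subList(0, Math.min(n, tokens.size())));
        List<Map<Integer, Double>> out = new ArrayList<>();
        for (Map<Integer, Double> d : docs) {
            Map<Integer, Double> pruned = new TreeMap<>();
            d.forEach((t, v) -> { if (keep.contains(t)) pruned.put(t, v); });
            out.add(pruned);
        }
        return out;
    }

    // Strategy 2 (per vector): keep at most M entries, largest magnitude first.
    static List<Map<Integer, Double>> pruneToMaxLength(List<Map<Integer, Double>> docs, int m) {
        List<Map<Integer, Double>> out = new ArrayList<>();
        for (Map<Integer, Double> d : docs) {
            List<Map.Entry<Integer, Double>> entries = new ArrayList<>(d.entrySet());
            entries.sort((a, b) -> {
                int byMag = Double.compare(Math.abs(b.getValue()), Math.abs(a.getValue()));
                return byMag != 0 ? byMag : Integer.compare(a.getKey(), b.getKey());
            });
            Map<Integer, Double> pruned = new TreeMap<>();
            for (Map.Entry<Integer, Double> e : entries.subList(0, Math.min(m, entries.size())))
                pruned.put(e.getKey(), e.getValue());
            out.add(pruned);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> docs = List.of(
                Map.of(1, 1.0, 2, 5.0, 3, 2.0), Map.of(1, 3.0, 4, 1.0), Map.of(1, 1.0, 2, 1.0));
        System.out.println(pruneToTopTokens(docs, 2));
        System.out.println(pruneToMaxLength(docs, 2));
    }
}
```

Global pruning shrinks the vocabulary (and so the centroid width) for every vector, while per-vector pruning bounds the work per point without changing the vocabulary.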
  • 15. EVALUATION
    - Graphs here
  • 16. CONCLUSION
    - HadoopCL offers the flexibility, reliability, and programmability of Hadoop, accelerated by native, heterogeneous OpenCL threads
    - Using HadoopCL is a tradeoff: you lose parts of the Java language but gain improved performance
    - Evaluation of KMeans with real-world data sets shows that HadoopCL is flexible and efficient enough to improve the performance of real-world applications
    - Future work: target HSA instead of OpenCL
    Max Grossman, jmg3@rice.edu
  • 17. DISCLAIMER & ATTRIBUTION
    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
    The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
    AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
    AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
    ATTRIBUTION
    © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.