Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown,
Hassan Chafi, Michael Wu, Victoria Popic, Kunle Olukotun
                      Stanford University
             Pervasive Parallelism Laboratory (PPL)

  Tiark Rompf, Aleksandar Prokopec, Vojin Jovanovic,
            Philipp Haller, Martin Odersky
        Ecole Polytechnique Federale de Lausanne (EPFL)
            Programming Methods Laboratory (LAMP)
Squeryl
DSLs can be used for high performance, too
Hardware targets and their programming models:
    Sun T2: Pthreads, OpenMP
    Nvidia Fermi: CUDA, OpenCL
    Altera FPGA: Verilog, VHDL
    Cray Jaguar: MPI, PGAS

[Diagram: Applications (Scientific Engineering, Virtual Worlds, Personal Robotics, Data Informatics) on one side, the programming models and hardware above on the other, and DSLs placed in between.]
Too many different programming models
n  Tiark    Rompf’s talk yesterday

n  In   case you missed it:
   n    Techniques for rewriting high-level
         programs to high-performance programs

   n    Build an intermediate representation (IR)
         of Scala programs at runtime

   n    IR can be optimized and code generated
n  Introduction     to existing Delite DSLs

n  Constructing      your own Delite DSL

n  Not    covered – under the covers:
   n    Implementation details about the Delite
         framework

   n    See http://cgo2012.hyperdsls.org/
n  Syntax   is legal Scala
                                  A       B       A       C
n  Staged
        to build an IR                *               *
 (metaprogramming)                            +



n  Optimized   at a high level

n  Compiled
           to different low-level target
 architectures
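A toy sketch of the staging idea, assuming nothing from LMS or Delite (`Exp`, `Sym`, `Times`, `Plus` and `show` are hypothetical names): overloading `*` and `+` makes ordinary-looking Scala build an IR tree for A*B + A*C instead of computing a value.

```scala
// Hypothetical minimal staged IR: the operators build IR nodes instead of computing values.
sealed trait Exp {
  def *(that: Exp): Exp = Times(this, that)
  def +(that: Exp): Exp = Plus(this, that)
}
case class Sym(name: String) extends Exp
case class Times(lhs: Exp, rhs: Exp) extends Exp
case class Plus(lhs: Exp, rhs: Exp) extends Exp

// Pretty-printer that makes the tree structure visible.
def show(e: Exp): String = e match {
  case Sym(n)      => n
  case Times(l, r) => "(" + show(l) + " * " + show(r) + ")"
  case Plus(l, r)  => "(" + show(l) + " + " + show(r) + ")"
}

val a = Sym("A")
val b = Sym("B")
val c = Sym("C")

// Legal Scala syntax, but the result is an IR tree rather than a number.
val ir: Exp = a * b + a * c
```

Because `ir` is plain data, an optimizer can pattern-match on it before any code is generated.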
n  OptiML (Machine Learning)
n  OptiQL (Data querying)
n  OptiGraph (Large-scale graph analysis)
n  OptiCollections (Scala collections)
n  OptiMesh (Mesh-based PDE solvers)


Coming soon:

n  OptiSDR (Software-defined radio)
n  OptiCVX (Convex optimization)
OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning, ICML 2011

• Provides a familiar (MATLAB-like) language and API for writing ML applications
    - Ex. val c = a * b   (a, b are Matrix[Double])
• Implicitly parallel data structures
    - Base types: Vector[T], Matrix[T], Graph[V,E], Stream[T]
    - Subtypes: TrainingSet, IndexVector, Image, …
• Implicitly parallel control structures
    - sum{…}, (0::end){…}, gradient{…}, untilconverged{…}
    - Arguments to control structures are anonymous functions with restricted semantics
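A sequential, plain-Scala approximation of two of these control structures may help pin down their meaning. The signatures below, including the `maxIter` safeguard and the scalar state in `untilconverged`, are assumptions for illustration, not OptiML's API; in OptiML the function arguments are staged and may run in parallel, which is presumably why their semantics are restricted.

```scala
// Hypothetical sequential stand-in for OptiML's sum: adds f(i) for every i in [start, end).
def sum(start: Int, end: Int)(f: Int => Double): Double = {
  var acc = 0.0
  var i = start
  while (i < end) {
    acc += f(i)
    i += 1
  }
  acc
}

// Hypothetical stand-in for untilconverged on a scalar: applies step repeatedly until
// successive values differ by less than tol, or maxIter iterations have run.
def untilconverged(x0: Double, tol: Double, maxIter: Int = 1000)(step: Double => Double): Double = {
  var cur = x0
  var iter = 0
  var converged = false
  while (!converged && iter < maxIter) {
    val next = step(cur)
    converged = math.abs(next - cur) < tol
    cur = next
    iter += 1
  }
  cur
}
```

For example, `untilconverged(1.0, 1e-12)(x => (x + 2.0 / x) / 2.0)` runs Newton's method for the square root of 2 until it stops changing.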
    untilconverged(mu, tol){ mu =>
      // calculate distances to current centroids
      val c = (0::m){i =>
        val allDistances = mu mapRows { centroid =>
          dist(x(i), centroid)
        }                                        // fused
        allDistances.minIndex
      }

      // move each cluster centroid to the
      // mean of the points assigned to it
      val newMu = (0::k,*){ i =>
        val (weightedpoints, points) = sum(0,m) { j =>
          if (c(j) == i) (x(j),1)
        }
        val d = if (points == 0) 1 else points
        weightedpoints / d
      }
      newMu
    }
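For comparison, here is a plain, sequential Scala sketch of what one iteration of the k-means body computes. The `Array[Array[Double]]` representation, the squared Euclidean `dist`, and the name `kmeansStep` are assumptions; the OptiML version expresses the same two steps with implicitly parallel operations.

```scala
// Squared Euclidean distance (an assumption; the slide does not define dist).
def dist(a: Array[Double], b: Array[Double]): Double = {
  var s = 0.0
  for (t <- a.indices) s += (a(t) - b(t)) * (a(t) - b(t))
  s
}

// One k-means iteration: x holds m points and mu holds k centroids, one per row.
def kmeansStep(x: Array[Array[Double]], mu: Array[Array[Double]]): Array[Array[Double]] = {
  val m = x.length
  val k = mu.length
  val n = x(0).length
  // calculate distances to current centroids: c(i) = index of the centroid nearest x(i)
  val c: Array[Int] = Array.tabulate(m) { i =>
    val allDistances = mu.map(centroid => dist(x(i), centroid))
    allDistances.indexOf(allDistances.min)   // plays the role of minIndex
  }
  // move each cluster centroid to the mean of the points assigned to it
  Array.tabulate(k) { i =>
    val weightedpoints = new Array[Double](n)
    var points = 0
    for (j <- 0 until m if c(j) == i) {
      for (t <- 0 until n) weightedpoints(t) += x(j)(t)
      points += 1
    }
    val d = if (points == 0) 1 else points   // avoid dividing by zero for empty clusters
    weightedpoints.map(_ / d)
  }
}
```

As in the OptiML code, a cluster with no assigned points ends up at the zero vector, since its sum is divided by 1.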
  
n  Dataquerying of in-memory
  collections
  n    inspired by LINQ


n  SQL-like    declarative language

n  Use
      high-level semantic knowledge to
  implement query optimizer
//	
  lineItems:	
  Iterable[LineItem]	
  
//	
  Similar	
  to	
  Q1	
  of	
  the	
  TPCH	
  benchmark	
          hoisted
val	
  q	
  =	
  lineItems	
  Where(_.l_shipdate	
  <=	
  Date(‘‘19981201’’)).	
  
	
  	
  GroupBy(l	
  =>	
  (l.l_linestatus)).	
  
	
  	
  Select(g	
  =>	
  new	
  Result	
  {	
  
	
  	
  	
  	
  val	
  lineStatus	
  =	
  g.key	
  
	
  	
  	
  	
  val	
  sumQty	
  =	
  g.Sum(_.l_quantity)	
  
	
  	
  	
  	
  val	
  sumDiscountedPrice	
  =	
  
	
  	
  	
  	
  	
  	
  g.Sum(r	
  =>	
  r.l_extendedprice*(1.0-­‐r.l_discount))	
   fused
	
  	
  	
  	
  val	
  avgPrice	
  =	
  g.Average(_.l_extendedprice)	
  
	
  	
  	
  	
  val	
  countOrder	
  =	
  g.Count	
  
	
  	
  })	
  OrderBy(_.returnFlag)	
  ThenBy(_.lineStatus)	
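An approximate equivalent using the standard Scala collections, shown only to clarify what the query means. The `LineItem` and `Result` case classes and the integer `yyyymmdd` date encoding are hypothetical, and grouping is on (returnflag, linestatus) as in TPC-H Q1.

```scala
// Hypothetical row type; dates are encoded as yyyymmdd integers for simplicity.
case class LineItem(l_returnflag: Char, l_linestatus: Char, l_quantity: Double,
                    l_extendedprice: Double, l_discount: Double, l_shipdate: Int)

case class Result(returnFlag: Char, lineStatus: Char, sumQty: Double,
                  sumDiscountedPrice: Double, avgPrice: Double, countOrder: Int)

def q1(lineItems: Iterable[LineItem]): Seq[Result] =
  lineItems
    .filter(_.l_shipdate <= 19981201)                          // Where
    .groupBy(l => (l.l_returnflag, l.l_linestatus))            // GroupBy
    .map { case ((flag, status), g) =>                         // Select
      Result(
        returnFlag = flag,
        lineStatus = status,
        sumQty = g.map(_.l_quantity).sum,
        sumDiscountedPrice = g.map(r => r.l_extendedprice * (1.0 - r.l_discount)).sum,
        avgPrice = g.map(_.l_extendedprice).sum / g.size,
        countOrder = g.size)
    }
    .toSeq
    .sortBy(r => (r.returnFlag, r.lineStatus))                 // OrderBy, ThenBy
```

Here each aggregate makes its own pass over its group; an optimizing DSL compiler is free to compute all of them in a single pass instead.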
  
n    A DSL for large-scale graph analysis based
      on Green-Marl
      Green-Marl: A DSL for Easy and Efficient Graph Analysis (Hong et. al.), ASPLOS ’12




n    Directed and undirected graphs, nodes,
      edges

n    Collections for node/edge storage
      n    Set, sequence, order

n    Deferred assignment and parallel reductions
      with bulk synchronous consistency
Implicitly parallel iteration


for(t	
  <-­‐	
  G.Nodes)	
  {	
  
	
  	
  val	
  rank	
  =	
  ((1.0	
  d)/	
  N)	
  +	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  d	
  *	
  Sum(t.InNbrs){w	
  =>	
  PR(w)	
  /	
  w.OutDegree}	
  
	
  	
  PR	
  <=	
  (t,rank)	
  
	
  	
  diff	
  +=	
  Math.abs(rank	
  -­‐	
  PR(t))	
  
}	
  


   Deferred assignment and scalar reduction

   Writes become visible after the loop completes
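A sequential sketch of one PageRank sweep with the same bulk-synchronous behavior, in plain Scala. The adjacency-array encoding and the name `pageRankStep` are assumptions; the point is that every new rank is computed from the previous ranks and only becomes visible once the sweep has finished.

```scala
// Hypothetical graph encoding: inNbrs(t) lists the nodes with an edge into t,
// outDegree(w) is the number of edges leaving w.
def pageRankStep(inNbrs: Array[Array[Int]], outDegree: Array[Int],
                 pr: Array[Double], d: Double): (Array[Double], Double) = {
  val n = pr.length
  // Every rank is computed from the old pr and written to a fresh array,
  // so no write is visible until the whole sweep is done (deferred assignment).
  val next = Array.tabulate(n) { t =>
    (1.0 - d) / n + d * inNbrs(t).map(w => pr(w) / outDegree(w)).sum
  }
  // Scalar reduction of the per-node change.
  val diff = (0 until n).map(t => math.abs(next(t) - pr(t))).sum
  (next, diff)
}
```

A driver would call `pageRankStep` repeatedly, replacing `pr` with the returned array, until `diff` falls below a threshold.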
n    A port of a subset of Scala collections to a
      staged Delite DSL

n    Demonstrates the benefits of high-level
      optimization and code generation

         val	
  sourcedests	
  =	
  pagelinks	
  flatMap	
  {	
  l	
  =>	
  
         	
  	
  val	
  sd	
  =	
  l.split(":")	
  
         	
  	
  val	
  source	
  =	
  Long.parseLong(sd(0))	
                     Tuples
         	
  	
  val	
  dests	
  =	
  sd(1).trim.split("	
  ")	
                   encoded
         	
  	
  dests.map(d	
  =>	
  (Integer.parseInt(d),	
  source))	
          as longs
         }	
                                                                       in back-
         val	
  inverted	
  =	
  sourcedests	
  groupBy	
  (x	
  =>	
  x._1)	
     end
            Reverse web-link benchmark in OptiCollections
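The same computation with the standard (unstaged) Scala collections, as a point of reference. The input format and the name `reverseLinks` are assumptions, and `toLong`/`toInt` on strings stand in for the `parseLong`/`parseInt` calls above.

```scala
// Each input line is assumed to have the form "source: dest1 dest2 ...",
// with single spaces between destinations.
def reverseLinks(pagelinks: Seq[String]): Map[Int, Seq[Long]] = {
  val sourcedests: Seq[(Int, Long)] = pagelinks.flatMap { l =>
    val sd = l.split(":")
    val source = sd(0).trim.toLong
    val dests = sd(1).trim.split(" ")
    dests.toSeq.map(d => (d.toInt, source))
  }
  // Group the (dest, source) pairs by destination and keep only the sources.
  sourcedests.groupBy(_._1).map { case (dest, pairs) => dest -> pairs.map(_._2) }
}
```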
Program at a high level, get high performance

Generated Scala:

    def apply(x388:Int,x423:Int,x389:Int,
              x419:Array[Double],x431:Int,
              x433:Array[Double]) {
      val x418 = x413 * x389
      val x912_zero = {
        0
      }
      val x912_zero_2 = {
        1.7976931348623157E308
      }
      var x912 = x912_zero
      var x912_2 = x912_zero_2
      var x425 = 0
      while (x425 < x423) {
        val x430 = x425 * 1
        val x432 = x430 * x431
        val x916_zero = {
          0.0
        }
        . . .

Generated CUDA:

    __device__ int dev_collect_x478_x478(int x423,int x389,
        DeliteArray<double> x419,int x431,
        DeliteArray<double> x433,int x413) {
      int x418 = x413 * x389;
      int x919 = 0;
      double x919_2 = 1.7976931348623157E308;
      int x425 = 0;
      while (x425 < x423) {
        int x430 = x425 * 1;
        int x432 = x430 * x431;
        double x923 = 0.0;
        int x450 = 0;
        . . .
[Figure: OptiML vs. C++, normalized execution time; panels: k-means, Template Matching; 1, 2, 4 and 8 CPUs and GPU]
[Figure: OptiQL vs. LINQ, normalized execution time; panels: TPCH-Q1, TPCH-Q2; 1P and 8P]
[Figure: OptiGraph (PageRank) vs. Green-Marl, normalized execution time; panels: 100k nodes x 800k edges, 8M nodes x 64M edges; 1P, 2P, 4P and 8P]
[Figure: OptiCollections (reverse web-link benchmark) vs. Scala Parallel Collections, normalized execution time; panels: 75 MB and 463 MB inputs; 1P, 2P, 4P and 8P]
How do I build my own Delite DSL?
[Diagram: the Delite stack, top to bottom]
Domain Specific Languages: Data Analytics (OptiQL), Physics (OptiMesh), Machine Learning (OptiML), Graph Analysis (OptiGraph)
Domain Embedding Language (Scala): Modular Staging
Delite: DSL Infrastructure
    Delite Compiler: Parallel Patterns, Static Optimizations, Heterogeneous Code Generation
    Delite Runtime: Walk-time Optimizations, Locality-aware Scheduling
Heterogeneous Hardware: SMP, GPU
1. Types
    - abstract, front-end
2. Operations
    - language operators and methods available on types; represented by IR nodes
3. Data Structures
    - platform-specific concrete implementation, back-end
4. Code Generators
    - Scala traits that define how to emit code as strings for various IR nodes and platforms
5. Analyses and Optimizations (Optional)
    - IR rewriting via pattern matching, traversals/transformations (e.g. fusion)
    abstract class Vector[T] extends DeliteCollection[T]
    abstract class Matrix[T] extends DeliteCollection[T]
    abstract class Image[T] extends Matrix[T]

Placeholders for static type checking and method dispatch; not bound to any implementation.
Vector below is the same abstract Vector we defined earlier.

    trait VectorOps {
      // add an infix + operator to Rep[Vector[A]]
      def infix_+[A:Manifest:Arith](lhs: Rep[Vector[A]], rhs: Rep[Vector[A]]) =
        vector_plus(lhs, rhs)

      // abstract, applications cannot inspect what happens
      // when methods are called
      def vector_length[A:Manifest](lhs: Rep[Vector[A]]): Rep[Int]
      def vector_plus[A:Manifest:Arith](lhs: Rep[Vector[A]],
                                        rhs: Rep[Vector[A]]): Rep[Vector[A]]
    }
  
    trait VectorOpsExp extends VectorOps with Expressions {
      // a Delite parallel op IR node
      case class VectorPlus[A:Manifest:Arith](inA: Exp[Vector[A]], inB: Exp[Vector[A]])
        extends DeliteOpZipWith[Vector[A], Vector[A], Vector[A]] {
        // number of elements in the input collections
        def size = inA.length
        // the output collection
        def alloc = Vector[A](inA.length)
        // the ZipWith function
        def func = (a,b) => a + b
      }

      // construct IR nodes
      def vector_plus[A:Manifest:Arith](lhs: Exp[Vector[A]], rhs: Exp[Vector[A]])
        = VectorPlus(lhs, rhs)
    }
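A self-contained toy version of the Ops/Exp split, without LMS. `Base`, `Input`, `MyApp`, `lengthOfSum` and `StagedApp` are hypothetical names, and the real traits carry more machinery (manifests, `Def` nodes, Delite op base classes). Application code depends only on the abstract `Rep` type; mixing in the Exp layer fixes `Rep` to IR expressions, so the same application code builds IR nodes.

```scala
// Hypothetical front end: an abstract representation type Rep and abstract operations.
trait Base {
  type Rep[T]
}

trait VectorOps extends Base {
  trait Vector[A]   // abstract placeholder type, never instantiated
  def vector_plus[A](lhs: Rep[Vector[A]], rhs: Rep[Vector[A]]): Rep[Vector[A]]
  def vector_length[A](v: Rep[Vector[A]]): Rep[Int]
}

// Application code written only against the abstract interface.
trait MyApp extends VectorOps {
  def lengthOfSum[A](a: Rep[Vector[A]], b: Rep[Vector[A]]): Rep[Int] =
    vector_length[A](vector_plus[A](a, b))
}

// Hypothetical Exp layer: Rep[T] is an IR expression and each operation builds a node.
trait VectorOpsExp extends VectorOps {
  sealed trait Exp[T]
  case class Input[T](name: String) extends Exp[T]
  case class VectorPlus[A](inA: Exp[Vector[A]], inB: Exp[Vector[A]]) extends Exp[Vector[A]]
  case class VectorLength[A](v: Exp[Vector[A]]) extends Exp[Int]

  type Rep[T] = Exp[T]
  def vector_plus[A](lhs: Exp[Vector[A]], rhs: Exp[Vector[A]]): Exp[Vector[A]] = VectorPlus(lhs, rhs)
  def vector_length[A](v: Exp[Vector[A]]): Exp[Int] = VectorLength(v)
}

// Mixing the application with the Exp layer: calling it now yields an IR, not a result.
object StagedApp extends MyApp with VectorOpsExp
```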
  
    // a concrete, back-end Scala data structure
    // will be instantiated by generated code
    // (T:Manifest is required to allocate an Array[T])
    class Vector[T:Manifest](__length: Int) {
      var _length = __length
      var _data: Array[T] = new Array[T](_length)
    }

    // corresponding data structures for other back-ends
    // (CUDA, OpenCL, etc.)
    // . . .
    trait ScalaGenVectorOps extends ScalaGen {
      val IR: VectorOpsExp
      import IR._

      override def emitNode(sym: Sym[Any], rhs: Def[Any])
                           (implicit stream: PrintWriter) =
        // generate code for particular IR nodes
        rhs match {
          case v@VectorNew(length) =>
            emitValDef(sym, "new " + remap("Vector") + "(" +
                            quote(length) + ")")
          case VectorLength(x) =>
            // _length is the exact back-end field name we defined earlier
            emitValDef(sym, quote(x) + "._length")
          case _ => super.emitNode(sym, rhs)
        }
    }
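A stripped-down, self-contained imitation of this pattern, not the real LMS `ScalaGen`: the `Def` node types, integer symbol ids, `Gen` and `generate` are all hypothetical. It shows code generators emitting one string per IR node and composing by trait mixing, with unmatched nodes falling through to `super.emitNode`.

```scala
import java.io.{PrintWriter, StringWriter}

// Hypothetical IR nodes; operands refer to earlier symbols by id.
sealed trait Def
case class VectorNew(length: Int) extends Def
case class VectorLength(vec: Int) extends Def
case class IntPlus(a: Int, b: Int) extends Def

// Base generator: handles no nodes itself.
trait ScalaGen {
  def quote(sym: Int): String = "x" + sym
  def emitValDef(sym: Int, rhs: String)(implicit stream: PrintWriter): Unit =
    stream.println("val " + quote(sym) + " = " + rhs)
  def emitNode(sym: Int, rhs: Def)(implicit stream: PrintWriter): Unit =
    throw new IllegalArgumentException("don't know how to generate code for " + rhs)
}

trait ScalaGenVectorOps extends ScalaGen {
  override def emitNode(sym: Int, rhs: Def)(implicit stream: PrintWriter): Unit = rhs match {
    case VectorNew(length) => emitValDef(sym, "new Vector(" + length + ")")
    case VectorLength(v)   => emitValDef(sym, quote(v) + "._length")
    case _                 => super.emitNode(sym, rhs)
  }
}

trait ScalaGenIntOps extends ScalaGen {
  override def emitNode(sym: Int, rhs: Def)(implicit stream: PrintWriter): Unit = rhs match {
    case IntPlus(a, b) => emitValDef(sym, quote(a) + " + " + quote(b))
    case _             => super.emitNode(sym, rhs)
  }
}

// Generators compose by mixing traits; unmatched nodes fall through via super.
object Gen extends ScalaGenVectorOps with ScalaGenIntOps

def generate(nodes: Seq[(Int, Def)]): String = {
  val sw = new StringWriter
  implicit val stream: PrintWriter = new PrintWriter(sw)
  nodes.foreach { case (sym, rhs) => Gen.emitNode(sym, rhs) }
  stream.flush()
  sw.toString
}
```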
  
    override def matrix_plus[A:Manifest:Arith]
      (x: Exp[Matrix[A]], y: Exp[Matrix[A]]) =
      (x, y) match {
        // (AB + AD) == A(B + D)
        case (Def(MatrixTimes(a, b)),
              Def(MatrixTimes(c, d))) if (a == c) =>
          // return optimized version
          matrix_times(a, matrix_plus(b,d))

        // other rewrites
        // case . . .

        case _ => super.matrix_plus(x, y)
      }
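A dependency-free sketch of the same rewrite performed inside a smart constructor. The `MExp` node types are hypothetical and, unlike the slide, matching is done directly on case classes rather than through LMS's `Def` extractor.

```scala
// Hypothetical matrix IR used only to illustrate a rewrite rule.
sealed trait MExp
case class M(name: String) extends MExp
case class MatrixTimes(a: MExp, b: MExp) extends MExp
case class MatrixPlus(a: MExp, b: MExp) extends MExp

def matrix_times(a: MExp, b: MExp): MExp = MatrixTimes(a, b)

// Smart constructor: applies (AB + AD) == A(B + D) while the IR is being built.
def matrix_plus(x: MExp, y: MExp): MExp = (x, y) match {
  case (MatrixTimes(a, b), MatrixTimes(c, d)) if a == c =>
    matrix_times(a, matrix_plus(b, d))   // one multiplication instead of two
  case _ =>
    MatrixPlus(x, y)                     // no rule applies: build the node unchanged
}
```

Because the inner `matrix_plus(b, d)` goes back through the smart constructor, further rules can fire on the rewritten result.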
  
    trait OptiML extends OptiMLScalaOpsPkg with VectorOps with
      MatrixOps with ...

    trait OptiMLExp extends OptiMLScalaOpsPkgExp with
      VectorOpsExp with MatrixOpsExp with ...

    trait OptiMLCodeGenScala extends OptiMLScalaCodeGenPkg with
      ScalaGenVectorOps with ScalaGenMatrixOps with ...

    trait OptiMLCodeGenCuda extends OptiMLCudaCodeGenPkg with
      CudaGenVectorOps with CudaGenMatrixOps with ...
n    Delite DSLs target high performance
      architectures from Scala

n    Open source – use them to accelerate
      your apps or build your own!
      n    http://github.com/stanford-ppl/Delite


n    Mailing List:
      n    http://groups.google.com/group/delite-devel

n    Thank you

Contenu connexe

Tendances

Reactive Qt - Ivan Čukić (Qt World Summit 2015)
Reactive Qt - Ivan Čukić (Qt World Summit 2015)Reactive Qt - Ivan Čukić (Qt World Summit 2015)
Reactive Qt - Ivan Čukić (Qt World Summit 2015)Ivan Čukić
 
M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in ScalaM Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in ScalaJan Aerts
 
Security Attacks on RSA
Security Attacks on RSASecurity Attacks on RSA
Security Attacks on RSAPratik Poddar
 
Introduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesIntroduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesSubhajit Sahu
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosAMD Developer Central
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityMark Kilgard
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersBayu Aldi Yansyah
 
Copy Your Favourite Nokia App with Qt
Copy Your Favourite Nokia App with QtCopy Your Favourite Nokia App with Qt
Copy Your Favourite Nokia App with Qtaccount inactive
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...ICAC09
 
Special Effects with Qt Graphics View
Special Effects with Qt Graphics ViewSpecial Effects with Qt Graphics View
Special Effects with Qt Graphics Viewaccount inactive
 
CS 354 Programmable Shading
CS 354 Programmable ShadingCS 354 Programmable Shading
CS 354 Programmable ShadingMark Kilgard
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDARaymond Tay
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesNarann29
 
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET Journal
 

Tendances (20)

Reactive Qt - Ivan Čukić (Qt World Summit 2015)
Reactive Qt - Ivan Čukić (Qt World Summit 2015)Reactive Qt - Ivan Čukić (Qt World Summit 2015)
Reactive Qt - Ivan Čukić (Qt World Summit 2015)
 
M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in ScalaM Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
 
The Future of Qt Widgets
The Future of Qt WidgetsThe Future of Qt Widgets
The Future of Qt Widgets
 
Security Attacks on RSA
Security Attacks on RSASecurity Attacks on RSA
Security Attacks on RSA
 
Introduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesIntroduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : Notes
 
Europy17_dibernardo
Europy17_dibernardoEuropy17_dibernardo
Europy17_dibernardo
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
 
NvFX GTC 2013
NvFX GTC 2013NvFX GTC 2013
NvFX GTC 2013
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning Practitioners
 
Copy Your Favourite Nokia App with Qt
Copy Your Favourite Nokia App with QtCopy Your Favourite Nokia App with Qt
Copy Your Favourite Nokia App with Qt
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
02 - Basics of Qt
02 - Basics of Qt02 - Basics of Qt
02 - Basics of Qt
 
Gpu perf-presentation
Gpu perf-presentationGpu perf-presentation
Gpu perf-presentation
 
Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...
 
Special Effects with Qt Graphics View
Special Effects with Qt Graphics ViewSpecial Effects with Qt Graphics View
Special Effects with Qt Graphics View
 
CS 354 Programmable Shading
CS 354 Programmable ShadingCS 354 Programmable Shading
CS 354 Programmable Shading
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering Techniques
 
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
 

En vedette (11)

Prediction suretogowrong
Prediction suretogowrongPrediction suretogowrong
Prediction suretogowrong
 
Nps
NpsNps
Nps
 
Man made marvels
Man made marvelsMan made marvels
Man made marvels
 
Project kepler compile time metaprogramming for scala
Project kepler compile time metaprogramming for scalaProject kepler compile time metaprogramming for scala
Project kepler compile time metaprogramming for scala
 
Test driven infrastructure
Test driven infrastructureTest driven infrastructure
Test driven infrastructure
 
Scala days mizushima
Scala days mizushimaScala days mizushima
Scala days mizushima
 
Cnc scala-presentation
Cnc scala-presentationCnc scala-presentation
Cnc scala-presentation
 
Proposal parade seni
Proposal parade seniProposal parade seni
Proposal parade seni
 
Frase dan klausa
Frase dan klausaFrase dan klausa
Frase dan klausa
 
Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012
 
EDS selection & implementation @ CCC
EDS selection & implementation @ CCCEDS selection & implementation @ CCC
EDS selection & implementation @ CCC
 

Arvindsujeeth scaladays12

  • 1. Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Michael Wu, Victoria Popic, Kunle Olukotun Stanford University Pervasive Parallelism Laboratory (PPL) Tiark Rompf, Aleksandar Prokopec, Vojin Jovanovic, Philipp Haller, Martin Odersky Ecole Polytechnique Federale de Lausanne (EPFL) Programming Methods Laboratory (LAMP)
  • 3. DSLs can be used for high performance, too
  • 4. [Figure: programming models and their target hardware: Pthreads and OpenMP (Sun T2), CUDA and OpenCL (Nvidia Fermi), Verilog and VHDL (Altera FPGA), MPI and PGAS (Cray Jaguar)]
  • 5. [Figure: applications (Scientific Engineering, Virtual Worlds, Personal Robotics, Data Informatics) alongside the same programming models and hardware]
  • 6. [Figure: DSLs placed between the applications and the programming models and hardware] Too many different programming models
  • 7. Tiark Rompf's talk yesterday
    - In case you missed it:
      - Techniques for rewriting high-level programs into high-performance programs
      - Build an intermediate representation (IR) of Scala programs at runtime
      - The IR can be optimized, and code generated from it
  • 8.
    - Introduction to existing Delite DSLs
    - Constructing your own Delite DSL
    - Not covered (under the covers):
      - Implementation details about the Delite framework
      - See http://cgo2012.hyperdsls.org/
  • 9.
    - Syntax is legal Scala [figure: IR tree for A*B + A*C]
    - Staged to build an IR (metaprogramming)
    - Optimized at a high level
    - Compiled to different low-level target architectures
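To make "legal Scala syntax that is staged into an IR" concrete, here is a minimal deep-embedding sketch in plain Scala. It is not Delite or LMS code: `Expr`, `Input`, `Times`, `Plus` and `show` are hypothetical names. The overloaded `*` and `+` record tree nodes instead of computing values, so `a * b + a * c` produces the expression tree for A*B + A*C.

```scala
object DeepEmbeddingSketch {
  // Hypothetical IR node types standing in for a staged DSL's IR
  sealed trait Expr {
    // Operators build IR nodes instead of performing arithmetic
    def *(that: Expr): Expr = Times(this, that)
    def +(that: Expr): Expr = Plus(this, that)
  }
  final case class Input(name: String) extends Expr
  final case class Times(lhs: Expr, rhs: Expr) extends Expr
  final case class Plus(lhs: Expr, rhs: Expr) extends Expr

  // Pretty-print the tree, fully parenthesized
  def show(e: Expr): String = e match {
    case Input(n)    => n
    case Times(l, r) => s"(${show(l)} * ${show(r)})"
    case Plus(l, r)  => s"(${show(l)} + ${show(r)})"
  }
}
```

With `a`, `b`, `c` bound to `Input("A")`, `Input("B")`, `Input("C")`, `show(a * b + a * c)` returns `"((A * B) + (A * C))"`; Scala's usual operator precedence is what shapes the tree.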
  • 10.
    - OptiML (machine learning)
    - OptiQL (data querying)
    - OptiGraph (large-scale graph analysis)
    - OptiCollections (Scala collections)
    - OptiMesh (mesh-based PDE solvers)
    Coming soon:
    - OptiSDR (software-defined radio)
    - OptiCVX (convex optimization)
  • 11. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning, ICML 2011
    - Provides a familiar (MATLAB-like) language and API for writing ML applications
      - Ex. val c = a * b (a, b are Matrix[Double])
    - Implicitly parallel data structures
      - Base types: Vector[T], Matrix[T], Graph[V,E], Stream[T]
      - Subtypes: TrainingSet, IndexVector, Image, …
    - Implicitly parallel control structures
      - sum{…}, (0::end) {…}, gradient { … }, untilconverged { … }
      - Arguments to control structures are anonymous functions with restricted semantics
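As an illustration of a control structure that takes an anonymous function, here is a sequential plain-Scala stand-in for an `untilconverged`-style loop. The name `untilConverged`, its parameters (`tol`, `maxIter`, the `diff` function) and its stopping rule are assumptions for this sketch, not OptiML's actual signature, and nothing here runs in parallel.

```scala
object ControlSketch {
  // Hypothetical stand-in for an untilconverged-style loop: apply `step`
  // repeatedly until `diff(previous, next)` drops below `tol`,
  // or until `maxIter` iterations have run.
  def untilConverged[T](init: T, tol: Double, maxIter: Int = 1000)
                       (diff: (T, T) => Double)
                       (step: T => T): T = {
    var cur = init
    var i = 0
    var done = false
    while (!done && i < maxIter) {
      val next = step(cur)
      done = diff(cur, next) < tol
      cur = next
      i += 1
    }
    cur
  }
}
```

For example, Newton's iteration `x => (x + 2.0 / x) / 2.0` started at `1.0` with `tol = 1e-12` settles on an approximation of the square root of 2.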
  • 12.
        untilconverged(mu, tol){ mu =>
          // calculate distances to current centroids
          // move each cluster centroid to the
          // mean of the points assigned to it
        }
  • 13.
        untilconverged(mu, tol){ mu =>
          // calculate distances to current centroids
          val c = (0::m){ i =>
            val allDistances = mu mapRows { centroid =>
              dist(x(i), centroid)
            }
            allDistances.minIndex
          }
          // move each cluster centroid to the
          // mean of the points assigned to it
        }
  • 14.
        untilconverged(mu, tol){ mu =>
          // calculate distances to current centroids
          val c = (0::m){ i =>
            val allDistances = mu mapRows { centroid =>
              dist(x(i), centroid)
            }                                   // (slide callout: fused)
            allDistances.minIndex
          }
          // move each cluster centroid to the
          // mean of the points assigned to it
          val newMu = (0::k,*){ i =>
            // sum the points j whose assigned cluster c(j) is i
            val (weightedpoints, points) = sum(0,m) { j =>
              if (c(j) == i) (x(j),1)
            }
            val d = if (points == 0) 1 else points
            weightedpoints / d
          }
          newMu
        }
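For comparison, here is a sequential plain-Scala sketch of the same k-means loop, using `scala.Vector[Double]` for points. It is a hypothetical illustration, not OptiML code: the helper names `dist`, `step` and `run`, the squared-Euclidean distance, and the convergence test (total centroid movement below `tol`) are assumptions. As on the slide, an empty cluster divides by 1, which leaves its centroid at the zero vector.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points
  def dist(a: Point, b: Point): Double =
    a.zip(b).map { case (p, q) => (p - q) * (p - q) }.sum

  // Element-wise sum of two points
  private def add(a: Point, b: Point): Point =
    a.zip(b).map { case (p, q) => p + q }

  // One iteration: assign each point to its nearest centroid, then
  // move each centroid to the mean of the points assigned to it
  def step(x: Vector[Point], mu: Vector[Point]): Vector[Point] = {
    // c(j) = index of the centroid closest to point j
    val c = x.map(p => mu.indices.minBy(i => dist(p, mu(i))))
    mu.indices.toVector.map { i =>
      val assigned = x.indices.filter(j => c(j) == i).map(j => x(j))
      val zero: Point = Vector.fill(mu(i).length)(0.0)
      val weightedpoints = assigned.foldLeft(zero)(add)
      val points = assigned.size
      val d = if (points == 0) 1 else points
      weightedpoints.map(_ / d)
    }
  }

  // Repeat until the centroids move less than tol (or maxIter is reached)
  def run(x: Vector[Point], mu0: Vector[Point], tol: Double, maxIter: Int = 100): Vector[Point] = {
    var mu = mu0
    var iter = 0
    var moved = Double.MaxValue
    while (moved > tol && iter < maxIter) {
      val next = step(x, mu)
      moved = mu.zip(next).map { case (a, b) => dist(a, b) }.sum
      mu = next
      iter += 1
    }
    mu
  }
}
```

The OptiML version expresses the same two phases with `(0::m){...}` and `(0::k,*){...}`, leaving the framework free to parallelize and fuse them; this sketch simply runs them one after the other.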
  • 15.
    - Data querying of in-memory collections
      - Inspired by LINQ
    - SQL-like declarative language
    - Uses high-level semantic knowledge to implement a query optimizer
  • 16.
        // lineItems: Iterable[LineItem]
        // Similar to Q1 of the TPC-H benchmark
        val q = lineItems Where(_.l_shipdate <= Date("19981201")).   // (slide callout: hoisted)
          GroupBy(l => (l.l_returnflag, l.l_linestatus)).
          Select(g => new Result {
            val returnFlag = g.key._1
            val lineStatus = g.key._2
            // (slide callout: the aggregates below are fused)
            val sumQty = g.Sum(_.l_quantity)
            val sumDiscountedPrice =
              g.Sum(r => r.l_extendedprice * (1.0 - r.l_discount))
            val avgPrice = g.Average(_.l_extendedprice)
            val countOrder = g.Count
          }) OrderBy(_.returnFlag) ThenBy(_.lineStatus)
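A sequential plain-Scala sketch of the same query, using ordinary collections in place of OptiQL's `Where`/`GroupBy`/`Select`. The simplified `LineItem` and `Result` fields and the encoding of dates as `yyyymmdd` integers are assumptions for illustration. Each aggregate here is a separate pass over its group; the slide's "fused" callout suggests OptiQL combines such passes, which this sketch does not attempt.

```scala
object Q1Sketch {
  // Hypothetical, simplified line-item record; dates encoded as yyyymmdd integers
  final case class LineItem(
      l_returnflag: Char,
      l_linestatus: Char,
      l_quantity: Double,
      l_extendedprice: Double,
      l_discount: Double,
      l_shipdate: Int)

  final case class Result(
      returnFlag: Char,
      lineStatus: Char,
      sumQty: Double,
      sumDiscountedPrice: Double,
      avgPrice: Double,
      countOrder: Int)

  def q1(lineItems: Seq[LineItem]): Seq[Result] =
    lineItems
      .filter(_.l_shipdate <= 19981201)                   // Where
      .groupBy(l => (l.l_returnflag, l.l_linestatus))    // GroupBy
      .map { case ((flag, status), g) =>                 // Select
        Result(
          returnFlag = flag,
          lineStatus = status,
          sumQty = g.map(_.l_quantity).sum,
          sumDiscountedPrice = g.map(r => r.l_extendedprice * (1.0 - r.l_discount)).sum,
          avgPrice = g.map(_.l_extendedprice).sum / g.size,
          countOrder = g.size)
      }
      .toSeq
      .sortBy(r => (r.returnFlag, r.lineStatus))        // OrderBy ... ThenBy
}
```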
  • 17.
    - A DSL for large-scale graph analysis based on Green-Marl
      - Green-Marl: A DSL for Easy and Efficient Graph Analysis (Hong et al.), ASPLOS ’12
    - Directed and undirected graphs, nodes, edges
    - Collections for node/edge storage
      - Set, sequence, order
    - Deferred assignment and parallel reductions with bulk synchronous consistency
  • 18.
        for (t <- G.Nodes) {                       // implicitly parallel iteration
          val rank = ((1.0 - d) / N) +
                     d * Sum(t.InNbrs){ w => PR(w) / w.OutDegree }
          PR <= (t, rank)                          // deferred assignment
          diff += Math.abs(rank - PR(t))           // scalar reduction
        }
    Deferred assignment and scalar reduction: writes become visible only after the loop completes.
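The deferred-assignment semantics can be imitated in sequential plain Scala by computing every new rank from the old `PR` values before any of them is replaced. This is a sketch under assumptions: the graph is given as out-neighbor lists over node ids `0 until n`, and the names `iterate`, `outNbrs` and `pr` are hypothetical. Every in-neighbor `w` of a node has at least one out-edge, so the division by its out-degree is always safe.

```scala
object PageRankSketch {
  // One PageRank iteration with "deferred assignment":
  // every new rank is computed from the previous ranks only.
  def iterate(outNbrs: Vector[Seq[Int]], pr: Vector[Double], d: Double): (Vector[Double], Double) = {
    val n = outNbrs.length
    // In-neighbors of t: all nodes w that have an edge w -> t
    val inNbrs: Vector[Seq[Int]] =
      Vector.tabulate(n)(t => (0 until n).filter(w => outNbrs(w).contains(t)))
    val newPr = Vector.tabulate(n) { t =>
      (1.0 - d) / n + d * inNbrs(t).map(w => pr(w) / outNbrs(w).size).sum
    }
    // Scalar reduction over all nodes, comparing against the old ranks
    val diff = (0 until n).map(t => math.abs(newPr(t) - pr(t))).sum
    (newPr, diff)
  }
}
```

A caller would repeat `iterate` until `diff` falls below a chosen threshold; OptiGraph runs the per-node loop in parallel, which is only safe because of the deferred writes.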
  • 19.
    - A port of a subset of Scala collections to a staged Delite DSL
    - Demonstrates the benefits of high-level optimization and code generation
        val sourcedests = pagelinks flatMap { l =>
          val sd = l.split(":")
          val source = Long.parseLong(sd(0))
          val dests = sd(1).trim.split(" ")
          dests.map(d => (Integer.parseInt(d), source))   // tuples encoded as longs in back-end
        }
        val inverted = sourcedests groupBy (x => x._1)
    Reverse web-link benchmark in OptiCollections
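The same computation written against the standard Scala collections, as a sequential point of comparison. The input format (`"source: dest1 dest2 ..."`) is inferred from the `split` calls on the slide; `sd(0).trim.toLong` and `d.toInt` replace `Long.parseLong` and `Integer.parseInt`, and the final step, which keeps only the source ids for each destination, is an assumption about how `inverted` is consumed.

```scala
object ReverseWebLinks {
  // Each input line is assumed to look like "source: dest1 dest2 ..."
  def invert(pagelinks: Seq[String]): Map[Int, Seq[Long]] = {
    val sourcedests: Seq[(Int, Long)] = pagelinks.flatMap { l =>
      val sd = l.split(":")
      val source = sd(0).trim.toLong
      val dests = sd(1).trim.split(" ").toSeq
      dests.map(d => (d.toInt, source))
    }
    // Group the (dest, source) pairs by destination, keeping only the sources
    sourcedests.groupBy(_._1).map { case (dest, pairs) => dest -> pairs.map(_._2) }
  }
}
```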
  • 20. Program at a high level Get high performance
  • 21.
    Scala:
        def apply(x388:Int, x423:Int, x389:Int, x419:Array[Double], x431:Int, x433:Array[Double]) {
          val x418 = x413 * x389
          val x912_zero = { 0 }
          val x912_zero_2 = {
            1.7976931348623157E308 }
          var x912 = x912_zero
          var x912_2 = x912_zero_2
          var x425 = 0
          while (x425 < x423) {
            val x430 = x425 * 1
            val x432 = x430 * x431
            val x916_zero = {
              0.0
            }
        . . .
    CUDA:
        __device__ int dev_collect_x478_x478(int x423, int x389, DeliteArray<double> x419, int x431, DeliteArray<double> x433, int x413) {
          int x418 = x413 * x389;
          int x919 = 0;
          double x919_2 = 1.7976931348623157E308;
          int x425 = 0;
          while (x425 < x423) {
            int x430 = x425 * 1;
            int x432 = x430 * x431;
            double x923 = 0.0;
            int x450 = 0;
        . . .
  • 22. [Charts: normalized execution time of OptiML vs. C++ for k-means and Template Matching (1, 2, 4 and 8 CPUs, and a GPU for one benchmark), and of OptiQL vs. LINQ for TPCH-Q1 and TPCH-Q2 (1P and 8P)]
  • 23. [Charts: normalized execution time of OptiGraph vs. Green-Marl for PageRank on graphs of 100k nodes x 800k edges and 8M nodes x 64M edges, and of OptiCollections vs. Scala Parallel Collections for the reverse web-link benchmark on 75 MB and 463 MB inputs; 1P, 2P, 4P and 8P]
  • 24. How do I build my own Delite DSL?
  • 25. [Figure: Delite stack, top to bottom]
    - Domain Specific Languages: Data Analytics (OptiQL), Physics (OptiMesh), Machine Learning (OptiML), Graph Analysis (OptiGraph)
    - Domain Embedding Language (Scala): Modular Staging
    - Delite: DSL Infrastructure
      - Delite Compiler: Parallel Patterns, Static Optimizations, Heterogeneous Code Generation
      - Delite Runtime: Walk-time Optimizations, Locality-aware Scheduling
    - Heterogeneous Hardware: SMP, GPU
  • 26.
    1. Types
       - abstract, front-end
    2. Operations
       - language operators and methods available on types; represented by IR nodes
    3. Data Structures
       - platform-specific concrete implementation, back-end
    4. Code Generators
       - Scala traits that define how to emit code as strings for various IR nodes and platforms
    5. Analyses and Optimizations (optional)
       - IR rewriting via pattern matching, traversals/transformations (e.g. fusion)
  • 27.
        abstract class Vector[T] extends DeliteCollection[T]
        abstract class Matrix[T] extends DeliteCollection[T]
        abstract class Image[T] extends Matrix[T]
    Placeholders for static type checking and method dispatch; not bound to any implementation.
  • 28.
        trait VectorOps {
          // add an infix + operator to Rep[Vector[A]]
          // (Vector is the same abstract Vector defined earlier)
          def infix_+[A:Manifest:Arith](lhs: Rep[Vector[A]], rhs: Rep[Vector[A]]) =
            vector_plus(lhs, rhs)
          // abstract: applications cannot inspect what happens
          // when methods are called
          def vector_length[A:Manifest](lhs: Rep[Vector[A]]): Rep[Int]
          def vector_plus[A:Manifest:Arith](lhs: Rep[Vector[A]],
                                            rhs: Rep[Vector[A]]): Rep[Vector[A]]
        }
  • 29.
        trait VectorOpsExp extends VectorOps with Expressions {
          // a Delite parallel op IR node
          case class VectorPlus[A:Manifest:Arith](inA: Exp[Vector[A]], inB: Exp[Vector[A]])
            extends DeliteOpZipWith[Vector[A], Vector[A], Vector[A]] {
            // number of elements in the input collections
            def size = inA.length
            // the output collection
            def alloc = Vector[A](inA.length)
            // the ZipWith function
            def func = (a,b) => a + b
          }
          // construct IR nodes
          def vector_plus[A:Manifest:Arith](lhs: Exp[Vector[A]], rhs: Exp[Vector[A]]) =
            VectorPlus(lhs, rhs)
        }
  • 30.
        // a concrete, back-end Scala data structure;
        // it will be instantiated by generated code
        class Vector[T:Manifest](__length: Int) {
          var _length = __length
          var _data: Array[T] = new Array[T](_length)
        }
        // corresponding data structures for other back-ends
        // (CUDA, OpenCL, etc.)
        // . . .
  • 31.
        trait ScalaGenVectorOps extends ScalaGen {
          val IR: VectorOpsExp
          import IR._
          override def emitNode(sym: Sym[Any], rhs: Def[Any])
                               (implicit stream: PrintWriter) =
            // generate code for particular IR nodes
            rhs match {
              case v@VectorNew(length) =>
                emitValDef(sym, "new " + remap("Vector") + "(" + quote(length) + ")")
              case VectorLength(x) =>
                // "_length" is the exact back-end field name defined earlier
                emitValDef(sym, quote(x) + "._length")
              case _ => super.emitNode(sym, rhs)
            }
        }
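The division of labor across the last few slides (operations that only record IR nodes, plus a generator that turns each node into a line of code) can be imitated without LMS. This plain-Scala sketch is a hypothetical stand-in: `Exp`, `Const`, `Sym`, `Builder`, `reflect` and `emit` are invented for illustration and are not the Delite or LMS API, and only the two nodes handled by the code-generator slide are modeled.

```scala
import scala.collection.mutable.ArrayBuffer

object StagingSketch {
  // Values in the staged program: constants or symbols bound to IR nodes
  sealed trait Exp
  final case class Const(value: Int) extends Exp
  final case class Sym(id: Int) extends Exp

  // IR node definitions (stand-ins for VectorNew / VectorLength)
  sealed trait Def
  final case class VectorNew(length: Exp) extends Def
  final case class VectorLength(v: Exp) extends Def

  // "Exp" layer: each operation records a node and returns a fresh symbol
  final class Builder {
    val stms: ArrayBuffer[(Sym, Def)] = ArrayBuffer.empty
    private def reflect(d: Def): Sym = {
      val s = Sym(stms.length)
      stms += (s -> d)
      s
    }
    def vector_new(length: Exp): Exp = reflect(VectorNew(length))
    def vector_length(v: Exp): Exp = reflect(VectorLength(v))
  }

  // Code generator: one line of Scala source per recorded IR node
  def quote(e: Exp): String = e match {
    case Const(v) => v.toString
    case Sym(id)  => "x" + id
  }
  def emitNode(sym: Sym, rhs: Def): String = rhs match {
    case VectorNew(length) => "val " + quote(sym) + " = new Vector(" + quote(length) + ")"
    case VectorLength(x)   => "val " + quote(sym) + " = " + quote(x) + "._length"
  }
  def emit(b: Builder): Seq[String] = b.stms.toSeq.map { case (s, d) => emitNode(s, d) }
}
```

Recording `vector_new(Const(10))` followed by `vector_length` on its result emits `val x0 = new Vector(10)` and `val x1 = x0._length`, mirroring the strings built by `emitValDef` on the slide.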
  • 32.
        override def matrix_plus[A:Manifest:Arith]
            (x: Exp[Matrix[A]], y: Exp[Matrix[A]]) =
          (x, y) match {
            // (AB + AD) == A(B + D)
            case (Def(MatrixTimes(a, b)), Def(MatrixTimes(c, d))) if (a == c) =>
              // return optimized version
              matrix_times(a, matrix_plus(b, d))
            // other rewrites
            // case . . .
            case _ => super.matrix_plus(x, y)
          }
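The rewrite works because `matrix_plus` acts as a smart constructor: it inspects its arguments before building a node. Here is a self-contained plain-Scala sketch of the same pattern. It uses hypothetical node classes (`MSym`, `MatrixTimes`, `MatrixPlus`) and direct pattern matching instead of Delite's `Def` extractor, and it relies on matrix multiplication distributing over addition, so AB + AD = A(B + D) saves one multiplication.

```scala
object RewriteSketch {
  // Hypothetical matrix IR
  sealed trait MExp
  final case class MSym(name: String) extends MExp
  final case class MatrixTimes(a: MExp, b: MExp) extends MExp
  final case class MatrixPlus(a: MExp, b: MExp) extends MExp

  def matrix_times(a: MExp, b: MExp): MExp = MatrixTimes(a, b)

  // Smart constructor: try the rewrite first, otherwise build a plain node
  def matrix_plus(x: MExp, y: MExp): MExp = (x, y) match {
    // (AB + AD) == A(B + D)
    case (MatrixTimes(a, b), MatrixTimes(c, d)) if a == c =>
      matrix_times(a, matrix_plus(b, d))
    case _ => MatrixPlus(x, y)
  }
}
```

Here `a == c` is case-class structural equality; in a staged IR with common-subexpression elimination, identical subterms would usually share one symbol, so the comparison amounts to symbol identity.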
  • 33.
        trait OptiML extends OptiMLScalaOpsPkg with VectorOps with MatrixOps with ...
        trait OptiMLExp extends OptiMLScalaOpsPkgExp with VectorOpsExp with MatrixOpsExp with ...
        trait OptiMLCodeGenScala extends OptiMLScalaCodeGenPkg with ScalaGenVectorOps with ScalaGenMatrixOps with ...
        trait OptiMLCodeGenCuda extends OptiMLCudaCodeGenPkg with CudaGenVectorOps with CudaGenMatrixOps with ...
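Mixing code-generator traits together works because each one handles its own nodes and passes everything else to `super`, so the `with` clauses form a delegation chain (the same `super.emitNode` fallthrough seen on the code-generator slide). A minimal plain-Scala sketch of that mechanism; `BaseGen`, `VectorGen`, `MatrixGen`, `MatrixNumRows` and `_numRows` are hypothetical names used only for illustration.

```scala
object CompositionSketch {
  sealed trait Def
  final case class VectorLength(v: String) extends Def
  final case class MatrixNumRows(m: String) extends Def   // hypothetical node

  // End of the chain: no trait knew how to handle the node
  trait BaseGen {
    def emitNode(sym: String, rhs: Def): String =
      throw new IllegalArgumentException(s"don't know how to generate code for $rhs")
  }

  trait VectorGen extends BaseGen {
    override def emitNode(sym: String, rhs: Def): String = rhs match {
      case VectorLength(v) => s"val $sym = $v._length"
      case _               => super.emitNode(sym, rhs)
    }
  }

  trait MatrixGen extends BaseGen {
    override def emitNode(sym: String, rhs: Def): String = rhs match {
      case MatrixNumRows(m) => s"val $sym = $m._numRows"
      case _                => super.emitNode(sym, rhs)
    }
  }

  // Linearization: MatrixGen is tried first, then VectorGen, then BaseGen
  object CodeGenScala extends VectorGen with MatrixGen
}
```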
  • 34.
    - Delite DSLs target high-performance architectures from Scala
    - Open source: use them to accelerate your apps or build your own!
      - http://github.com/stanford-ppl/Delite
    - Mailing list:
      - http://groups.google.com/group/delite-devel
    - Thank you