Graphics Programming on the Web with WebCL
   Mikaël Bourges-Sévenier, Motorola Mobility
   August 9, 2012

Over 32000 planks ;-) Blender/Bullet/SmallLuxGPU
   OpenCL
   By Alain Ducharme “Phymec”
   http://www.youtube.com/watch?v=143k1fqPukk

Motivation
   For compute intensive web applications
      Games: physics, special effects
      Computational photography
      Scientific simulations
      Augmented reality
      …
   Use many devices for general computations
      CPU, GPU, DSP, FPGA…

Motivation
   GPUs provide exponential GFLOPS growth every year vs. CPUs
   [Figure from Chapter 1, Introduction; source: NVidia CUDA/OpenCL C programming guide]

Content
   Motivation and Goals
   General-Purpose computations on GPU (GPGPU)
      From [image] to [image]
      The need for more general data-parallel computations
   WebCL overview
      A JavaScript API over OpenCL
      OpenCL concepts
      WebCL API
   WebCL programming
      Pure computations
      WebGL interoperability

WebGL pipeline
   Programmable vertex & fragment shaders
   [Figure: Application ➞ vertices (3D) ➞ Vertex processing (Vertex Shader) ➞ vertices (screen) ➞ Rasterizer ➞ fragments ➞ Fragment processing (Fragment Shader, with Textures and Samplers) ➞ pixels ➞ Frame Buffer. The processing stages are grouped under GPU.]

General Purpose computations on GPU
   With clever mapping of algorithms to GL pipeline
      Textures as data buffers
      Texture coordinates as computational domain
      Vertex coordinates as computational range
      Vertex shaders
        • to start computations
        • scatter operations
      Fragment shaders
        • for algorithms steps
        • gather operations
   [Figure labels: Scatter (write values), Gather (read values)]

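The gather and scatter patterns named on this slide can be illustrated on plain arrays. This is a minimal CPU-side sketch in JavaScript, not shader code; the function names `gather` and `scatter` and the index arrays are assumptions made for illustration.

```javascript
// Gather: each output element READS from a computed input location.
// The slide associates this pattern with fragment shaders.
function gather(input, indices) {
  const out = new Float32Array(indices.length);
  for (let i = 0; i < indices.length; ++i) {
    out[i] = input[indices[i]];
  }
  return out;
}

// Scatter: each input element WRITES to a computed output location.
// The slide associates this pattern with vertex shaders.
function scatter(input, indices, outLength) {
  const out = new Float32Array(outLength);
  for (let i = 0; i < input.length; ++i) {
    out[indices[i]] = input[i];
  }
  return out;
}

const data = new Float32Array([10, 20, 30, 40]);
const gathered = gather(data, [3, 0, 2]);          // reads data[3], data[0], data[2]
const scattered = scatter(data, [2, 0, 3, 1], 4);  // writes data[i] to out[indices[i]]
console.log(Array.from(gathered), Array.from(scattered));
```

In a gather the output position is fixed and the read address varies; in a scatter the write address varies, which is the harder case to express in a graphics pipeline.
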
GPGPU with GL limitations
   Hard to map algorithms to graphics pipeline
   Hard to do scatter operations
   Shader instances can NOT directly communicate with one another
   … GPGPU with GL is hack-ish
   CL is made for GPGPU, not graphics

WebCL overview
   WebCL brings parallel computing to the Web through a secure JavaScript binding to OpenCL 1.1 (2011)
      Open standard, royalty-free
      Platform independent
      Device independent
      being standardized by Khronos
   First public working draft April 2012
      http://www.khronos.org/webcl/

OpenCL overview
   Features
      C-based cross-platform API
      Kernels use a subset of C99 and extensions
        • Vector extensions (<type>N)
        • No recursion, no function pointers
        • No dynamic memory (malloc, free…), no standard libc methods (memcpy…)
      Well-defined numerical accuracy both for integers and floats
      Rich set of built-in functions (e.g. as GLSL and more)
        • But no random method
      Close to the hardware
        • Control over memory use
        • Control over thread scheduling

OpenCL Device Model
   A host is connected to one or more Compute devices
   Compute device
      A collection of one or more compute units (~ cores)
      A compute unit is composed of one or more processing elements (~ threads)
      Processing elements execute code as SIMD or SPMD
   [Figure: Host (PC) ➞ Compute Devices (GPU, CPU, DSP, FPGA…) ➞ Compute Device (GPU, CPU, …) ➞ Compute Unit (Core) ➞ Processing Element (Thread)]

OpenCL Execution Model
   Kernel
      Basic unit of executable code (~ DLL entry point)
      Data-parallel or task-parallel
   Program
      Collection of kernels and functions called by kernels
      Analogous to a dynamic library (DLL)
   Command Queue
      Control operations on OpenCL objects (memory transfers, kernels execution, synchronization)
      Commands queued in order
      Execution in-order or out-of-order
      Applications may use multiple command-queues per device
   Work-item
      An execution of a kernel by a processing element (~ thread)
   Work-group
      A collection of work-items that execute on a single compute unit (~ core)
   [Figure labels: Context, GPU, CPU, Queue, Queue]

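The in-order command queue described on this slide can be pictured with a toy model. This is a sketch in plain JavaScript, not the WebCL API: the class `InOrderQueue`, its methods, and the command names are assumptions made for illustration.

```javascript
// A toy in-order command queue: commands run in the order they were queued.
class InOrderQueue {
  constructor() {
    this.commands = [];
  }
  // Queue a command (a function); nothing executes yet.
  enqueue(name, fn) {
    this.commands.push({ name, fn });
  }
  // Run every queued command in order; return the names in execution order.
  finish() {
    const log = [];
    for (const cmd of this.commands) {
      cmd.fn();
      log.push(cmd.name);
    }
    this.commands = [];
    return log;
  }
}

const host = new Float32Array([1, 2, 3]);
let deviceBuffer = null;               // stands in for device global memory
const result = new Float32Array(3);

const queue = new InOrderQueue();
// Memory transfer: host to device
queue.enqueue("write", () => { deviceBuffer = Float32Array.from(host); });
// Kernel execution: double every element
queue.enqueue("kernel", () => {
  for (let i = 0; i < deviceBuffer.length; ++i) deviceBuffer[i] *= 2;
});
// Memory transfer: device to host
queue.enqueue("read", () => { result.set(deviceBuffer); });

const order = queue.finish();
console.log(order, Array.from(result));
```

The three command kinds mirror the slide's list (memory transfers, kernel execution, synchronization), with `finish()` playing the synchronization role.
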
OpenCL Work-group 2D analogy
   [Figure labels: Global, Local]
   # work-items = # pixels
   # work-groups = # tiles
   Work-group size = tileW * tileH
   All threads in a workgroup run synchronously

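The pixel/tile analogy can be worked through with numbers. A minimal sketch in JavaScript; the function names `decompose` and `locate` are hypothetical, and the image size is assumed to be an exact multiple of the tile size.

```javascript
// Split a width x height image into tiles of tileW x tileH work-items.
// Assumes width and height are exact multiples of tileW and tileH.
function decompose(width, height, tileW, tileH) {
  return {
    workItems: width * height,                       // one per pixel
    workGroups: (width / tileW) * (height / tileH),  // one per tile
    workGroupSize: tileW * tileH                     // pixels per tile
  };
}

// Map a global pixel coordinate to its tile (group) and its position
// inside that tile (local).
function locate(x, y, tileW, tileH) {
  return {
    group: [Math.floor(x / tileW), Math.floor(y / tileH)],
    local: [x % tileW, y % tileH]
  };
}

const d = decompose(64, 32, 16, 8);
const p = locate(37, 10, 16, 8);
console.log(d, p);
```

For a 64 x 32 image with 16 x 8 tiles this gives 2048 work-items in 16 work-groups of 128, and pixel (37, 10) falls in tile (2, 1) at local position (5, 2).
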
OpenCL Memory Model
   On Host
      CPU RAM
   On Compute Device
      Global memory = GPU RAM
      Constant memory = cached global memory
      Texture memory = cached global memory optimized for streaming reads
      Local memory = high-speed memory shared among work-items of a work-group (~ L1 cache)
      Private memory = registers of a work-item, very fast memory
   Memory management is explicit
      App must move data host ➞ global ➞ local and back
   [Figure: a Compute Device holds Workgroup 1 … Workgroup N; each workgroup has Work-Item 1 … Work-Item M, each with its own Private Memory, plus one Local Memory; all workgroups share Global Memory / Constant and Texture Caches. The Host holds Host Memory and reaches the device through command queues and API calls.]

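The explicit host ➞ global ➞ local path and back can be traced on the CPU. This is a sketch in plain JavaScript under loud assumptions: typed arrays stand in for each memory space, and the function name `sumPerGroup` is hypothetical.

```javascript
// Sum each work-group's slice of the input, staging data through "local" memory.
function sumPerGroup(hostInput, groupSize) {
  // host -> global: copy the host array into a buffer standing in for device RAM
  const globalMem = Float32Array.from(hostInput);
  const numGroups = Math.ceil(globalMem.length / groupSize);
  const globalOut = new Float32Array(numGroups);

  for (let g = 0; g < numGroups; ++g) {
    // global -> local: each work-group copies its slice into its own fast memory
    const localMem = globalMem.slice(g * groupSize, (g + 1) * groupSize);
    // private: an accumulator that would live in a work-item's registers
    let acc = 0;
    for (let i = 0; i < localMem.length; ++i) acc += localMem[i];
    // local -> global: publish the group's result
    globalOut[g] = acc;
  }
  // global -> host: copy the results back to host memory
  return Array.from(globalOut);
}

console.log(sumPerGroup([1, 2, 3, 4, 5, 6, 7, 8], 4));
```

Every arrow in the slide's last bullet corresponds to one explicit copy in the sketch; nothing moves between memory spaces on its own.
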
OpenCL Kernel
   Defined on an N-dimensional computation domain
   A kernel is executed at each point of the computation domain

   // In JavaScript
   function multiply(a, b, n) {
     var c = [];
     for (var i = 0; i < n; ++i)
       c[i] = a[i] * b[i];
     return c;
   }

   // In OpenCL C99
   /**
    * @param a, b, c are buffers in global memory
    * @param n number of elements in a, b, and c
    */
   __kernel
   void multiply(__global const float *a,
                 __global const float *b,
                 __global float *c,
                 unsigned int n)
   {
     unsigned int tid = get_global_id(0);  // thread number
     if (tid >= n) return;  // make sure we don't pass buffer area
     c[tid] = a[tid] * b[tid];
   }

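How the kernel body relates to the loop version can be seen by emulating a 1-D launch on the CPU. This is a sketch in JavaScript, not OpenCL: `multiplyKernel` and `launch1D` are hypothetical names, `tid` plays the role of `get_global_id(0)`, and rounding the global size up to a multiple of the work-group size is an assumption that shows why the `tid >= n` guard matters.

```javascript
// The kernel body, written per work-item.
function multiplyKernel(tid, a, b, c, n) {
  if (tid >= n) return;        // extra work-items beyond the buffer do nothing
  c[tid] = a[tid] * b[tid];
}

// Emulate a 1-D launch: round the global size up to a multiple of localSize,
// then invoke the kernel once per work-item.
function launch1D(kernel, n, localSize, ...args) {
  const globalSize = Math.ceil(n / localSize) * localSize;
  for (let tid = 0; tid < globalSize; ++tid) {
    kernel(tid, ...args, n);
  }
  return globalSize;
}

const a = new Float32Array([1, 2, 3, 4, 5]);
const b = new Float32Array([2, 2, 2, 2, 2]);
const c = new Float32Array(a.length);
const globalSize = launch1D(multiplyKernel, a.length, 4, a, b, c);
console.log(globalSize, Array.from(c));
```

With 5 elements and a work-group size of 4, 8 work-items are launched; the last 3 hit the guard and return without touching the buffers.
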
WebCL API
   Same OO model as OpenCL with JS classes
   WebCL is global object
   [Figure: class diagram. Platform layer: WebCL, WebCLPlatform, WebCLDevice, WebCLExtension, WebCLContext. Linked with multiplicity *: WebCLProgram, WebCLMemoryObject {abstract}, CommandQueue, Event, Sampler. Also shown: WebCLKernel, WebCLBuffer, WebCLImage. Legend: Compiler layer, Runtime layer.]

WebCL sequence (host side)
   Create context
   Compile kernels
   Setup command-queues
   Setup kernels arguments
   Execute commands
   Read results
   [Figure: flowchart of host-side steps. Select Platform ➞ Select Device ➞ Create Context ➞ Load and compile kernels on devices ➞ Create buffers to store data on devices ➞ Create command queues for each device ➞ Update kernels arguments ➞ Send data to devices using their command queues ➞ Send commands to devices using their command queues ➞ Get data from devices using their command queues ➞ Release resources. Legend: Platform layer, Compiler, Runtime layer.]

WebCL sequence (host side)

   try {
     // create the OpenCL context
     clContext = WebCL.createContext({
       deviceType: WebCL.DEVICE_TYPE_GPU
     });
   }
   catch(err) {
     throw "Error: Failed to create context! " + err;
   }

   var devices = clContext.getInfo(WebCL.CONTEXT_DEVICES);
   if (!devices) {
     throw "Error: Failed to retrieve compute devices for context!";
   }

WebCL sequence (host side)

   <script id="multiply_script" type="x-webcl">
   __kernel
   void multiply(__global const float *a,
                 __global const float *b,
                 __global float *c,
                 unsigned int n)
   {
     unsigned int tid = get_global_id(0);  // thread number
     if (tid >= n) return;  // make sure we don't pass buffer area
     c[tid] = a[tid] * b[tid];
   }
   </script>

   // Create the compute program from the source buffer (text)
   clProgram = clContext.createProgram(getSource("multiply_script"));

   // Build the program executable
   try {
     clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
   } catch (err) {
     throw "Error: Failed to build program executable!\n"
       + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
   }

   clKernel = clProgram.createKernel("multiply");

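The slide calls a helper to fetch the kernel text from the script element but does not show it. One possible shape is sketched below; the name `getSource`, its optional `doc` parameter, and the stand-in document object are all assumptions, not part of WebCL.

```javascript
// Hypothetical helper: return the text of the <script> element with the given id.
// `doc` defaults to the browser's global document; pass a stand-in object to test.
function getSource(id, doc) {
  const root = doc || document;
  const element = root.getElementById(id);
  if (!element) {
    throw new Error("No element with id '" + id + "'");
  }
  return element.textContent;
}

// Stand-in for the DOM so the sketch also runs outside a browser.
const fakeDocument = {
  getElementById: function (id) {
    return id === "multiply_script"
      ? { textContent: "__kernel void multiply() {}" }
      : null;
  }
};

console.log(getSource("multiply_script", fakeDocument));
```

Because the script type is `x-webcl`, a browser does not execute the element's content as JavaScript, so its text can be read back unchanged and handed to the program builder.
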
WebCL sequence (host side)

BUFFER_SIZE=10;
var A=new Uint32Array(BUFFER_SIZE);
var B=new Uint32Array(BUFFER_SIZE);

// store data in A and B
…

var size=BUFFER_SIZE*Uint32Array.BYTES_PER_ELEMENT; // size in bytes

// Create buffer for A and B and copy host contents
var aBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);
var bBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);

// Create buffer for C to read results
var cBuffer = clContext.createBuffer(WebCL.MEM_WRITE_ONLY, size);

WebCL sequence (host side)

// Create command queue
clQueue=context.createCommandQueue(devices[0]);

// enqueue buffers
clQueue.enqueueWriteBuffer(aBuffer, false, 0, size, A);
clQueue.enqueueWriteBuffer(bBuffer, false, 0, size, B);

// Set kernel args
clKernel.setArg(0, aBuffer);
clKernel.setArg(1, bBuffer);
clKernel.setArg(2, cBuffer);
clKernel.setArg(3, BUFFER_SIZE, WebCL.type.UINT);

__kernel
void multiply(__global const float* a,
              __global const float* b,
              __global float* c,
              unsigned int n);

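A CPU stand-in can help check what the kernel is expected to return. The slide shows only the signature of multiply, so the element-wise product below is an assumption, and multiplyReference is a hypothetical helper name. The sketch uses Float32Array to match the kernel's float pointers, whereas the host code on the slides allocates Uint32Array.

```javascript
// CPU stand-in for the "multiply" kernel. The element-wise body is an
// assumption: only the kernel signature appears on the slide.
function multiplyReference(a, b, n) {
  const c = new Float32Array(n);
  // Each iteration plays the role of one work-item (index i).
  for (let i = 0; i < n; i++) {
    c[i] = a[i] * b[i];
  }
  return c;
}

const BUFFER_SIZE = 10;
const A = new Float32Array(BUFFER_SIZE);
const B = new Float32Array(BUFFER_SIZE);
for (let i = 0; i < BUFFER_SIZE; i++) {
  A[i] = i; // 0, 1, 2, ...
  B[i] = 2; // constant factor
}

const expected = multiplyReference(A, B, BUFFER_SIZE);
console.log(Array.from(expected)); // [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The values read back from the device could then be compared against expected element by element.
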
WebCL sequence (host side)

// Execute (enqueue) kernel
clQueue.enqueueNDRangeKernel(clKernel,
                             null,          // global work offset
                             [BUFFER_SIZE], // global work size
                             [2]);          // local work size

Note: Use local work size = [] or null (default) to let the driver choose the best values.

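When an explicit local work size is passed, OpenCL 1.x requires the global work size to be a multiple of it. A minimal sketch of that arithmetic follows; roundUpGlobalSize and workGroupCount are hypothetical helper names, not part of WebCL.

```javascript
// Round a global work size up to the next multiple of the local work size.
function roundUpGlobalSize(globalSize, localSize) {
  const remainder = globalSize % localSize;
  return remainder === 0 ? globalSize : globalSize + localSize - remainder;
}

// Number of work-groups launched for a 1D range.
function workGroupCount(globalSize, localSize) {
  return roundUpGlobalSize(globalSize, localSize) / localSize;
}

console.log(workGroupCount(10, 2));    // 5 groups for [BUFFER_SIZE] with local size [2]
console.log(roundUpGlobalSize(10, 4)); // 12, padded so that 4 divides it
```

If the global size is padded this way, the kernel would need a guard such as comparing the work-item index with n, which is one reason to pass the element count as a kernel argument.
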
WebCL sequence (host side)

// get results and block while getting them
var C=new Uint32Array(BUFFER_SIZE);
clQueue.enqueueReadBuffer(cBuffer,
                          true, // blocking call
                          0,
                          size,
                          C);

Example: Matrix multiplication

   “Hello World of CL”
   C = A x B
   N x N matrices

   [Figure: matrices A, B and C]

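For reference, the sequential version of C = A x B can be sketched as below. The row-major flat-array layout and the name matMul are assumptions made for illustration; the slides do not show the baseline code.

```javascript
// Naive C = A x B for N x N matrices stored row-major in flat typed arrays.
function matMul(A, B, N) {
  const C = new Float32Array(N * N);
  for (let row = 0; row < N; row++) {
    for (let col = 0; col < N; col++) {
      let sum = 0;
      // Dot product of one row of A with one column of B.
      for (let k = 0; k < N; k++) {
        sum += A[row * N + k] * B[k * N + col];
      }
      C[row * N + col] = sum;
    }
  }
  return C;
}

const A2 = new Float32Array([1, 2, 3, 4]); // [[1, 2], [3, 4]]
const B2 = new Float32Array([5, 6, 7, 8]); // [[5, 6], [7, 8]]
console.log(Array.from(matMul(A2, B2, 2))); // [19, 22, 43, 50]
```

In a CL version, the two outer loops would typically become the 2D global work size, with one work-item per element of C.
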
Example: Matrix multiplication

   Optimization
      N x N matrices
      C divided into m x m tiles
      With
         • m=N/P
         • P = # threads per workgroup (16)

   [Figure: matrices A, B and C]

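The tiling idea can be sketched on the CPU as a blocked matrix product. This is an illustration under stated assumptions, not the optimized kernel from the talk: the tile edge is a free parameter here rather than the slide's m = N/P, matrices are row-major flat arrays, and matMulTiled is a hypothetical name.

```javascript
// Blocked (tiled) C = A x B. tile is the tile edge length and must divide N.
function matMulTiled(A, B, N, tile) {
  if (N % tile !== 0) {
    throw new Error("tile must divide N");
  }
  const C = new Float32Array(N * N); // starts at zero
  for (let rowTile = 0; rowTile < N; rowTile += tile) {
    for (let colTile = 0; colTile < N; colTile += tile) {
      // (rowTile, colTile) selects one tile of C.
      for (let kTile = 0; kTile < N; kTile += tile) {
        // Accumulate the contribution of one tile of A and one tile of B.
        for (let row = rowTile; row < rowTile + tile; row++) {
          for (let col = colTile; col < colTile + tile; col++) {
            let sum = C[row * N + col];
            for (let k = kTile; k < kTile + tile; k++) {
              sum += A[row * N + k] * B[k * N + col];
            }
            C[row * N + col] = sum;
          }
        }
      }
    }
  }
  return C;
}

// 4 x 4 identity times B should give B back.
const N4 = 4;
const identity = new Float32Array(N4 * N4);
const B4 = new Float32Array(N4 * N4);
for (let i = 0; i < N4; i++) {
  identity[i * N4 + i] = 1;
}
for (let i = 0; i < N4 * N4; i++) {
  B4[i] = i;
}
const C4 = matMulTiled(identity, B4, N4, 2);
console.log(Array.from(C4).join(",") === Array.from(B4).join(",")); // true
```

On a GPU, each tile of C would usually be assigned to one workgroup, with the matching tiles of A and B staged in local memory so that they are reused across work-items.
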
Example: Comparison with sequential

   MacBook Pro (early 2011), OSX 10.8
      CPU: Intel Core i7, 2.2GHz, 4 cores
      GPU: AMD Radeon HD 6750M, 1 GB, 480 SPU, 600 MHz, 576 GFLOPS

   [Chart: speedup factor (axis 0 to 250) for sizes 128, 256, 512, 1024, 2048; series: OpenMP, CL (CPU), CL (GPU), CL (GPU opt)]

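The GPU's quoted 576 GFLOPS is consistent with its other figures if each stream processor is assumed to issue one multiply-add (2 FLOPs) per cycle. That assumption, and the helper names below, are not from the slide.

```javascript
// Peak GFLOPS = stream processors x clock (MHz) x FLOPs per cycle / 1000.
// flopsPerCycle = 2 assumes one multiply-add per processor per cycle.
function peakGflops(streamProcessors, clockMHz, flopsPerCycle) {
  return streamProcessors * clockMHz * flopsPerCycle / 1000;
}

// Conventional FLOP count for a naive N x N matrix product: 2 * N^3.
function matMulFlops(N) {
  return 2 * N * N * N;
}

console.log(peakGflops(480, 600, 2)); // 576
console.log(matMulFlops(1024));       // 2147483648
```

Dividing matMulFlops(N) by a measured run time gives achieved FLOPS, which can be compared with the peak figure.
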
WebCL / WebGL interop

   WebCL context created from WebGL context
   Configure shared CL objects from GL counterparts
   Sync GL and CL
      Flush GL, acquire GL object
      Execute CL
      Release CL object, flush CL
   Vertex arrays, textures, render-buffers can be shared with CL

   Initialization
      Initialize WebGL
      Initialize WebCL
      Configure shared CL-GL data
   Rendering loop (per frame)
      Set kernels args
      Enqueue commands
      Execute kernels
      Update Scene
      Render scene

WebCL / WebGL interop

// Create WebGL context
var gl = canvas.getContext("experimental-webgl");

// Init GL
…

// create the OpenCL context
try {
  clContext = WebCL.createContext({
    deviceType: WebCL.DEVICE_TYPE_GPU,
    shareGroup: gl
  });
}
catch(err) {
  throw "Error: Failed to create context! " + err;
}

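The "experimental-webgl" name is the older prefixed context identifier; the standard one is "webgl". A small lookup that tries both can be sketched as follows. getGLContext and the stub canvas are hypothetical and exist only so the sketch runs outside a browser.

```javascript
// Try the standard context name first, then the older prefixed one.
function getGLContext(canvas) {
  const names = ["webgl", "experimental-webgl"];
  for (const name of names) {
    const context = canvas.getContext(name);
    if (context) {
      return context;
    }
  }
  throw new Error("WebGL is not available");
}

// Hypothetical stub canvas that only knows the prefixed name.
const stubCanvas = {
  getContext(name) {
    return name === "experimental-webgl" ? { kind: name } : null;
  }
};

console.log(getGLContext(stubCanvas).kind); // "experimental-webgl"
```

In a page, the same function would be called with a real canvas element.
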
WebCL / WebGL interop (texture)

// Create OpenGL texture object
gl.activeTexture(gl.TEXTURE0);
glTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, glTexture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, TextureWidth, TextureHeight, 0,
              gl.RGBA, gl.UNSIGNED_BYTE, null);
gl.bindTexture(gl.TEXTURE_2D, null);

// Create the compute program from the source buffer (text)
clProgram = clContext.createProgram(getSource("multiply_script"));

// Build the program executable
try {
  clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
} catch (err) {
  throw "Error: Failed to build program executable!\n"
        + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
}

clKernel = clProgram.createKernel("multiply");

Demo: GL Texture update with CL

   Based on Evgeny Demidov 2D ink droplet
      WebGL ~26 fps
      WebCL ~124 fps

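The two frame rates can be restated as time per frame and as a ratio. This is plain arithmetic on the approximate numbers quoted on the slide; frameTimeMs is a hypothetical helper name.

```javascript
// Convert frames per second into milliseconds per frame.
function frameTimeMs(fps) {
  return 1000 / fps;
}

const webglFps = 26;  // approximate figure from the slide
const webclFps = 124; // approximate figure from the slide

console.log(frameTimeMs(webglFps).toFixed(1)); // "38.5"
console.log(frameTimeMs(webclFps).toFixed(1)); // "8.1"
console.log((webclFps / webglFps).toFixed(1)); // "4.8"
```
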
WebCL / WebGL interop (vbo)

// create buffer object
glVBO = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, glVBO);

// initialize buffer object
var sizeInBytes = mesh_width * mesh_height * 4 *
                  Float32Array.BYTES_PER_ELEMENT;
gl.bufferData(gl.ARRAY_BUFFER, sizeInBytes, gl.DYNAMIC_DRAW);

// create OpenCL buffer from GL VBO
clVBO = clContext.createFromGLBuffer(WebCL.MEM_WRITE_ONLY, glVBO);

// set kernel args values
clKernel.setArg(0, clVBO);
clKernel.setArg(1, mesh_width, WebCL.type.UINT);
clKernel.setArg(2, mesh_height, WebCL.type.UINT);

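The buffer size on this slide is the vertex count times four float components per vertex. A worked instance follows; the 256 x 256 mesh is an assumed example size and vboSizeInBytes is a hypothetical helper name.

```javascript
// Bytes needed for a mesh with a fixed number of float components per vertex.
function vboSizeInBytes(meshWidth, meshHeight, componentsPerVertex) {
  return meshWidth * meshHeight * componentsPerVertex *
         Float32Array.BYTES_PER_ELEMENT;
}

// 256 x 256 vertices, 4 floats each, 4 bytes per float.
console.log(vboSizeInBytes(256, 256, 4)); // 1048576
```
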
WebCL/WebGL interop (host side)

// Sync GL and acquire buffer from GL
gl.flush();
clQueue.enqueueAcquireGLObjects(clTexture);

// Set global and local work sizes for kernel
var local = null;
var global = [TextureWidth, TextureHeight];

try {
  clQueue.enqueueNDRangeKernel(clKernel, null, global, local);
} catch (err) {
  throw "Failed to enqueue kernel! " + err;
}

// Release GL texture
clQueue.enqueueReleaseGLObjects(clTexture);
clQueue.flush();

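The per-frame ordering can be exercised without a GPU by recording calls on stand-in objects. In this sketch, makeRecorder and runFrame are hypothetical helpers, and the recorder objects only mimic the method names used on the slide; they are not WebGL or WebCL objects. Moving the release into a finally block is a design suggestion rather than the slide's code: it hands the texture back to GL even when the enqueue throws.

```javascript
// Hypothetical stand-ins that only record call order.
function makeRecorder(log, prefix, methods) {
  const obj = {};
  for (const name of methods) {
    obj[name] = () => { log.push(prefix + "." + name); };
  }
  return obj;
}

// One frame of CL work on a shared GL object.
function runFrame(gl, clQueue, clKernel, clTexture, global, local) {
  gl.flush();                                 // let GL finish with the texture
  clQueue.enqueueAcquireGLObjects(clTexture); // CL takes the shared object
  try {
    clQueue.enqueueNDRangeKernel(clKernel, null, global, local);
  } finally {
    // Give the texture back to GL even if the enqueue throws.
    clQueue.enqueueReleaseGLObjects(clTexture);
    clQueue.flush();
  }
}

const log = [];
const gl = makeRecorder(log, "gl", ["flush"]);
const clQueue = makeRecorder(log, "clQueue", [
  "enqueueAcquireGLObjects", "enqueueNDRangeKernel",
  "enqueueReleaseGLObjects", "flush"
]);

runFrame(gl, clQueue, {}, {}, [512, 512], null);
console.log(log.join(" -> "));
```

The printed sequence should follow the slide: flush GL, acquire, enqueue the kernel, release, flush CL.
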
Perspectives

   WebCL enables GPGPU applications in Web browsers
      Careful usage of architecture can lead to impressive speedup
      With WebGL interoperability, rich graphics Web applications are now possible
   DRAFT WebCL specification
      Quite stable JavaScript API
      Focusing on more security and robustness

WebCL Open process and Resources

   Khronos open process to engage Web community
      Public specification drafts, mailing lists, forums
      http://www.khronos.org/webcl/
      webcl_public@khronos.org
   Nokia open source prototype for Firefox in May 2011 (LGPL)
      http://webcl.nokiaresearch.com
   Samsung open source prototype for WebKit in July 2011 (BSD)
      http://code.google.com/p/webcl/
   Motorola open source prototype for NodeJS in March 2012 (BSD)
      https://github.com/Motorola-Mobility/node-webcl

Thank you!

Start learning Now!

   OpenCL Programming Guide - The “Red Book” of OpenCL
      http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi/dp/0321749642
   OpenCL in Action
      http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Computations/dp/1617290173/
   Heterogeneous Computing with OpenCL
      http://www.amazon.com/Heterogeneous-Computing-with-OpenCL-ebook/dp/B005JRHYUS
   The OpenCL Programming Book
      http://www.fixstars.com/en/opencl/book/

KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 

Graphics Programming for the Web with WebCL

  • 2. Graphics Programming on the Web with WebCL
      Mikaël Bourges-Sévenier, Motorola Mobility
      August 9, 2012
  • 3. Over 32000 planks ;-)
      - Blender/Bullet/SmallLuxGPU OpenCL
      - By Alain Ducharme “Phymec”
      - http://www.youtube.com/watch?v=143k1fqPukk
  • 4. Motivation
      - For compute-intensive web applications
          - Games: physics, special effects
          - Computational photography
          - Scientific simulations
          - Augmented reality
          - …
      - Use many devices for general computations
          - CPU, GPU, DSP, FPGA…
  • 5. Motivation
      - GPUs provide exponential GFLOPS growth every year vs. CPUs
      [Figure from Chapter 1 (Introduction) of the NVIDIA CUDA/OpenCL C programming guide]
  • 6. Content
      - Motivation and Goals
      - General-Purpose computations on GPU (GPGPU)
          - From [image] to [image]
          - The need for more general data-parallel computations
      - WebCL overview
          - A JavaScript API over OpenCL
          - OpenCL concepts
          - WebCL API
      - WebCL programming
          - Pure computations
          - WebGL interoperability
  • 8. WebGL pipeline
      - Programmable vertex & fragment shaders
      [Diagram: Application → GPU → Frame Buffer. Vertices (3D) → vertex processing (vertex shader) → vertices (screen) → rasterizer → fragments → fragment processing (fragment shader, with textures and samplers) → pixels]
  • 9. General Purpose computations on GPU
      - With clever mapping of algorithms to the GL pipeline
          - Textures as data buffers
          - Texture coordinates as computational domain
          - Vertex coordinates as computational range
      - Vertex shaders: scatter (write values)
          - to start computations
          - scatter operations
      - Fragment shaders: gather (read values)
          - for algorithm steps
          - gather operations
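The difference between gather and scatter on the slide above can be sketched in plain JavaScript. This is only an illustration on typed arrays, not shader code; the helper names `gather` and `scatter` and the sample data are made up for this example.

```javascript
// Gather: every output slot i reads from a computed input location.
// This is the access pattern a fragment shader gets from texture lookups.
function gather(input, indices) {
  const out = new Float32Array(indices.length);
  for (let i = 0; i < indices.length; ++i) {
    out[i] = input[indices[i]]; // arbitrary read, fixed write position
  }
  return out;
}

// Scatter: every input slot i writes to a computed output location.
// A fragment shader cannot choose where it writes, which is why
// scatter is the hard case when doing GPGPU through GL.
function scatter(input, indices, outLength) {
  const out = new Float32Array(outLength);
  for (let i = 0; i < input.length; ++i) {
    out[indices[i]] = input[i]; // fixed read position, arbitrary write
  }
  return out;
}

const data = [10, 20, 30, 40];
const perm = [2, 0, 3, 1];
const gathered = gather(data, perm);      // 30, 10, 40, 20
const scattered = scatter(data, perm, 4); // 20, 40, 10, 30
```

With the same index array the two operations give different results, because one indexes the reads and the other indexes the writes.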
  • 10. GPGPU with GL limitations
      - Hard to map algorithms to graphics pipeline
      - Hard to do scatter operations
      - Shader instances can NOT directly communicate with one another
      - GPGPU with GL is hack-ish…
      - CL is made for GPGPU, not graphics
  • 12. WebCL overview
      - WebCL brings parallel computing to the Web through a secure JavaScript binding to OpenCL 1.1 (2011)
          - Open standard, royalty-free
          - Platform independent
          - Device independent
      - Being standardized by Khronos
          - First public working draft April 2012
          - http://www.khronos.org/webcl/
  • 13. OpenCL overview
      - Features
          - C-based cross-platform API
          - Kernels use a subset of C99 and extensions
              - Vector extensions (<type>N)
              - No recursion, no function pointers
              - No dynamic memory (malloc, free…), no standard libc methods (memcpy…)
          - Well-defined numerical accuracy for both integers and floats
          - Rich set of built-in functions (e.g. as GLSL and more)
              - But no random method
          - Close to the hardware
              - Control over memory use
              - Control over thread scheduling
  • 14. OpenCL Device Model
      - A host is connected to one or more compute devices
      - Compute device
          - A collection of one or more compute units (~ cores)
          - A compute unit is composed of one or more processing elements (~ threads)
          - Processing elements execute code as SIMD or SPMD
      [Diagram: Host (PC) connected to Compute Devices (GPU, CPU, DSP, FPGA…); each device contains Compute Units (cores); each compute unit contains Processing Elements (threads)]
  • 15. OpenCL Execution Model
      - Kernel
          - Basic unit of executable code (~ DLL entry point)
          - Data-parallel or task-parallel
      - Program
          - Collection of kernels and functions called by kernels
          - Analogous to a dynamic library (DLL)
      - Command Queue
          - Control operations on OpenCL objects (memory transfers, kernel execution, synchronization)
          - Commands queued in order
          - Execution in-order or out-of-order
          - Applications may use multiple command-queues per device
      - Work-item
          - An execution of a kernel by a processing element (~ thread)
      - Work-group
          - A collection of work-items that execute on a single compute unit (~ core)
      [Diagram: a Context holding one command queue for the GPU and one for the CPU]
  • 16. OpenCL Work-group 2D analogy
      - # work-items = # pixels
      - # work-groups = # tiles
      - Work-group size = tileW * tileH
      - All threads in a workgroup run synchronously
      [Diagram: an image (global range) divided into tiles (local ranges)]
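The pixel/tile analogy above amounts to a few lines of arithmetic. The sketch below assumes an image whose dimensions are exact multiples of the tile size, which matches the OpenCL 1.1 requirement that the global size be divisible by the local size; the function name `ndRangeFor` and the 512 x 256 image are illustrative only.

```javascript
// Derive NDRange parameters for a 2D image processed in tiles.
function ndRangeFor(imageW, imageH, tileW, tileH) {
  if (imageW % tileW !== 0 || imageH % tileH !== 0) {
    throw new Error("image size must be a multiple of the tile size");
  }
  return {
    globalSize: [imageW, imageH],                     // one work-item per pixel
    localSize: [tileW, tileH],                        // one work-group per tile
    workItems: imageW * imageH,                       // # work-items = # pixels
    workGroups: (imageW / tileW) * (imageH / tileH),  // # work-groups = # tiles
    workGroupSize: tileW * tileH                      // work-items per group
  };
}

const range = ndRangeFor(512, 256, 16, 16);
console.log(range.workItems, range.workGroups, range.workGroupSize);
```

For a 512 x 256 image with 16 x 16 tiles this yields 131072 work-items in 512 work-groups of 256 work-items each.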
  • 17. OpenCL Memory Model
      - On Host
          - CPU RAM
      - On Compute Device
          - Global memory = GPU RAM
          - Constant memory = cached global memory
          - Texture memory = cached global memory optimized for streaming reads
          - Local memory = high-speed memory shared among work-items of a work-group (~ L1 cache)
          - Private memory = registers of a work-item, very fast memory
      - Memory management is explicit
          - App must move data host ➞ global ➞ local and back
      [Diagram: the Host (host memory) talks to the Compute Device through command queues and API calls; the device holds global memory with constant and texture caches, workgroups 1..N each with local memory, and work-items 1..M each with private memory]
  • 18. OpenCL Kernel
      - Defined on an N-dimensional computation domain
      - A kernel is executed at each point of the computation domain

      // In JavaScript
      function multiple(a, b, n) {
        var c = [];
        for (var i = 0; i < n; ++i)
          c[i] = a[i] * b[i];
        return c;
      }

      // In OpenCL C99
      /**
       * @param a, b, c are buffers in global memory
       * @param n number of elements in a, b, and c
       */
      __kernel
      void multiply(__global const float *a,
                    __global const float *b,
                    __global float *c,
                    unsigned int n)
      {
        unsigned int tid = get_global_id(0); // thread number
        if (tid >= n) return; // make sure we don't pass buffer area
        c[tid] = a[tid] * b[tid];
      }
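To make "a kernel is executed at each point of the computation domain" concrete, here is a sequential JavaScript stand-in for a 1-D launch. It is a sketch only: `runNDRange1D` and `multiplyKernel` are hypothetical helpers, the explicit `tid` parameter plays the role of `get_global_id(0)`, and a real device would run the work-items in parallel rather than in a loop.

```javascript
// Sequential emulation of a 1-D NDRange: call the kernel once per global id.
function runNDRange1D(kernel, globalSize, args) {
  for (let gid = 0; gid < globalSize; ++gid) {
    kernel(gid, ...args);
  }
}

// JavaScript transcription of the multiply kernel body.
function multiplyKernel(tid, a, b, c, n) {
  if (tid >= n) return; // guard: the global size may exceed n
  c[tid] = a[tid] * b[tid];
}

const n = 5;
const a = Float32Array.of(1, 2, 3, 4, 5);
const b = Float32Array.of(2, 2, 2, 2, 2);
const c = new Float32Array(n);

// Launch 8 work-items for 5 elements; the last three hit the guard.
runNDRange1D(multiplyKernel, 8, [a, b, c, n]);
console.log(Array.from(c));
```

The global size is deliberately larger than `n` here to show why the bounds check in the kernel matters when the launch size is rounded up.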
  • 19. WebCL API
      - Same OO model as OpenCL, with JS classes
      - WebCL is the global object
      [Class diagram: WebCL, WebCLPlatform, WebCLDevice, WebCLExtension (platform layer); WebCLContext owns many (*) WebCLProgram, WebCLMemoryObject {abstract}, CommandQueue, Event, Sampler; WebCLKernel (compiler layer); WebCLBuffer, WebCLImage (runtime layer)]
  • 21. WebCL sequence (host side)
      - Create context
      - Compile kernels
      - Setup command-queues
      - Setup kernels arguments
      - Execute commands
      - Read results
      [Diagram, host-side flow across the platform, compiler and runtime layers: Select Platform → Select Device → Create Context → Load and compile kernels on devices → Create buffers to store data on devices → Create command queues for each device → Update kernels arguments → Send data to devices using their command queues → Send commands to devices using their command queues → Get data from devices using their command queues → Release resources]
  • 22. WebCL sequence (host side)
      try {
        // create the OpenCL context
        clContext = WebCL.createContext({
          deviceType: WebCL.DEVICE_TYPE_GPU
        });
      }
      catch(err) {
        throw "Error: Failed to create context! " + err;
      }

      var devices = clContext.getInfo(WebCL.CONTEXT_DEVICES);
      if (!devices) {
        throw "Error: Failed to retrieve compute devices for context!";
      }
  • 23. WebCL sequence (host side)
      <script id="multiply_script" type="x-webcl">
      __kernel
      void multiply(__global const float *a,
                    __global const float *b,
                    __global float *c,
                    unsigned int n)
      {
        unsigned int tid = get_global_id(0); // thread number
        if (tid >= n) return; // make sure we don't pass buffer area
        c[tid] = a[tid] * b[tid];
      }
      </script>

      // Create the compute program from the source buffer (text)
      clProgram = clContext.createProgram(getSource("multiply_script"));

      // Build the program executable
      try {
        clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
      } catch (err) {
        throw "Error: Failed to build program executable!\n"
          + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
      }

      clKernel = clProgram.createKernel("multiply");
  • 24. WebCL sequence (host side)
      BUFFER_SIZE = 10;
      var A = new Uint32Array(BUFFER_SIZE);
      var B = new Uint32Array(BUFFER_SIZE);

      // store data in A and B
      …

      var size = BUFFER_SIZE * Uint32Array.BYTES_PER_ELEMENT; // size in bytes

      // Create buffer for A and B and copy host contents
      var aBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);
      var bBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);

      // Create buffer for C to read results
      var cBuffer = clContext.createBuffer(WebCL.MEM_WRITE_ONLY, size);
  • 25. WebCL sequence (host side)
      // Create command queue
      clQueue = clContext.createCommandQueue(devices[0]);

      // enqueue buffers
      clQueue.enqueueWriteBuffer(aBuffer, false, 0, size, A);
      clQueue.enqueueWriteBuffer(bBuffer, false, 0, size, B);

      // Set kernel args
      clKernel.setArg(0, aBuffer);
      clKernel.setArg(1, bBuffer);
      clKernel.setArg(2, cBuffer);
      clKernel.setArg(3, BUFFER_SIZE, WebCL.type.UINT);

      Kernel signature, for reference:
      __kernel
      void multiply(__global const float *a,
                    __global const float *b,
                    __global float *c,
                    unsigned int n);
  • 26. WebCL sequence (host side)
      // Execute (enqueue) kernel
      clQueue.enqueueNDRangeKernel(clKernel,
        null,          // global work offset
        [BUFFER_SIZE], // global work size
        [2]            // local work size
      );

      Note: use local work size = [] or null (default) to let the driver choose the best values.
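With a global work size of 10 and a local work size of 2, as in the call above, the driver forms 5 work-groups of 2 work-items. The sketch below shows how each global id splits into a group id and a local id, assuming a zero global offset; `decompose` is a made-up helper for illustration, not part of WebCL.

```javascript
// Split each 1-D global id into (group id, local id) for a given local size.
// With a zero offset: globalId = groupId * localSize + localId.
function decompose(globalSize, localSize) {
  if (globalSize % localSize !== 0) {
    throw new Error("global size must be a multiple of local size");
  }
  const items = [];
  for (let gid = 0; gid < globalSize; ++gid) {
    items.push({
      globalId: gid,
      groupId: Math.floor(gid / localSize),
      localId: gid % localSize
    });
  }
  return items;
}

const items = decompose(10, 2);
const groupCount = new Set(items.map((it) => it.groupId)).size;
console.log(groupCount, items[7]);
```

Work-item 7, for example, is the second work-item (local id 1) of the fourth work-group (group id 3).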
  • 27. WebCL sequence (host side)
      // get results and block while getting them
      var C = new Uint32Array(BUFFER_SIZE);
      clQueue.enqueueReadBuffer(cBuffer,
        true, // blocking call
        0, size, C);
  • 28. Example: Matrix multiplication
      - “Hello World of CL”
      - C = A x B
      - N x N matrices
      [Diagram: matrices A, B and C]
  • 29. Example: Matrix multiplication
      - Optimization
          - N x N matrices
          - C divided into m x m tiles
          - With
              - m = N/P
              - P = # threads per workgroup (16)
      [Diagram: matrices A, B and C]
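The tiling idea can be sketched sequentially in JavaScript. This is one reading of the slide, not the deck's kernel: it assumes P is the tile edge, so C is covered by m x m tiles with m = N/P, and each (ti, tj) iteration stands in for one work-group. The function names and the small 4 x 4 test matrices are invented for the example.

```javascript
// Reference: plain triple loop, C = A x B for row-major N x N matrices.
function matmulNaive(A, B, N) {
  const C = new Float32Array(N * N);
  for (let i = 0; i < N; ++i)
    for (let j = 0; j < N; ++j) {
      let sum = 0;
      for (let k = 0; k < N; ++k) sum += A[i * N + k] * B[k * N + j];
      C[i * N + j] = sum;
    }
  return C;
}

// Tiled version: C is split into m x m tiles of P x P elements (assumption).
function matmulTiled(A, B, N, P) {
  if (N % P !== 0) throw new Error("N must be a multiple of P");
  const m = N / P; // tiles per side
  const C = new Float32Array(N * N);
  for (let ti = 0; ti < m; ++ti) {
    for (let tj = 0; tj < m; ++tj) {
      // One (ti, tj) pair corresponds to one work-group's output tile.
      for (let tk = 0; tk < m; ++tk) {
        // On a device, the tk-th tiles of A and B would be staged in
        // local memory here and reused by every work-item of the group.
        for (let i = ti * P; i < (ti + 1) * P; ++i) {
          for (let j = tj * P; j < (tj + 1) * P; ++j) {
            let sum = C[i * N + j];
            for (let k = tk * P; k < (tk + 1) * P; ++k) {
              sum += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = sum;
          }
        }
      }
    }
  }
  return C;
}

const N = 4, P = 2;
const A = Float32Array.from({ length: N * N }, (_, i) => i + 1);
const B = Float32Array.from({ length: N * N }, (_, i) => (i % 5) - 2);
const tiled = matmulTiled(A, B, N, P);
const naive = matmulNaive(A, B, N);
```

The benefit on a GPU comes from the reuse noted in the comment: each element loaded into local memory serves P multiply-adds instead of one global read per multiply-add.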
  • 30. Example: Comparison with sequential
      - MacBook Pro (early 2011), OSX 10.8
      - CPU: Intel Core i7, 2.2 GHz, 4 cores
      - GPU: AMD Radeon HD 6750M, 1 GB, 480 SPU, 600 MHz, 576 GFLOPS
      [Chart: speedup factor (y-axis, 0 to 250) for OpenMP, CL (CPU), CL (GPU) and CL (GPU opt), at x-axis values 128, 256, 512, 1024 and 2048]
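The 576 GFLOPS figure quoted for the GPU is consistent with the usual peak estimate of processing elements x clock x operations per cycle, assuming each SPU retires one multiply-add (2 floating-point operations) per cycle. That factor of 2 is an assumption, not something the slide states. The sketch below reproduces the arithmetic and defines the speedup factor plotted in the chart; both function names are illustrative, and the timings passed to `speedup` are placeholders, not measurements.

```javascript
// Theoretical peak throughput in GFLOPS.
function peakGflops(processingElements, clockHz, flopsPerCycle) {
  return (processingElements * clockHz * flopsPerCycle) / 1e9;
}

// Speedup factor: sequential time divided by parallel time.
function speedup(tSequential, tParallel) {
  return tSequential / tParallel;
}

// 480 SPUs at 600 MHz, assuming 2 FLOPs (one multiply-add) per cycle.
const gpuPeak = peakGflops(480, 600e6, 2);
console.log(gpuPeak);

// Placeholder timings in seconds, for illustration only.
const example = speedup(8, 2);
```

Peak figures like this are upper bounds; measured speedups such as those in the chart depend on memory traffic and on how well the kernel uses local memory.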
  • 31. WebCL / WebGL interop
      - WebCL context created from WebGL context
      - Configure shared CL objects from GL counterparts
      - Sync GL and CL
          - Flush GL, acquire GL object
          - Execute CL
          - Release CL object, flush CL
      - Vertex arrays, textures, render-buffers can be shared with CL
      [Diagram. Initialization: Initialize WebGL → Initialize WebCL → Configure shared CL-GL data. Rendering loop (per frame): Set kernels args → Enqueue commands → Execute kernels → Update scene → Render scene]
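The per-frame synchronization order described above can be sketched with stand-in objects that only record which call happened when. Nothing here talks to a GPU: `gl`, `clQueue` and `renderScene` are hypothetical mocks, and the method names simply mirror the draft WebCL calls used in this deck's host-side code.

```javascript
// Mocks that log the order of calls instead of doing real work.
const log = [];
const gl = { flush: () => log.push("gl.flush") };
const clQueue = {
  enqueueAcquireGLObjects: () => log.push("cl.acquire"),
  enqueueNDRangeKernel: () => log.push("cl.kernel"),
  enqueueReleaseGLObjects: () => log.push("cl.release"),
  flush: () => log.push("cl.flush")
};
function renderScene() { log.push("gl.render"); }

// One frame of the rendering loop.
function frame(sharedObject, kernel, globalSize) {
  gl.flush();                                     // submit pending GL commands
  clQueue.enqueueAcquireGLObjects(sharedObject);  // CL takes the shared object
  clQueue.enqueueNDRangeKernel(kernel, null, globalSize, null); // run CL
  clQueue.enqueueReleaseGLObjects(sharedObject);  // hand it back to GL
  clQueue.flush();                                // submit pending CL commands
  renderScene();                                  // GL draws with updated data
}

frame({}, {}, [64, 64]);
console.log(log.join(" -> "));
```

The point of the ordering is ownership: the shared texture or buffer is used by only one API at a time, with a flush at each handover.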
  • 32. WebCL / WebGL interop
      // Create WebGL context
      var gl = canvas.getContext("experimental-webgl");

      // Init GL
      …

      // create the OpenCL context
      try {
        clContext = WebCL.createContext({
          deviceType: WebCL.DEVICE_TYPE_GPU,
          shareGroup: gl
        });
      }
      catch(err) {
        throw "Error: Failed to create context! " + err;
      }
  • 33. WebCL / WebGL interop (texture)
      // Create OpenGL texture object
      gl.activeTexture(gl.TEXTURE0);
      glTexture = gl.createTexture();
      gl.bindTexture(gl.TEXTURE_2D, glTexture);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, TextureWidth, TextureHeight, 0,
                    gl.RGBA, gl.UNSIGNED_BYTE, null);
      gl.bindTexture(gl.TEXTURE_2D, null);

      // Create the compute program from the source buffer (text)
      clProgram = clContext.createProgram(getSource("multiply_script"));

      // Build the program executable
      try {
        clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
      } catch (err) {
        throw "Error: Failed to build program executable!\n"
          + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
      }

      clKernel = clProgram.createKernel("multiply");
  • 34. Demo: GL Texture update with CL
      - Based on Evgeny Demidov 2D ink droplet
      - WebGL ~26 fps
      - WebCL ~124 fps
  • 35. WebCL WebGL interop (vbo)
      // create buffer object
      glVBO = gl.createBuffer();
      gl.bindBuffer(gl.ARRAY_BUFFER, glVBO);
      // initialize buffer object
      var sizeInBytes = mesh_width * mesh_height * 4 * Float32Array.BYTES_PER_ELEMENT;
      gl.bufferData(gl.ARRAY_BUFFER, sizeInBytes, gl.DYNAMIC_DRAW);
      // create OpenCL buffer from GL VBO
      clVBO = clContext.createFromGLBuffer(WebCL.MEM_WRITE_ONLY, glVBO);
      // set kernel args values
      clKernel.setArg(0, clVBO);
      clKernel.setArg(1, mesh_width, WebCL.type.UINT);
      clKernel.setArg(2, mesh_height, WebCL.type.UINT);
  • 37. WebCL/WebGL interop (host side)
      // Sync GL and acquire buffer from GL
      gl.flush();
      clQueue.enqueueAcquireGLObjects(clTexture);
      // Set global and local work sizes for kernel
      var local = null;
      var global = [TextureWidth, TextureHeight];
      try {
        clQueue.enqueueNDRangeKernel(clKernel, null, global, local);
      } catch (err) {
        throw "Failed to enqueue kernel! " + err;
      }
      // Release GL texture
      clQueue.enqueueReleaseGLObjects(clTexture);
      clQueue.flush();
  • 39. Perspectives
      - WebCL enables GPGPU applications in Web browsers
          - Careful usage of architecture can lead to impressive speedup
      - With WebGL interoperability, rich graphics Web applications are now possible
      - DRAFT WebCL specification
          - Quite stable JavaScript API
          - Focusing on more security and robustness
  • 40. WebCL Open process and Resources
      - Khronos open process to engage Web community
      - Public specification drafts, mailing lists, forums
          - http://www.khronos.org/webcl/
          - webcl_public@khronos.org
      - Nokia open source prototype for Firefox in May 2011 (LGPL)
          - http://webcl.nokiaresearch.com
      - Samsung open source prototype for WebKit in July 2011 (BSD)
          - http://code.google.com/p/webcl/
      - Motorola open source prototype for NodeJS in March 2012 (BSD)
          - https://github.com/Motorola-Mobility/node-webcl
  • 41. Thank you!
  • 43. Start learning Now!
      - OpenCL Programming Guide - The “Red Book” of OpenCL
          - http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi/dp/0321749642
      - OpenCL in Action
          - http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Computations/dp/1617290173/
      - Heterogeneous Computing with OpenCL
          - http://www.amazon.com/Heterogeneous-Computing-with-OpenCL-ebook/dp/B005JRHYUS
      - The OpenCL Programming Book
          - http://www.fixstars.com/en/opencl/book/
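
The host-side sequence on slide 37 (flush GL, acquire the shared object, enqueue the kernel, release it, flush the CL queue) can be wrapped in a small per-frame helper. This is only a sketch, not code from the presentation: `runInteropFrame` is a hypothetical name, the `try/finally` release is our own design choice, and the method names are the ones shown on the slides, which may differ in other WebCL implementations.

```javascript
// Hypothetical helper (not from the slides): one CL update of a shared GL object.
// gl, clQueue, clKernel and clShared are assumed to be created as on slides 32-35.
function runInteropFrame(gl, clQueue, clKernel, clShared, globalSize, localSize) {
  // Submit pending GL commands before CL touches the shared object.
  gl.flush();
  clQueue.enqueueAcquireGLObjects(clShared);
  try {
    // null offset; a null local size lets the implementation pick one.
    clQueue.enqueueNDRangeKernel(clKernel, null, globalSize, localSize || null);
  } finally {
    // Always hand the object back to GL, even if the enqueue threw.
    clQueue.enqueueReleaseGLObjects(clShared);
    clQueue.flush();
  }
}
```

Calling `runInteropFrame(gl, clQueue, clKernel, clTexture, [TextureWidth, TextureHeight])` once per frame, before the GL draw calls, would correspond to the "Enqueue commands" and "Execute kernels" steps; rendering then uses the updated texture or VBO.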

Editor's notes

  1. This demonstration does not run in a browser; it uses OpenCL to speed up the physics computations for the positions of all the planks. Our goal with WebCL is to make such computations possible in your web browser one day.
  2. While CPUs tend to have 2 to 32 cores, GPUs have many more.
  3. Historically, when GPUs became programmable, people tried to use vertex and fragment shader programs to perform more general computations than rendering vector graphics.
  4. The scatter and gather operations are fundamental to GPGPU. Scatter is typically difficult because, in a graphics pipeline, the fragment shader is called to write a single output value. One can still perform scatter by using vertex shaders cleverly, and newer versions of graphics APIs and drivers provide specific methods for it. Gather is straightforward, since it can be achieved by reading textures.
  5. To understand work-groups and work-items, suppose you have a matrix or an image to process. The image can be decomposed into tiles, and each tile can be processed independently. A tile would be a work-group. Inside this work-group, each pixel would be processed by a work-item. Unlike typical CPU multithreading, all work-items (or threads) execute synchronously, thanks to the SIMD nature of GPUs. This has an important consequence: if every thread executes the same number of operations, then they all complete at the same time. But if one thread takes longer than the others, e.g. due to branch divergence (like an if…else clause), the other threads wait until it finishes its operations, which lowers the effective computational throughput.
  6. Developers must manage memory explicitly. For the best performance gain, move data closer to the cores. However, be aware that the closer you get to the cores, the smaller the memory available.
  7. For 1-dimensional problems, in a sequential language like JavaScript, one would use a for loop to iterate across the 1D array. With CL, we tell the device to iterate over a 1D domain and only provide the body of the loop. When CL calls the kernel, it provides methods to query which index (i.e. thread) is executing the kernel. By extension, for 2D problems, in JavaScript we would have 2 nested for loops. CL's work-items iterate over the 2D domain, and the (x,y) index of the thread calling a kernel is provided by get_global_id(dimension), with dimension = 0 (1st dimension) or 1 (2nd dimension).
  8. The WebCL object cannot be constructed with the new operator. It is like the Math object of JavaScript.
  9. Meeting notes (8/2/12 16:54): kernels can come from anywhere.
  10. While we explain how to set up a simple vector multiplication kernel, the same applies to matrices. Matrix multiplication is probably what I would call the best “Hello World” example for compute languages.
  11. To optimize computations, recall the work-group/work-item analogy we explained earlier with an image. We said that work-groups are tiles on which work-items operate. Here we do exactly that, with P work-items (or threads) per work-group. Use WebCLKernel.getWorkGroupInfo(WebCL.KERNEL_PREFERRED_WORKGROUP_SIZE) to find out what this number is for a device. It is typically a power of 2 like 16, 32, 64.
  12. However, because that CPU is hyperthreaded, it is seen as 8 cores rather than 4. On this machine, the preferred workgroup size multiple is 1 for the CPU and 64 for the GPU. The maximum workgroup size is 128 for the CPU and 256 for the GPU. So we set the local workgroup size to 128x1 (=128) for the CPU and 16x16 (=256, divisible by 64) for the GPU. As you can see, the performance of CL on the CPU is pretty good, and even better than the GPU for small matrix sizes (less than 512x512). As the matrix size grows, the CPU performance remains constant but the GPU performance grows exponentially, as expected. Note: the OpenMP code uses the same tiling optimization as the GPU code, with 8 threads. If you recall the video at the beginning of this course, the physics engine is essentially doing matrix/vector multiplication for 32k items. With these results, a tremendous speedup can be achieved compared to a CPU approach.
  13. Meeting notes (8/2/12 16:54): vertex buffer objects.
  14. This example comes from the Nvidia CUDA/OpenCL SDK. A sphere is rendered by GL, but the positions of its vertices are modified by CL with some noise to create this cool effect.
  15. This is the recommended way to synchronize the GL and CL queues. There is a more efficient way that uses GL and CL events rather than flushing the queues, but synchronization with events is an advanced subject that we don't have time to discuss in this course; you can find presentations about it online.
  16. The Khronos web site has a wiki with links to all these WebCL implementation prototypes. On this web site, you will also find links to this presentation, course notes, and updates. All examples in this course were done with node-webcl from Motorola and rendered with node-webgl; both are freely available on GitHub. While this is not an implementation within a web browser, it uses the same JavaScript engine as the Chrome/Chromium browsers, i.e. Google's V8 engine. We use nodejs for server-side processing, and the same code is being ported to the Chrome browser. Using nodejs, we can prototype new features quickly before adding them to browsers.
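
Notes 5, 7, 11 and 12 describe how a 2D problem is split into work-groups (tiles) and work-items, each identified through get_global_id. The sketch below emulates that iteration on the CPU in plain JavaScript so the indexing can be inspected without a WebCL runtime. It is an illustration under stated assumptions, not WebCL code: `roundUp` and `runNDRange2D` are hypothetical helpers, the loop runs sequentially rather than in parallel, and the divisibility check mirrors the OpenCL 1.x requirement that the global size be a multiple of the local size.

```javascript
// CPU-side emulation of a 2D NDRange, for illustration only (runs sequentially).
// roundUp and runNDRange2D are hypothetical helpers, not part of any WebCL API.

// Smallest multiple of `multiple` that is >= value.
function roundUp(value, multiple) {
  return Math.ceil(value / multiple) * multiple;
}

// Calls kernel(ids) once per work-item; ids mimics the values returned by
// get_global_id, get_local_id and get_group_id in an OpenCL kernel.
function runNDRange2D(global, local, kernel) {
  if (global[0] % local[0] !== 0 || global[1] % local[1] !== 0) {
    throw new Error("global size must be a multiple of local size");
  }
  for (let gy = 0; gy < global[1] / local[1]; gy++) {      // work-groups (tiles) in y
    for (let gx = 0; gx < global[0] / local[0]; gx++) {    // work-groups (tiles) in x
      for (let ly = 0; ly < local[1]; ly++) {              // work-items inside a tile
        for (let lx = 0; lx < local[0]; lx++) {
          kernel({
            global: [gx * local[0] + lx, gy * local[1] + ly],
            local: [lx, ly],
            group: [gx, gy],
          });
        }
      }
    }
  }
}

// Example: multiply every pixel of a 20x10 image by 3.
const width = 20, height = 10;
const local = [16, 16];  // 16x16 = 256 work-items per group, the GPU setting in note 12
const global = [roundUp(width, local[0]), roundUp(height, local[1])];
const input = new Float32Array(width * height).fill(2);
const output = new Float32Array(width * height);

runNDRange2D(global, local, function (ids) {
  const x = ids.global[0];  // plays the role of get_global_id(0)
  const y = ids.global[1];  // plays the role of get_global_id(1)
  if (x >= width || y >= height) return;  // padding work-items do nothing
  output[y * width + x] = 3 * input[y * width + x];
});
```

The kernel function only contains the body of what would be two nested for loops in sequential JavaScript. Because the image size is not a multiple of the tile size, the global size is padded and the kernel guards against out-of-range indices, a common pattern in OpenCL kernels.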