YaCF: The
accULL Compiler

Juan J. Fumero

Introduction

YaCF

Experiments

Conclusions

Future Work
YaCF: The accULL Compiler
Undergraduate Thesis Project

Juan José Fumero Alfonso
Universidad de La Laguna

22 June 2012

Outline
1 Introduction
2 YaCF
3 Experiments
4 Conclusions
5 Future Work

1 Introduction

Moore's Law
• The number of transistors that fit on a chip doubles approximately every 18 months.

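Stated as a formula, the transistor count grows exponentially. This is an illustrative restatement in which N_0 is the count at t = 0 and T is the 18-month doubling period given above. Over six years (72 months) that is four doublings, a 16-fold increase:

```latex
N(t) = N_0 \cdot 2^{t/T}, \qquad T \approx 18\ \text{months}
\qquad\Longrightarrow\qquad
\frac{N(72\ \text{months})}{N_0} = 2^{72/18} = 2^{4} = 16
```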
Parallel Architectures Today

Parallel Architectures
The solution:
• More processors.
• More cores per processor.

Parallel Architectures
Today's systems are hybrid and combine all of these options.

Parallel Architectures

OpenMP: Shared Memory Programming
• An API that supports shared-memory multiprocessing (SMP) programming.
• Multi-platform.
• A directive-based approach.
• A set of compiler directives, library routines and environment variables for parallel programming.

OpenMP example
    #include <omp.h>

    #define NUM_STEPS 100000000

    /* Approximates pi by integrating 4/(1+x^2) over [0, 1] */
    double compute_pi(int *nthreads)
    {
        double step = 1.0 / (double) NUM_STEPS;
        double x, sum = 0.0, pi = 0.0;
        int i;

        #pragma omp parallel
        {
            #pragma omp master
            {
                *nthreads = omp_get_num_threads();
            }
            #pragma omp for private(x) reduction(+:sum) schedule(runtime)
            for (i = 0; i < NUM_STEPS; ++i) {
                x = (i + 0.5) * step;
                sum = sum + 4.0 / (1.0 + x * x);
            }
            /* The implicit barrier after the loop guarantees that sum is complete */
            #pragma omp master
            {
                pi = step * sum;
            }
        }
        return pi;
    }

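The OpenMP example approximates π with the midpoint rule. Its initialisation is assumed here to be `step` = 1/NUM_STEPS. Each iteration then adds one rectangle of height 4/(1 + x_i²), and the final multiplication by `step` supplies the width h:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx \;\approx\; h \sum_{i=0}^{n-1} \frac{4}{1+x_i^2},
\qquad x_i = (i + 0.5)\,h, \qquad h = \frac{1}{n}
```

Here n corresponds to NUM_STEPS and h to `step`. The MPI example on the next slide computes the same sum, with `w` playing the role of h.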
MPI: Message Passing Interface
• A language-independent communication protocol used to program parallel applications.
• MPI's goals are high performance, scalability and portability.

MPI example
    #include <mpi.h>

    /* Assumes MPI_Init has been called and N (number of intervals) is defined */
    int MPI_NUMPROCESSORS, MPI_NAME, i;
    double w, local, pi_mpi = 0.0, gpi_mpi;
    MPI_Comm_size(MPI_COMM_WORLD, &MPI_NUMPROCESSORS);
    MPI_Comm_rank(MPI_COMM_WORLD, &MPI_NAME);
    w = 1.0 / N;
    /* Cyclic distribution: process r handles iterations r, r + P, r + 2P, ... */
    for (i = MPI_NAME; i < N; i += MPI_NUMPROCESSORS) {
        local = (i + 0.5) * w;
        pi_mpi = pi_mpi + 4.0 / (1.0 + local * local);
    }
    MPI_Allreduce(&pi_mpi, &gpi_mpi, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    gpi_mpi = gpi_mpi * w;   /* scale the global sum by w to obtain pi */

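The loop `for (i = MPI_NAME; i < N; i += MPI_NUMPROCESSORS)` deals the iterations out cyclically across the processes. The C sketch below runs without an MPI installation. It replays every rank one after another and adds the partial sums the way MPI_Allreduce with MPI_SUM would. The function names are hypothetical, and the code only simulates the communication; it does not use MPI:

```c
/* Partial sum that one MPI process would compute in the loop above:
   iterations rank, rank + nprocs, rank + 2*nprocs, ... (cyclic distribution). */
double partial_pi_sum(int rank, int nprocs, int n)
{
    double w = 1.0 / n;
    double sum = 0.0;
    for (int i = rank; i < n; i += nprocs) {
        double local = (i + 0.5) * w;
        sum += 4.0 / (1.0 + local * local);
    }
    return sum;
}

/* Sequential stand-in for MPI_Allreduce(..., MPI_SUM, ...): add every
   process's partial sum, then scale by the interval width w = 1/n. */
double simulated_allreduce_pi(int nprocs, int n)
{
    double total = 0.0;
    for (int rank = 0; rank < nprocs; ++rank)
        total += partial_pi_sum(rank, nprocs, n);
    return total / n;
}
```

With any number of simulated processes, the result matches the single-process value up to floating-point rounding. The MPI version relies on this property.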
High Performance Computing
• The most powerful computers available at any given time.
• Systems with a massive number of processors.
• Very high computational speed.
• They contain thousands of processors and cores.
• These systems are very expensive and consume huge amounts of energy.

TOP500: High Performance Computing
• The TOP500 project ranks and details the 500 most powerful known (non-distributed) computer systems in the world.
• The project publishes an updated list of supercomputers twice a year.

The Era of Accelerators

Languages for Heterogeneous Programming
CUDA
Developed by NVIDIA.
• Pros: good performance; easier to use than OpenCL.
• Con: only works with NVIDIA hardware.

Languages for Heterogeneous Programming
CUDA
    /* Computes a = b * c with column-major storage: a is n x m, b is n x p, c is p x m.
       Assumes 32 threads per block along x and n a multiple of 32. */
    __global__ void mmkernel(float *a, float *b, float *c, int n, int m, int p)
    {
        int i = blockIdx.x * 32 + threadIdx.x;   /* row of a */
        int j = blockIdx.y;                      /* column of a */
        float sum = 0.0f;
        for (int k = 0; k < p; ++k)
            sum += b[i + n * k] * c[k + p * j];
        a[i + n * j] = sum;
    }

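In mmkernel, each thread computes one element of a: the row comes from blockIdx.x and threadIdx.x, and the column from blockIdx.y. A sequential C loop nest computing the same column-major product is a useful correctness reference for the kernel. This is a sketch, and the name mm_reference is hypothetical:

```c
/* Sequential reference for mmkernel: a (n x m) = b (n x p) * c (p x m),
   all column-major, so element (row, col) of an n-row matrix is at row + n*col. */
void mm_reference(float *a, const float *b, const float *c, int n, int m, int p)
{
    for (int j = 0; j < m; ++j) {        /* column j: blockIdx.y in the kernel */
        for (int i = 0; i < n; ++i) {    /* row i: blockIdx.x*32 + threadIdx.x */
            float sum = 0.0f;
            for (int k = 0; k < p; ++k)
                sum += b[i + n * k] * c[k + p * j];
            a[i + n * j] = sum;
        }
    }
}
```

For example, with b = [[1, 2], [3, 4]] stored as {1, 3, 2, 4} and c = [[5, 6], [7, 8]] stored as {5, 7, 6, 8}, the result is stored as {19, 43, 22, 50}.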
Languages for Heterogeneous Programming
OpenCL
A framework developed by the Khronos Group.
• Pros: can be used with any device; it is an open standard.
• Cons: more complex than CUDA; still immature.

Languages for Heterogeneous Programming
OpenCL
    /* Despite its name, computes the matrix product a = b * c of N x N row-major matrices */
    __kernel void matvecmul(__global float *a,
                            const __global float *b,
                            const __global float *c,
                            const uint N)
    {
        float R;
        int k;
        int xid = get_global_id(0);   /* row of a */
        int yid = get_global_id(1);   /* column of a */
        if (xid < N) {
            if (yid < N) {
                R = 0.0f;             /* float literal: avoids requiring double support */
                for (k = 0; k < N; k++)
                    R += b[xid * N + k] * c[k * N + yid];
                a[xid * N + yid] = R;
            }
        }
    }

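OpenCL 1.x requires the global size to be a multiple of the work-group size whenever a work-group size is given. Host code therefore often rounds the global size up past N, which is why matvecmul checks xid < N and yid < N. The hypothetical C emulation below walks the padded 2-D index space, with nested loops standing in for get_global_id(0) and get_global_id(1). It counts how many work-items actually write to a:

```c
/* Round n up to the next multiple of the work-group size. */
static unsigned round_up(unsigned n, unsigned group)
{
    return ((n + group - 1) / group) * group;
}

/* Hypothetical host-side emulation of the matvecmul NDRange.
   Returns the number of work-items that pass the bounds checks. */
unsigned emulate_matmul_ndrange(float *a, const float *b, const float *c,
                                unsigned N, unsigned group)
{
    unsigned global = round_up(N, group);
    unsigned writes = 0;
    for (unsigned xid = 0; xid < global; ++xid) {        /* get_global_id(0) */
        for (unsigned yid = 0; yid < global; ++yid) {    /* get_global_id(1) */
            if (xid < N && yid < N) {                    /* padding items skip */
                float R = 0.0f;
                for (unsigned k = 0; k < N; k++)
                    R += b[xid * N + k] * c[k * N + yid];
                a[xid * N + yid] = R;
                ++writes;
            }
        }
    }
    return writes;
}
```

For N = 3 and a work-group size of 2, the padded range has 4 x 4 = 16 work-items, of which only 9 pass the guards.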
Languages for Heterogeneous Programming
Pros
1 The programmer can use all of the machine's devices.
2 The GPU and the CPU can work in parallel.

Languages for Heterogeneous Programming
Cons
1 The programmer needs to know low-level details of the architecture.
2 Source code needs to be rewritten:
    • One version for OpenMP/MPI.
    • A different version for the GPU.
3 Good performance requires a great effort in parameter tuning.
4 These languages (CUDA/OpenCL) are complex and new to non-experts.

GPGPU (General-Purpose GPU) Computing
Can we use GPUs for parallel computing? Is it efficient?

The NBody Problem
• An N-body simulation numerically approximates the evolution of a system of bodies.
• Each body continuously interacts with all the other bodies.
• Applications include fluid flow simulations.

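NBody description
Acceleration
$$\mathbf{a}_i = \frac{\mathbf{F}_i}{m_i}$$
$$\mathbf{a}_i \approx G \cdot \sum_{1 \le j \le N} \frac{m_j\,\mathbf{r}_{ij}}{\left(\|\mathbf{r}_{ij}\|^2 + \varepsilon^2\right)^{3/2}}$$
where r_ij is the vector from body i to body j, G is the gravitational constant and ε is a softening factor.
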
CUDA implementation
• The method is particle-particle (all pairs).
• It evaluates all pairwise interactions, so it is exact.
• Its computational complexity is O(n²).

CUDA implementation: blocks and grids

CUDA Kernel: Tile calculation
    /* Accumulates the interactions of body myPos with the blockDim.x bodies of the
       current tile, previously loaded into shared memory. SX(i) is assumed to be a
       macro indexing sharedPos; bodyBodyInteraction computes one pairwise term. */
    __device__ float3 gravitation(float4 myPos, float3 accel)
    {
        extern __shared__ float4 sharedPos[];
        unsigned long i = 0;

        for (unsigned int counter = 0; counter < blockDim.x; counter++) {
            accel = bodyBodyInteraction(accel, SX(i++), myPos);
        }
        return accel;
    }

CUDA Kernel: calculate forces
    __global__ void calculate_forces(float4 *globalX, float4 *globalA)
    {
        // A shared memory buffer to store the body positions.
        extern __shared__ float4 shPosition[];
        float4 myPosition;
        int i, tile;
        float3 acc = {0.0f, 0.0f, 0.0f};
        // Global thread ID (represents the unique body index in the simulation)
        int gtid = blockIdx.x * blockDim.x + threadIdx.x;
        // This is the position of the body we are computing the acceleration for.
        myPosition = globalX[gtid];
        // N (number of bodies) is assumed to be a constant and a multiple of blockDim.x
        for (i = 0, tile = 0; i < N; i += blockDim.x, tile++) {
            int idx = tile * blockDim.x + threadIdx.x;
            shPosition[threadIdx.x] = globalX[idx];
            __syncthreads();
            // tile_calculation accumulates the interactions with the bodies of this tile
            acc = tile_calculation(myPosition, acc);
            __syncthreads();
        }
        // Store the result in global memory for the integration step.
        float4 acc4 = {acc.x, acc.y, acc.z, 0.0f};
        globalA[gtid] = acc4;
    }

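The kernel processes the bodies in tiles of blockDim.x bodies. Each thread loads one position into shared memory, the block synchronises, and then every thread sweeps the whole tile. The sequential C sketch below mimics that schedule for a single body, with a small local array standing in for shPosition. TILE, interaction() and tiled_force() are hypothetical. The toy 1-D interaction replaces the real gravitational term only to keep the example short:

```c
#include <string.h>

#define TILE 4  /* stands in for blockDim.x (hypothetical value) */

/* Toy pairwise term (hypothetical): 1-D "interaction" of body i with body j. */
static float interaction(float xi, float xj) { return xj - xi; }

/* Sequential sketch of calculate_forces for body i. Like the kernel, it
   assumes n is a multiple of TILE. */
float tiled_force(const float *x, int n, int i)
{
    float sh_position[TILE];   /* plays the role of shared memory */
    float acc = 0.0f;
    for (int tile = 0; tile * TILE < n; ++tile) {
        /* cooperative load: on the GPU each thread copies one element */
        memcpy(sh_position, &x[tile * TILE], TILE * sizeof(float));
        /* tile calculation: sweep every body of the current tile */
        for (int counter = 0; counter < TILE; ++counter)
            acc += interaction(x[i], sh_position[counter]);
    }
    return acc;
}
```

Tiling only reorders the loads, so the result equals the untiled sum. On the GPU, the gain comes from reading each position from global memory once per block instead of once per thread.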
Results
• GPU: Tesla C1060 (compute capability 1.3).
• Sequential code: Intel Core i7 930.
• Code: NBody example from the CUDA SDK.
• CUDA Runtime / CUDA Driver: 4.0.
    • 400000 bodies.
    • 200 iterations.

Device          Cores   Memory   Performance (GFLOPS)
Tesla C1060     240     4 GB     933 (single), 78 (double)
Intel Core i7   4       4 GB     44.8 (11.2 per core)

Results
• Sequential code: 147202512.40 ms ≈ 40.89 hours (about 41 hours).
• Parallel CUDA code: 1392029.6 ms ≈ 23.2 minutes.
• The speedup is about 105.7×.

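The speedup and the time conversions follow directly from the two measured times:

```latex
S = \frac{T_{\text{seq}}}{T_{\text{CUDA}}}
  = \frac{147\,202\,512.40\ \text{ms}}{1\,392\,029.6\ \text{ms}} \approx 105.7,
\qquad
T_{\text{seq}} = \frac{147\,202.5\ \text{s}}{3600} \approx 40.89\ \text{h},
\qquad
T_{\text{CUDA}} = \frac{1\,392.0\ \text{s}}{60} \approx 23.2\ \text{min}
```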
At the Present Time
• Some applications can be accelerated with GPUs.
• Users need to learn new programming languages and tools.
• The CUDA programming model and the GPU architecture have to be understood.
• Non-expert users have to write programs for a new programming model.

GPGPU Languages
OpenACC: introduced in November 2011 at Supercomputing 2011 (SC'11).
A directive-based language.
• Intended to become a standard.
• Supported by Cray, NVIDIA, PGI and CAPS.
• A single source code for all versions.
• Platform independent.
• Easier for beginners.

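The single-source idea can be illustrated with a minimal OpenACC sketch (a hypothetical saxpy example, not taken from the thesis). With an OpenACC compiler, the kernels region is offloaded to the accelerator; the data clauses copy x to the device and copy y in and back out. A plain C compiler ignores the directives, possibly with an unknown-pragma warning, and runs the same loop sequentially:

```c
/* y = alpha * x + y. The directives are hints: the loop is correct C
   whether or not an OpenACC compiler honours them. */
void saxpy(int n, float alpha, const float *restrict x, float *restrict y)
{
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; ++i)
            y[i] = alpha * x[i] + y[i];
    }
}
```

The `[start:length]` array-section syntax in the data clauses tells the runtime how many elements to transfer.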
GPGPU Languages
OpenACC
A directive-based language.

A New Dimension for HPC

accULL: our OpenACC Implementation
accULL = compiler + runtime library.
accULL = YaCF + Frangollo.

Initial Objectives of this Project
• To integrate C99 support into the YaCF project.
• To implement a new class hierarchy for new YaCF front ends.
• To implement an OpenACC front end.
• To complete the OpenMP grammar with the OpenMP 3.0 directives.
• To test the new C99 interface.

Source-to-source Compilers
• ROSE Compiler Framework.
• Cetus Compiler.
• Mercurium.

2 YaCF

accULL: our OpenACC Implementation

YaCF: Yet Another Compiler Framework

YaCF
• A source-to-source compiler that translates C code with OpenMP, llc and OpenACC annotations into code with calls to the Frangollo runtime.
• Integrates code analysis tools.
• Written entirely in Python.
• Based on well-known object-oriented design patterns.
• Based on the pycparser Python module.
• Implementing a code transformation only requires writing a few lines of code.

YaCF: Architecture

YaCF: Preprocessor

YaCF: Architecture

YaCF: Statistics
• 20683 lines of Python code.
• 2158 functions and methods.
• My contribution amounts to about 25% of the YaCF project.

3 Experiments

Experiments
• ScaLAPACK benchmark: testing the C99 support.
• Block matrix multiplication in accULL.
• Three different problems from the Rodinia benchmark suite:
    • HotSpot.
    • SRAD.
    • Needleman–Wunsch.

ScaLAPACK
• ScaLAPACK (Scalable LAPACK) is a library that includes a subset of LAPACK routines redesigned for distributed-memory MIMD parallel computers.
• ScaLAPACK is designed for heterogeneous computing.
• It is portable to any computer that supports MPI.
• Its scalability depends on the PBLAS operations.

ScaLAPACK: results in YaCF
Directory          Total C files   Success   Failures
PBLAS/SRC                    123       123          0
REDIST/SRC                    21        21          0
PBLAS/SRC/PTOOLS             102       101          1
PBLAS/TESTING                  2         1          1
PBLAS/TIMING                   2         1          1
REDIST/TESTING                10         0         10
SRC                            9         9          0
TOOLS                          2         2          0
Total                        271       258         13

95% of the ScaLAPACK C files (258 of 271) are parsed correctly by YaCF.

Platforms
• Garoe: a desktop computer with an Intel Core i7 930 processor (2.80 GHz), 1 MB of L2 cache in total and 8 MB of L3 cache shared by the four cores. The system has 4 GB of RAM and an attached Tesla C2050 with 4 GB of memory.

Platforms
• Drago: a second cluster node. It is a shared-memory system with four Intel Xeon E7 processors of 10 cores each. In this case, the accelerator platform is the Intel OpenCL SDK 1.5, which runs on the CPU.

MxM in accULL
• MxM is a basic kernel frequently used to showcase the peak performance of GPU computing.
• We compare the performance of the accULL implementation with that of:
    • OpenMP.
    • CUDA.
    • OpenCL.

                                                     MxM in accULL
                  MxM OpenACC code

                    #pragma acc kernels name("mxm") copy(a[L*N]) copyin(b[L*M], c[M*N])
                    {
                    /* a is L x N, b is L x M and c is M x N, all stored row-major */
                    #pragma acc loop private(i, j) collapse(2)
                    for (i = 0; i < L; i++)
                        for (j = 0; j < N; j++)
                            a[i * N + j] = 0.0;
                    /* Iterate over blocks */
                    for (ii = 0; ii < L; ii += tile_size)
                      for (jj = 0; jj < N; jj += tile_size)
                        for (kk = 0; kk < M; kk += tile_size) {
                          /* Iterate inside a block; min() is assumed to be defined elsewhere */
                          #pragma acc loop collapse(2) private(i, j, k)
                          for (j = jj; j < min(N, jj + tile_size); j++)
                            for (i = ii; i < min(L, ii + tile_size); i++)
                              for (k = kk; k < min(M, kk + tile_size); k++)
                                a[i * N + j] += b[i * M + k] * c[k * N + j];
                        }
                    }
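The row-major indexing of the blocked loop nest can be checked on the host without an accelerator. The sketch below is a minimal sequential version, assuming row-major storage and a hypothetical MIN macro in place of the min() used in the OpenACC code; the function names mxm_naive and mxm_tiled are illustrative and are not part of accULL. It computes the same product with a plain triple loop and with the tiled loop nest so the two results can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper standing in for the min() used on the slide. */
#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Reference product a = b * c with row-major storage:
   a is L x N, b is L x M, c is M x N. */
void mxm_naive(size_t L, size_t M, size_t N,
               double *a, const double *b, const double *c)
{
    for (size_t i = 0; i < L; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < M; k++)
                sum += b[i * M + k] * c[k * N + j];
            a[i * N + j] = sum;
        }
}

/* Same product computed block by block, mirroring the loop nest of the
   OpenACC version but running sequentially on the host. */
void mxm_tiled(size_t L, size_t M, size_t N, size_t tile,
               double *a, const double *b, const double *c)
{
    assert(tile > 0);
    for (size_t i = 0; i < L; i++)
        for (size_t j = 0; j < N; j++)
            a[i * N + j] = 0.0;
    for (size_t ii = 0; ii < L; ii += tile)
        for (size_t jj = 0; jj < N; jj += tile)
            for (size_t kk = 0; kk < M; kk += tile)
                for (size_t j = jj; j < MIN(N, jj + tile); j++)
                    for (size_t i = ii; i < MIN(L, ii + tile); i++)
                        for (size_t k = kk; k < MIN(M, kk + tile); k++)
                            a[i * N + j] += b[i * M + k] * c[k * N + j];
}
```

Both functions accumulate the k terms of each element in the same order, so for small integer-valued inputs the tiled result should match the reference exactly, which makes a direct equality check reasonable.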




                                                                                                                                66 / 85
                  MxM in accULL (Garoe)




                                          67 / 85
                  MxM in accULL (Drago)




                                          68 / 85
                  SRAD: an Image Filtering Code




                                                  69 / 85
                                                     SRAD (Garoe)
                  Frangollo's CUDA backend performs better than the native CUDA
                  version.

                                                                        70 / 85
                  SRAD (Drago)




                                 71 / 85
                  NW: Needleman-Wunsch, a Sequence Alignment Code




                                             72 / 85
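Needleman-Wunsch fills a score matrix in which every interior cell depends on its north-west, north and west neighbours. The sketch below is a minimal sequential version of that recurrence, assuming a linear gap penalty and fixed match/mismatch scores; the scoring actually used by the benchmark is not given on these slides, and the function name nw_score is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Global alignment score of s and t (hypothetical helper name).
   F is an (ls + 1) x (lt + 1) matrix stored row-major.
   Returns INT_MIN if the matrix cannot be allocated. */
int nw_score(const char *s, const char *t, int match, int mismatch, int gap)
{
    assert(s != NULL && t != NULL);
    size_t ls = strlen(s), lt = strlen(t), cols = lt + 1;
    int *F = malloc((ls + 1) * cols * sizeof *F);
    if (F == NULL)
        return INT_MIN;

    /* First column and first row: a prefix aligned against gaps only. */
    for (size_t i = 0; i <= ls; i++) F[i * cols] = (int)i * gap;
    for (size_t j = 0; j <= lt; j++) F[j] = (int)j * gap;

    /* Each cell depends on its north-west, north and west neighbours. */
    for (size_t i = 1; i <= ls; i++)
        for (size_t j = 1; j <= lt; j++) {
            int diag = F[(i - 1) * cols + (j - 1)] +
                       (s[i - 1] == t[j - 1] ? match : mismatch);
            int up   = F[(i - 1) * cols + j] + gap;
            int left = F[i * cols + (j - 1)] + gap;
            F[i * cols + j] = max3(diag, up, left);
        }

    int score = F[ls * cols + lt];
    free(F);
    return score;
}
```

With match = 1, mismatch = -1 and gap = -1, aligning GATTACA with GCATGCG gives a score of 0. Cells on the same anti-diagonal do not depend on each other, so parallel versions of this algorithm typically sweep the matrix one anti-diagonal at a time.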
                                                          NW (Garoe)
                  Poor results, although still better than OpenMP on 4 cores.

                                                                       73 / 85
                  NW (Drago)




                               74 / 85
                  HotSpot: a Thermal Simulation Tool for Estimating Processor
                  Temperature




                                                   75 / 85
                                                 HotSpot (Garoe)
                  Performance is as good as that of the native versions.

                                                                  76 / 85
                  HotSpot (Drago)




                                    77 / 85
                                             Conclusions: Compiler Technologies
                  • Compiler technologies tend to rely on source-to-source
                    compilers to generate, transform, and optimize source code.
                  • Parallelizing source code is easier when it is done through
                    AST transformations.
                  • AST transformations make it easy for programmers to generate
                    code for different target platforms.
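A toy illustration of the transform-and-unparse idea, with the parser left out: a hypothetical ForLoop node is annotated by a transformation pass and then printed back as C source. The node layout, the function names annotate_loop and emit_loop, and the crude legality check are assumptions made for this sketch; they do not reflect YaCF's internal representation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node for a canonical counted loop:
   for (var = lo; var < hi; var++) body */
typedef struct {
    const char *var;
    const char *lo;
    const char *hi;
    const char *body;   /* loop body kept as an opaque statement string */
    const char *pragma; /* directive attached by a transformation, or NULL */
} ForLoop;

/* Transformation pass: attach a directive to the loop node.
   A real compiler would run dependence analysis first; this toy only
   rejects bodies containing "break", which a parallel loop cannot have.
   Returns 1 if the loop was annotated, 0 otherwise. */
static int annotate_loop(ForLoop *loop, const char *directive)
{
    assert(loop != NULL && directive != NULL);
    if (strstr(loop->body, "break") != NULL)
        return 0;
    loop->pragma = directive;
    return 1;
}

/* Unparser: regenerate C source text from the (possibly transformed) node.
   Returns what snprintf returns: the length of the full text. */
static int emit_loop(const ForLoop *loop, char *out, size_t size)
{
    return snprintf(out, size, "%s%sfor (%s = %s; %s < %s; %s++)\n  %s\n",
                    loop->pragma ? loop->pragma : "",
                    loop->pragma ? "\n" : "",
                    loop->var, loop->lo, loop->var, loop->hi, loop->var,
                    loop->body);
}
```

Because the transformation works on the tree rather than on raw text, retargeting only changes the directive string (or the unparser), while the loop structure itself is reused.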




                                                                                     79 / 85
                           Conclusions: Programming Model
                  • Directive-based programming models allow non-expert
                    programmers to abstract away architectural details and write
                    programs more easily.
                  • The OpenACC standard is a starting point for programming
                    heterogeneous systems.
                  • Future versions of the OpenMP standard will include support for
                    accelerators.
                  • The results obtained with accULL, our early OpenACC
                    implementation, are promising.
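As an illustration of the directive-based style, the loop below is ordinary C annotated with an OpenACC combined construct. This is a minimal sketch, not taken from the thesis: saxpy is a hypothetical example, and its data clauses use the subarray notation x[0:n] from the OpenACC specification, which may differ from the notation accULL accepted at the time (the MxM slide writes copy(a[L*N])).

```c
#include <assert.h>
#include <stddef.h>

/* y = a * x + y. With an OpenACC compiler the loop is offloaded; the data
   clauses copy x to the device and copy y to the device and back.
   Without OpenACC support the pragma is ignored and the loop runs
   sequentially on the host, producing the same result. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The programmer states what to parallelize and which data to move; mapping iterations to gangs, workers, and vector lanes is left to the compiler.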




                                                                                    80 / 85
                                                     References I
                  Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande.
                  accULL: An OpenACC implementation with CUDA and OpenCL
                  support.
                  International European Conference on Parallel and Distributed
                  Computing 2012.
                  Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande.
                  Directive-based Programming for GPUs: A Comparative Study.
                  The 14th IEEE International Conference on High Performance
                  Computing and Communications.
                  Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande.
                  accULL: an user-directed Approach to Heterogeneous
                  Programming.
                  The 10th IEEE International Symposium on Parallel and
                  Distributed Processing with Applications.


                                                                               81 / 85
                                                        Future Work
                  • Add support for MPI combined with CUDA and OpenCL.
                  • Perform new experiments with OpenACC.
                  • Compare our accULL approach with PGI-OpenACC and
                    CAPS-HMPP.
                  • Add support for vectorization.
                  • Explore FPGAs in combination with CUDA and OpenCL.
                  • Integrate the LLVM compiler framework into the front end.




                                                                            83 / 85
                  Thank you for your attention

                    Juan José Fumero Alfonso
                       jfumeroa@ull.edu.es




                                                 84 / 85

How Tech Giants Cut Corners to Harvest Data for A.I.How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.
 

YaCF: The accULL Compiler

  • 1. YaCF: The accULL Compiler. Undergraduate Thesis Project. Juan José Fumero Alfonso (Juan J. Fumero), Universidad de La Laguna, 22 June 2012.
  • 2. Outline: 1. Introduction; 2. YaCF; 3. Experiments; 4. Conclusions; 5. Future Work.
  • 4. Moore's Law: the number of transistors on a chip doubles roughly every two years (often quoted as every 18 months).
  • 5. Parallel Architectures Today
  • 6. Parallel Architectures. The solution:
    • More processors.
    • More cores per processor.
  • 7. Parallel Architectures: current systems are hybrid and combine all of these options.
  • 8. Parallel Architectures
  • 9. OpenMP: Shared-Memory Programming
    • An API that supports symmetric multiprocessing (SMP) programming.
    • Multi-platform.
    • A directive-based approach: a set of compiler directives, library routines and environment variables for parallel programming.
    OpenMP example:
```c
#pragma omp parallel
{
    #pragma omp master
    {
        nthreads = omp_get_num_threads();
    }
    #pragma omp for private(x) reduction(+:sum) schedule(runtime)
    for (i = 0; i < NUM_STEPS; ++i) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }
    #pragma omp master
    {
        pi = step * sum;
    }
}
```
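A self-contained variant of the π computation above is sketched below; it is not the thesis code. Assumptions: the function name `compute_pi_openmp` is hypothetical, and a combined `parallel for` with `schedule(static)` replaces the slide's `master` blocks and `schedule(runtime)`. Because it uses only directives (no `omp.h` calls), it also compiles and runs serially when OpenMP is disabled.

```c
#include <assert.h>

/* Approximates pi as the integral of 4/(1+x^2) over [0,1] using the
 * midpoint rule with num_steps intervals. Compile with -fopenmp to run
 * the loop in parallel; without it the pragma is ignored. */
double compute_pi_openmp(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    /* Each thread accumulates a private copy of sum; reduction(+:sum)
     * adds the copies together when the loop ends. x is declared inside
     * the loop body, so it is private to each iteration. */
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < num_steps; ++i) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    /* The reduced sum is available here, after the parallel region,
     * so no master block is needed for the final scaling. */
    return step * sum;
}
```

Merging `parallel` and `for` into one directive keeps the parallel region as small as possible; the trade-off is that `nthreads` is no longer recorded, since that needed `omp_get_num_threads()`.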
  • 10. MPI: Message Passing Interface
    • A language-independent communications protocol used to program parallel applications.
    • MPI's goals are high performance, scalability and portability.
    MPI example:
```c
MPI_Comm_size(MPI_COMM_WORLD, &MPI_NUMPROCESSORS);
MPI_Comm_rank(MPI_COMM_WORLD, &MPI_NAME);
w = 1.0 / N;
for (i = MPI_NAME; i < N; i += MPI_NUMPROCESSORS) {
    local = (i + 0.5) * w;
    pi_mpi = pi_mpi + 4.0 / (1.0 + local * local);
}
MPI_Allreduce(&pi_mpi, &gpi_mpi, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
/* Every rank now holds the global sum; the pi estimate is w * gpi_mpi. */
```
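MPI programs need an MPI launcher, so the sketch below only emulates the slide's decomposition in plain C to show what each process computes. Each simulated rank r visits indices r, r + P, r + 2P and so on, and the final loop plays the role of `MPI_Allreduce` with `MPI_SUM`. The functions `rank_partial_sum` and `emulate_mpi_pi` are hypothetical helpers, not part of MPI or of the thesis code.

```c
#include <assert.h>

/* Partial sum computed by one rank: cyclic (round-robin) distribution
 * of the n midpoint-rule terms, as in the slide's loop. */
static double rank_partial_sum(long n, int rank, int nprocs)
{
    double w = 1.0 / (double)n;
    double local_sum = 0.0;
    for (long i = rank; i < n; i += nprocs) {
        double x = (i + 0.5) * w;
        local_sum += 4.0 / (1.0 + x * x);
    }
    return local_sum;
}

/* Sequential stand-in for the whole MPI program: add up every rank's
 * partial result (what MPI_Allreduce with MPI_SUM gives each rank)
 * and scale by the step width to obtain the pi estimate. */
double emulate_mpi_pi(long n, int nprocs)
{
    assert(n > 0 && nprocs > 0);
    double global_sum = 0.0;
    for (int rank = 0; rank < nprocs; ++rank)
        global_sum += rank_partial_sum(n, rank, nprocs);
    return global_sum / (double)n;   /* equivalent to w * global_sum, w = 1/n */
}
```

For instance, with N = 10 and P = 4, rank 1 handles indices 1, 5 and 9. `emulate_mpi_pi(n, 1)` and `emulate_mpi_pi(n, 4)` differ only by floating-point rounding, because the cyclic distribution changes the summation order but not the set of terms.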
  • 11. High Performance Computing
    • The most powerful computers available at the moment.
    • Systems with a massive number of processors.
    • Very high computational speed.
    • They contain thousands of processors and cores.
    • These systems are very expensive and consume a huge amount of energy.
  • 12. TOP500: High Performance Computing
    • The TOP500 project ranks and details the 500 most powerful known (non-distributed) computer systems in the world.
    • The project publishes an updated list of supercomputers twice a year.
  • 13. The Accelerator Era
  • 14. Languages for Heterogeneous Programming: CUDA
    • Developed by NVIDIA.
    • Pros: high performance; easier to use than OpenCL.
    • Con: only works on NVIDIA hardware.
  • 15. Languages for Heterogeneous Programming: CUDA example
```cuda
__global__ void mmkernel(float *a, float *b, float *c, int n,
                         int m, int p)
{
    int i = blockIdx.x * 32 + threadIdx.x;  // assumes 32 threads per block in x
    int j = blockIdx.y;
    float sum = 0.0f;
    for (int k = 0; k < p; ++k) sum += b[i + n * k] * c[k + p * j];
    a[i + n * j] = sum;
}
```
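To make the kernel's index mapping explicit, the plain-C sketch below runs the kernel body once per (block, thread) pair, as a sequential emulator might. It assumes a launch configuration with `gridDim = (n / 32, m)` and `blockDim.x = 32`, which is what the hard-coded `blockIdx.x * 32` implies; the helper names are hypothetical. Since the kernel has no bounds check, `n` must be a multiple of 32.

```c
#include <assert.h>

#define BLOCK_X 32  /* the kernel hard-codes 32 threads per block in x */

/* Body of mmkernel for one (block, thread) pair, written as plain C.
 * Column-major layout: a is n x m, b is n x p, c is p x m. */
static void mmkernel_thread(float *a, const float *b, const float *c,
                            int n, int p, int block_x, int block_y,
                            int thread_x)
{
    int i = block_x * BLOCK_X + thread_x;   /* row of a    */
    int j = block_y;                        /* column of a */
    float sum = 0.0f;
    for (int k = 0; k < p; ++k)
        sum += b[i + n * k] * c[k + p * j];
    a[i + n * j] = sum;
}

/* Sequential stand-in for a launch with grid (n/32, m) and 32 threads
 * per block. The kernel has no bounds check, so n % 32 must be 0. */
void emulate_mm_launch(float *a, const float *b, const float *c,
                       int n, int m, int p)
{
    assert(n % BLOCK_X == 0);
    for (int by = 0; by < m; ++by)                  /* gridDim.y  = m      */
        for (int bx = 0; bx < n / BLOCK_X; ++bx)    /* gridDim.x  = n / 32 */
            for (int tx = 0; tx < BLOCK_X; ++tx)    /* blockDim.x = 32     */
                mmkernel_thread(a, b, c, n, p, bx, by, tx);
}
```

Each emulated thread writes exactly one element `a[i + n * j]`, so the grid holds one thread per output element; on a real GPU those threads run concurrently instead of in this loop order.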
  • 16. Languages for Heterogeneous Programming: OpenCL
    • A framework standardized by the Khronos Group.
    • Pros: can be used with any device that has an OpenCL implementation; it is a standard.
    • Cons: more complex than CUDA; immature.
  • 17. Languages for Heterogeneous Programming: OpenCL example
```c
// Despite its name, this kernel computes the N x N matrix product a = b * c.
__kernel void matvecmul(__global float *a,
                        const __global float *b, const __global float *c,
                        const uint N) {
    float R;
    int k;
    int xid = get_global_id(0);
    int yid = get_global_id(1);
    if (xid < N) {
        if (yid < N) {
            R = 0.0f;
            for (k = 0; k < N; k++)
                R += b[xid * N + k] * c[k * N + yid];
            a[xid * N + yid] = R;
        }
    }
}
```
  • 18. Languages for Heterogeneous Programming: Pros
    1. The programmer can use all of the machine's devices.
    2. The GPU and the CPU can work in parallel.
  • 20. Languages for Heterogeneous Programming: Cons
    1. The programmer needs to know low-level details of the architecture.
    2. Source code needs to be rewritten:
       • one version for OpenMP/MPI;
       • a different version for the GPU.
    3. Good performance requires a great deal of effort in parameter tuning.
    4. These languages (CUDA/OpenCL) are complex and new to non-experts.
  • 21. GPGPU (General-Purpose GPU) Computing: Can we use GPUs for parallel computing? Is it efficient?
  • 22. The N-Body Problem
    • The simulation numerically approximates the evolution of a system of bodies.
    • Each body continuously interacts with all the other bodies.
    • Example application: fluid flow simulations.
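  • 23. N-Body Description: Acceleration
$$
\mathbf{a}_i = \frac{\mathbf{F}_i}{m_i} \approx G \cdot \sum_{1 \le j \le N} \frac{m_j \, \mathbf{r}_{ij}}{\left( \lVert \mathbf{r}_{ij} \rVert^2 + \varepsilon^2 \right)^{3/2}}
$$
    where $\mathbf{r}_{ij}$ is the vector from body $i$ to body $j$, $G$ is the gravitational constant and $\varepsilon$ is a softening factor.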
  • 24. CUDA Implementation
    • The method is particle-particle (all pairs).
    • Its computational complexity is O(n²).
    • It evaluates all pairwise interactions, so it is exact.
  • 25. CUDA Implementation: Blocks and Grids
  • 26. CUDA Kernel: Tile Calculation
```cuda
__device__ float3 gravitation(float4 myPos, float3 accel) {
    extern __shared__ float4 sharedPos[];
    unsigned long i = 0;

    for (unsigned int counter = 0; counter < blockDim.x; counter++)
    {
        accel = bodyBodyInteraction(accel, SX(i++), myPos);
    }
    return accel;
}
```
  • 27. CUDA Kernel: Calculate Forces
```cuda
__global__ void calculate_forces(float4 *globalX, float4 *globalA)
{
    // A shared memory buffer to store the body positions.
    extern __shared__ float4 shPosition[];
    int i, tile;
    float3 acc = {0.0f, 0.0f, 0.0f};
    // Global thread ID (represents the unique body index in the simulation)
    int gtid = blockIdx.x * blockDim.x + threadIdx.x;
    // This is the position of the body we are computing the acceleration for.
    float4 myPosition = globalX[gtid];
    for (i = 0, tile = 0; i < N; i += blockDim.x, tile++)
    {
        int idx = tile * blockDim.x + threadIdx.x;
        shPosition[threadIdx.x] = globalX[idx];
        __syncthreads();   // wait until the whole tile is in shared memory
        acc = tile_calculation(myPosition, acc);
        __syncthreads();   // wait before the tile is overwritten
    }
    // Save the result in global memory for the integration step.
    float4 acc4 = {acc.x, acc.y, acc.z, 0.0f};
    globalA[gtid] = acc4;
}
```
  • 28. Results: Experimental Setup
    • GPU: Tesla C1060 (compute capability 1.3).
    • Sequential code: Intel Core i7 930.
    • Code: NBody from the CUDA SDK.
    • CUDA Runtime/CUDA Driver: 4.0.
    • 400000 bodies.
    • 200 interactions.

| Device | Cores | Memory | Peak performance (GFLOPS) |
|---|---|---|---|
| Tesla C1060 | 240 | 4 GB | 933 (single), 78 (double) |
| Intel Core i7 | 4 | 4 GB | 44.8 (11.2 per core) |
  • 29. Results
    • Sequential code: 147202512.40 ms ≈ 40.89 hours (about 41 hours).
    • Parallel CUDA code: 1392029.6 ms ≈ 23.2 minutes.
    • The speedup is 105.7 (≈105×).
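The reported figures can be checked directly from the two measured times:

$$
\begin{aligned}
S &= \frac{T_{\mathrm{seq}}}{T_{\mathrm{CUDA}}} = \frac{147\,202\,512.40\ \mathrm{ms}}{1\,392\,029.6\ \mathrm{ms}} \approx 105.7,\\
T_{\mathrm{seq}} &= \frac{147\,202\,512.40\ \mathrm{ms}}{3.6 \times 10^{6}\ \mathrm{ms/h}} \approx 40.89\ \mathrm{h},\\
T_{\mathrm{CUDA}} &= \frac{1\,392\,029.6\ \mathrm{ms}}{6 \times 10^{4}\ \mathrm{ms/min}} \approx 23.2\ \mathrm{min}.
\end{aligned}
$$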
  • 30. At the Present Time
    • Some applications are accelerated with GPUs.
    • Users need to learn new programming languages and tools.
    • The CUDA model and its architecture have to be understood.
    • Non-expert users have to write programs for a new model.
  • 31. GPGPU Languages: OpenACC, introduced in November 2011 at Supercomputing 2011 (SC11).
    • A directive-based language.
    • Intended to become a standard.
    • Supported by Cray, NVIDIA, PGI and CAPS.
    • A single source code for all versions.
    • Platform independent.
    • Easier for beginners.
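To illustrate the "single source code for all versions" idea, here is a minimal OpenACC sketch (SAXPY, not taken from the thesis); the function name is illustrative. The data clauses use the standard OpenACC `[start:length]` subarray notation. An OpenACC compiler (for example GCC with `-fopenacc`, or PGI with `-acc`) offloads the loop, while a compiler without OpenACC support ignores the pragma and runs the same loop on the CPU.

```c
#include <assert.h>

/* y = alpha * x + y.
 * copyin(x[0:n]): x is only read on the device, so it is copied in.
 * copy(y[0:n]):   y is read and written, so it is copied in and back out. */
void saxpy(int n, float alpha, const float *restrict x, float *restrict y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

After `saxpy(5, 2.0f, x, y)` with x = {1, 2, 3, 4, 5} and every y[i] = 10, y becomes {12, 14, 16, 18, 20}. The programmer only states what runs in parallel and which data moves; the compiler generates the device code.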
  • 32. GPGPU Languages: OpenACC, a directive-based language.
  • 33. A New Dimension for HPC
  • 35. accULL: Our OpenACC Implementation
    • accULL = compiler + runtime library.
    • accULL = YaCF + Frangollo.
  • 36. Initial Objectives of This Project
    • To integrate C99 into the YaCF project.
    • To implement a new class hierarchy for new YaCF front ends.
    • To implement an OpenACC front end.
    • To complete the OpenMP grammar with the OpenMP 3.0 directives.
    • To test the new C99 interface.
  • 37. Source-to-Source Compilers
    • ROSE compiler framework.
    • Cetus compiler.
    • Mercurium.
  • 38. Outline: 1. Introduction; 2. YaCF; 3. Experiments; 4. Conclusions; 5. Future Work.
  • 39. accULL: Our OpenACC Implementation
  • 43. YaCF: Yet Another Compiler Framework
  • 44. YaCF
    • A source-to-source compiler that translates C code with OpenMP, llc and OpenACC annotations into code with Frangollo calls.
    • Integrates code analysis tools.
    • Written entirely in Python.
    • Based on widely known object-oriented software patterns.
    • Based on the pycparser Python module.
    • Implementing a code transformation is only a matter of writing a few lines of code.
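YaCF itself is written in Python on top of pycparser. The toy C sketch below is not YaCF code; it only illustrates the source-to-source pattern the slide describes: a pass walks the AST and rewrites nodes, and a generator turns the AST back into source text. The node layout and the constant-folding pass are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A toy expression AST: integer literals, identifiers, + and *. */
typedef enum { NODE_NUM, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;               /* NODE_NUM */
    const char *name;        /* NODE_VAR */
    struct Node *lhs, *rhs;  /* NODE_ADD, NODE_MUL */
} Node;

/* Transformation pass: rewrites constant sub-trees in place,
 * e.g. (2 * 3) becomes the literal 6. */
static void fold_constants(Node *n)
{
    if (n->kind != NODE_ADD && n->kind != NODE_MUL)
        return;
    fold_constants(n->lhs);
    fold_constants(n->rhs);
    if (n->lhs->kind == NODE_NUM && n->rhs->kind == NODE_NUM) {
        n->value = (n->kind == NODE_ADD) ? n->lhs->value + n->rhs->value
                                         : n->lhs->value * n->rhs->value;
        n->kind = NODE_NUM;
        n->lhs = n->rhs = NULL;
    }
}

/* Code generator: turns the AST back into C source text. */
static void unparse(const Node *n, char *out, size_t size)
{
    char lhs[128], rhs[128];
    assert(n != NULL);
    switch (n->kind) {
    case NODE_NUM:
        snprintf(out, size, "%d", n->value);
        break;
    case NODE_VAR:
        snprintf(out, size, "%s", n->name);
        break;
    default:
        unparse(n->lhs, lhs, sizeof lhs);
        unparse(n->rhs, rhs, sizeof rhs);
        snprintf(out, size, "(%s %c %s)", lhs,
                 n->kind == NODE_ADD ? '+' : '*', rhs);
        break;
    }
}
```

Building the tree for `x + 2 * 3`, `unparse` produces `(x + (2 * 3))` before the pass and `(x + 6)` after `fold_constants`; a real front end would obtain the tree from a parser instead of building it by hand.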
  • 45. YaCF: Architecture
  • 53. YaCF: Preprocessor
  • 57. YaCF: Architecture
  • 59. YaCF: Statistics
    • 20683 lines of Python code.
    • 2158 functions and methods.
    • My contribution amounts to about 25 % of the YaCF project.
  • 60. Outline: 1. Introduction; 2. YaCF; 3. Experiments; 4. Conclusions; 5. Future Work.
  • 61. Experiments
    • ScaLAPACK benchmark: testing C99 support.
    • Block matrix multiplication in accULL.
    • Three different problems from the Rodinia benchmark suite:
       • HotSpot.
       • SRAD.
       • Needleman–Wunsch.
  • 62. ScaLAPACK
    • ScaLAPACK (Scalable LAPACK) is a library that includes a subset of LAPACK routines redesigned for distributed-memory MIMD parallel computers.
    • ScaLAPACK is designed for heterogeneous computing.
    • It is portable to any computer that supports MPI.
    • Its scalability depends on the PBLAS operations.
  • 64. ScaLAPACK: Results in YaCF

| Directory | Total C files | Success | Failures |
|---|---|---|---|
| PBLAS/SRC | 123 | 123 | 0 |
| REDIST/SRC | 21 | 21 | 0 |
| PBLAS/SRC/PTOOLS | 102 | 101 | 1 |
| PBLAS/TESTING | 2 | 1 | 1 |
| PBLAS/TIMING | 2 | 1 | 1 |
| REDIST/TESTING | 10 | 0 | 10 |
| SRC | 9 | 9 | 0 |
| TOOLS | 2 | 2 | 0 |
| Total | 271 | 258 | 13 |

    95 % of the ScaLAPACK C files are parsed correctly by YaCF.
  • 65. Platforms
    • Garoe: a desktop computer with an Intel Core i7 930 processor (2.80 GHz), 1 MB of L2 cache and 8 MB of L3 cache shared by the four cores. The system has 4 GB of RAM and a Tesla C2050 with 4 GB of memory attached.
  • 66. Platforms
    • Drago: a second cluster node. It is a shared-memory system with 4 Intel Xeon E7 processors, each with 10 cores. In this case, the accelerator platform is the Intel OpenCL SDK 1.5, which runs on the CPU.
  • 67. MxM in accULL
    • MxM is a basic kernel frequently used to showcase the peak performance of GPU computing.
    • We compare the performance of the accULL implementation with that of:
       • OpenMP;
       • CUDA;
       • OpenCL.
  • 68. MxM in accULL: OpenACC code
```c
/* Row-major storage: a is L x N, b is L x M, c is M x N. */
#pragma acc kernels name("mxm") copy(a[L*N]) copyin(b[L*M], c[M*N])
{
    #pragma acc loop private(i, j) collapse(2)
    for (i = 0; i < L; i++)
        for (j = 0; j < N; j++)
            a[i * N + j] = 0.0;
    /* Iterate over blocks */
    for (ii = 0; ii < L; ii += tile_size)
        for (jj = 0; jj < N; jj += tile_size)
            for (kk = 0; kk < M; kk += tile_size) {
                /* Iterate inside a block */
                #pragma acc loop collapse(2) private(i, j, k)
                for (j = jj; j < min(N, jj + tile_size); j++)
                    for (i = ii; i < min(L, ii + tile_size); i++)
                        for (k = kk; k < min(M, kk + tile_size); k++)
                            a[i * N + j] += (b[i * M + k] * c[k * N + j]);
            }
}
```
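To check the blocking logic independently of any accelerator, here is a plain-C host version of the same loop nest with the pragmas omitted. It is a sketch under stated assumptions: row-major storage with `a` as L × N, `b` as L × M and `c` as M × N (the shapes implied by the copy clauses), and a `MIN` macro standing in for the slide's `min`; the function name `mxm_blocked` is hypothetical.

```c
#include <assert.h>

#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Blocked product a = b * c in row-major order.
 * a is L x N, b is L x M, c is M x N. */
void mxm_blocked(int L, int M, int N, int tile_size,
                 double *a, const double *b, const double *c)
{
    assert(tile_size > 0);
    for (int i = 0; i < L; i++)
        for (int j = 0; j < N; j++)
            a[i * N + j] = 0.0;

    /* Iterate over blocks. */
    for (int ii = 0; ii < L; ii += tile_size)
        for (int jj = 0; jj < N; jj += tile_size)
            for (int kk = 0; kk < M; kk += tile_size)
                /* Iterate inside a block; MIN clips the last, partial tile. */
                for (int j = jj; j < MIN(N, jj + tile_size); j++)
                    for (int i = ii; i < MIN(L, ii + tile_size); i++)
                        for (int k = kk; k < MIN(M, kk + tile_size); k++)
                            a[i * N + j] += b[i * M + k] * c[k * N + j];
}
```

Comparing `mxm_blocked` against a naive triple loop for sizes that are not multiples of `tile_size` (for example L = 5, M = 7, N = 3 with tiles of 2) exercises the clipping of partial tiles, which is where blocked loops most often go wrong.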
  • 69. MxM in accULL (Garoe)
  • 70. MxM in accULL (Drago)
  • 71. SRAD: an Image Filtering Code
  • 72. SRAD (Garoe): CUDA in Frangollo performs better than native CUDA.
  • 73. SRAD (Drago)
  • 74. NW: Needleman-Wunsch, a Sequence Alignment Code
  • 75. NW (Garoe): poor results (but better than OpenMP on 4 cores).
  • 76. NW (Drago)
  • 77. HotSpot: a Thermal Simulation Tool for Estimating Processor Temperature
  • 78. HotSpot (Garoe): as good as the native versions.
  • 79. HotSpot (Drago)
  • 80. Outline: 1. Introduction; 2. YaCF; 3. Experiments; 4. Conclusions; 5. Future Work.
  • 81. Conclusions: Compiler Technologies
    • Compiler technologies tend to use and optimize source-to-source compilers to generate and transform source code.
    • It is easier to parallelize source code with AST transformations.
    • AST transformations make it easy for programmers to generate code for different platforms.
  • 82. Conclusions: Programming Model
    • Directive-based programming languages allow non-expert programmers to abstract away architectural details and write programs more easily.
    • The OpenACC standard is a starting point for heterogeneous systems programming.
    • Future versions of the OpenMP standard will include support for accelerators.
    • The results we are obtaining with accULL, our early OpenACC implementation, are promising.
  • 83. References
    • Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande. accULL: An OpenACC implementation with CUDA and OpenCL support. International European Conference on Parallel and Distributed Computing, 2012.
    • Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande. Directive-based Programming for GPUs: A Comparative Study. The 14th IEEE International Conference on High Performance Computing and Communications.
    • Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande. accULL: an user-directed Approach to Heterogeneous Programming. The 10th IEEE International Symposium on Parallel and Distributed Processing with Applications.
  • 84. Outline: 1. Introduction; 2. YaCF; 3. Experiments; 4. Conclusions; 5. Future Work.
  • 89. Future Work
    • Add support for MPI with CUDA and OpenCL.
    • Perform new experiments with OpenACC.
    • Compare our accULL approach with PGI-OpenACC and CAPS-HMPP.
    • Add support for vectorization.
    • Explore FPGAs in combination with CUDA and OpenCL.
    • Introduce the LLVM Compiler Framework in the front end.
  • 91. Thank you for your attention. Juan José Fumero Alfonso, jfumeroa@ull.edu.es