SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Introducing PgOpenCL
        A New PostgreSQL
       Procedural Language
Unlocking the Power of the GPU!
                By
             Tim Child
Bio

Tim Child
• 35 years experience of software development
• Formerly
  •   VP Oracle Corporation
  •   VP BEA Systems Inc.
  •   VP Informix
  •   Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years experience in 3D, CAD, GIS and DBMS
Terminology
Term                  Description
Procedure Language    Language for SQL Procedures (e.g. PgPLSQL, Perl, TCL, Java, … )
GPU                   Graphics Processing Unit (highly specialized CPU for graphics)
GPGPU                 General Purpose GPU (non-graphics programming on a GPU)
CUDA                  Nvidia’s GPU programming environment
APU                   Accelerated Processing Unit      (AMD’s Hybrid CPU & GPU chip)
ISO C99               Modern standard version of the C language
OpenCL                Open Compute Language
OpenMP                Open Multi-Processing (parallelizing compilers)
SIMD                  Single Instruction Multiple Data (Vector instructions )
SSE                   x86, x64 (Intel, AMD) Streaming SIMD Extensions
xPU                   Any Processing Unit device (CPU, GPU, APU)
Kernel                Functions that execute on a OpenCL Device
Work Item             Instance of a Kernel
Workgroup             A group of Work Items
FLOP                  Floating Point Operation (single = SQL real type )
MIC                   Many Integrated Cores (Intel’s 50+ x86 Core chip architecture)
Some Technology Trends
            Impacting DBMS
• Solid State Storage
    – Reduced Access Time, Lower Power, Increasing in capacity
• Virtualization
    – Server consolidation, Specialized VM’s, lowers direct costs
• Cloud Computing
    – EC2, Azure, … lowers capital requirements
• Multi-Core
    – 2,4,6,8, 12, …. Lots of benefits to multi-threaded applications

• xPU (GPU/APU)
    –   GPU >1000 Cores
    –    > 1T FLOP /s @ €2500
    –   APU = CPU + GPU Chip Hybrids due in Mid 2011
    –   2 T FLOP /s for $2.10 per hour (AWS EC2)
    –   Intel MIC “Knights Corner “ > 50 x86 Cores
Compute Intensive
    xPU Database Applications
•   Bioinformatics

•   Signal/Audio/Image Processing/Video

•   Data Mining & Analytics

•   Searching

•   Sorting

•   Spatial Selections and Joins

•   Map/Reduce

•   Scientific Computing

•   Many Others …
GPU vs CPU
Vendor           NVidia       ATI Radeon      Intel
Architecture     Fermi         Evergreen    Nehalem
Cores              448           1600          4
                  Simple        Simple      Complex
Transistors       3.1 B         2.15 B       731 M
Clock            1.5 G Hz      851 M Hz      3 G Hz
Peak Float       1500 G        2720 G         96 G
Performance      FLOP / s      FLOP / s     FLOP / s
Peak Double       750 G         544 G         48 G
Performance      FLOP / s      FLOP / s     FLOP / s
Memory          ~ 190 G / s   ~ 153 G / s   ~ 30 G / s
Bandwidth
Power             250 W        > 250 W        80 W
Consumption
SIMD / Vector     Many          Many         SSE4+
Instructions
Multi-Core Performance




Source NVidia
Future (Mid 2011)
                 APU Based PC
APU (Accelerated Processing Unit)

              APU Chip
      CPU             CPU                 ~20 GB/s     System RAM


         North Bridge
        ~20 GB/s                                           APU’s
                          PCIE ~12 GB/s
                          PCIE ~12 GB/s




                                                     Adds an Embedded
      Embedded                                             GPU
        GPU


                   Discrete
                                          150 GB/s     Graphic RAM
                     GPU

             Source AMD
Scalar vs. SIMD
Scalar Instruction
          C=A+B                           1       +       2        =        3




SIMD Instruction                              1       3       5         7

                                                          +
      Vector C = Vector A + Vector B          2       4       6        8

                                                          =
                                              3       7       11       15


        OpenCL
                  Vector lengths 2,4,8,16 for char, short, int, float, double
Summarizing xPU
            Trends
• Many more xPU Cores in our Future
• Compute Environment becoming Hybrid
  – CPU and GPU’s
  – Need CPU to give access to GPU power
• GPU Capabilities
  – Lots of cores
  – Vector/SIMD Instructions
  – Fast Memory
• GPU Futures
  – Virtual Memory
  – Multi-tasking / Pre-emption
Scaling PostgreSQL Queries
                       on xPU’s
            Multi-Core CPU                                           Many Core GPU


 PgOpenCL    PgOpenCL   PgOpenCL   PgOpenCL       PgOpenCL    PgOpenCL   PgOpenCL   PgOpenCL   PgOpenCL
  Threads     Threads    Threads    Threads        Threads     Threads    Threads    Thread     Thread



                                                   PgOpenCL   PgOpenCL   PgOpenCL   PgOpenCL   PgOpenCL
Postgres                                            Threads    Threads    Threads    Thread     Thread
Process


                                                   PgOpenCL              PgOpenCL   PgOpenCL   PgOpenCL
                                                              PgOpenCL
                                                    Threads               Threads    Thread     Thread
                                                               Threads




                                              Using More
                                              Transistors
Parallel
      Programming Systems
Category             CUDA     OpenMP       OpenCL
Language               C      C, Fortran     C
Cross Platform         X          √           √
Standard             Vendor   OpenMP       Khronos
CPU                    X          √           √
GPU                    √          X           √
Clusters               X          √           X

Compilation / Link   Static     Static     Dynamic
What is OpenCL?
• OpenCL - Open Compute Language
  –   Subset of C 99
  –   Open Specification
  –   Proposed by Apple
  –   Many Companies Collaborated on the Specification
  –   Portable, Device Agnostic
  –   Specification maintained by Khronos Group
• PgOpenCL
  – OpenCL as a PostgreSQL Procedural Language
System Overview
                                    DBMS Server

                                                   PgOpenCL
                                                    PgOpenCL
  Web     HTTP     Web               SQL              SQL
                                                       SQL
Browser           Server             Statement     Procedure
                                                    Procedure

                                                       PCIe X2 Bus
                           TCP/IP

                   App
                                      PostgreSQL              GPGPU
                  Server




                                        Disk I/O     Tables
                           TCP/IP
          PostgreSQL
            Client
OpenCL
                       Language
• A subset of ISO C99
   – - But without some C99 features such as standard C99 headers,
   – function pointers, recursion, variable length arrays, and bit fields
• A superset of ISO C99 with additions for:
   –   - Work-items and Workgroups
   –   - Vector types
   –   - Synchronization
   –   - Address space qualifiers
• Also includes a large set of built-in functions
   – - Image manipulation
   – - Work-item manipulation,
   – - Specialized math routines, etc.
PgOpenCL
             Components
• New PostgreSQL Procedural Language
  – Language handler
     • Maps arguments
     • Calls function
     • Returns results
  – Language validator
     • Creates Function with parameter & syntax checking
     • Compiles Function to a Binary format
• New data types
  – cl_double4, cl_double8, ….
• System Admin Pseudo-Tables
  – Platform, Device, Run-Time, …
PgOpenCL
 Admin
PGOpenCL
                        Function Declaration
CREATE or REPLACE FUNCTION VectorAdd(IN a float[], IN B float[], OUT c float[])
AS $BODY$

#pragma PGOPENCL Platform : ATI Stream
#pragma PGOPENCL Device : CPU

__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void VectorAdd( __global const float *a, __global const float *b, __global float *c)
  {
    int i = get_global_id(0);

      c[i] = a[i] + b[i];
  }

$BODY$
Language PgOpenCL;
PgOpenCL
                                   Execution Model
            A
Table
            B

            Select Table                    100’s - 1000’s of
              to Array                      Threads (Kernels)

                                        xPU
                                           VectorAdd(A, B)
        A           +        B                Returns C               =       C


                            Copy                                                  Unnest Array
                                                                 Copy               To Table
            Table

                C       C    C      C   C   C    C    C      C    C       C   C      C
Using
               Re-Shaped Tables
                       100’s - 1000’s of
    Table of           Threads (Kernels)                  Table of
     Arrays                                                Arrays
                  A    +   B     =         C

A
                                                      C     C        C   C
B
                   xPU
                      VectorAdd(A, B)
                         Returns C
A
                                                      C     C        C   C
B

                Copy
                                               Copy
Today’s GPGPU
              Challenges
• No Pre-emptive Multi-Tasking
• No Virtual Memory
• Limited Bandwidth to discrete GPGPU
   – 1 – 8 G/s over PCIe Bus
• Hard to Program
   – New Parallel Algorithms and constructs
   – “New” C language dialect
• Immature Tools
   – Compilers, IDE, Debuggers, Profilers - early years
• Data organization really matters
   – Types, Structure, and Alignment
   – SQL needs to Shape the Data
• Profiling and Debugging is not easy

Solves Well for Problem Sets with the Right Shape!
Making a Problem
                           Work for You
        • Determine % Parallelism Possible
for ( i = 0, i <  ∞, i++)
            for ( j = 0; j < ∞; j++ )
                      for ( k = 0; k <   ∞; k++ )


        • Arrange data to fit available GPU RAM
        •    Ensure calculation time >> I/O transfer overhead
        •    Learn about Parallel Algorithms and the OpenCL language
        •    Learn new tools
        •    Carefully choose Data Types, Organization and Alignments
        •    Profile and Measure at Every Stage
PgOpenCL
     System Requirements
• PostgreSQL 9.x
• For GPU’s
   – AMD ATI OpenCL Stream SDK 2.x
   – NVidia CUDA 3.x SDK
   – Recent Macs with O/S 11.6
• For CPU’s (Pentium M or more recent)
   – AMD ATI OpenCL Stream SDK 2.x
   – Intel OpenCL SDK Alpha Release (x86)
   – Recent Macs with O/S 11.6
PGOpenCL
                                   Status
    Today        1Q 2011
  Prototype       Beta


     2010             2011


• Wish List
       • Beta Testers
              – Existing OpenCL App?
              – Have a GPU App?
       • Contributors
              – Code server side functions?
       • Sponsors & Supporters
           – AMD Fusion Fund?
           – Khronos?
PgOpenCL
               Future Plans
• Increase Platform Support
• Scatter/Gather Functions
• Additional Type Support
   – Image Types
   – Sparse Matrices
• Run-Time
   –   Asynchronous
   –   Events
   –   Profiling
   –   Debugging
Using the
                                Whole Brain
                        APU Chip
PgOpenCl                           PgOpenCl
  PgOpenCL                           PgOpenCL
                 CPU
         CPU                    CPU
      Postgres                                  You can’t be in a
                                                parallel universe
                                                  with a single
                                                     brain!
                 North Bridge
             ~20 GB/s
                                                 • Heterogeneous Compute Environments
                          PgOpenCl
                            PgOpenCl                  • CPU’s, GPU’s, APU’s
             Embedded         PgOpenCl                • Expect 100’s – 1000’s of cores
                                PgOpenCl
               GPU                PgOpenCL




             The Future Is Parallel: What's a Programmer to Do?
Summarizing
              PgOpenCL
• Supports Heterogeneous Parallel Compute Environments
    • CPU’s, GPU’s, APU’s

• OpenCL
    • Portable and high-performance framework
        –Ideal for computationally intensive algorithms
        –Access to all compute resources (CPU, APU, GPU)
        –Well-defined computation/memory model
    •Efficient parallel programming language
        –C99 with extensions for task and data parallelism
        –Rich set of built-in functions
    •Open standard for heterogeneous parallel computing
• PgOpenCL
   • Integrates PostgreSQL with OpenCL
   • Provides Easy SQL Access to xPU’s
       • APU, CPU, GPGPU
   • Integrates OpenCL
       • SQL + Web Apps(PHP, Ruby, … )
More
                    Information
•   PGOpenCL
        • Twitter @3DMashUp

•   OpenCL

• www.khronos.org/opencl/


• www.amd.com/us/products/technologies/stream-technology/opencl/


• http://software.intel.com/en-us/articles/intel-opencl-sdk


• http://www.nvidia.com/object/cuda_opencl_new.html


• http://developer.apple.com/technologies/mac/snowleopard/opencl.html
Q&A

• Using Parallel Applications?
• Benefits of OpenCL / PgOpenCL?
• Want to Collaborate on PgOpenCL?

Contenu connexe

Tendances

An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary DemosMM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary DemosAMD Developer Central
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUShohei Hido
 
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...AMD Developer Central
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)Kohei KaiGai
 
Omp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdacOmp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdacGanesan Narayanasamy
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwJan Holčapek
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseKazuaki Ishizaki
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~Kohei KaiGai
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...AMD Developer Central
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerAMD Developer Central
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaTim Ellison
 

Tendances (20)

GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary DemosMM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
Omp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdacOmp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdac
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdw
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
Hands on OpenCL
Hands on OpenCLHands on OpenCL
Hands on OpenCL
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
Debugging CUDA applications
Debugging CUDA applicationsDebugging CUDA applications
Debugging CUDA applications
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with Java
 
GPU Ecosystem
GPU EcosystemGPU Ecosystem
GPU Ecosystem
 

Similaire à Pgopencl

GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science Domino Data Lab
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAFacultad de Informática UCM
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
fpga1 - What is.pptx
fpga1 - What is.pptxfpga1 - What is.pptx
fpga1 - What is.pptxssuser0de10a
 
Machine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsMachine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsAmazon Web Services
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapGeorge Markomanolis
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrjRoberto Brandao
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018NVIDIA
 
Case Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded ProcessorsCase Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded Processorsaccount inactive
 
Computação acelerada – a era das ap us roberto brandão, ciência
Computação acelerada – a era das ap us   roberto brandão,  ciênciaComputação acelerada – a era das ap us   roberto brandão,  ciência
Computação acelerada – a era das ap us roberto brandão, ciênciaCampus Party Brasil
 
GPU enablement for data science on OpenShift | DevNation Tech Talk
GPU enablement for data science on OpenShift | DevNation Tech TalkGPU enablement for data science on OpenShift | DevNation Tech Talk
GPU enablement for data science on OpenShift | DevNation Tech TalkRed Hat Developers
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecturemohamedragabslideshare
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementGanesan Narayanasamy
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PC Cluster Consortium
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2Yutaka Kawai
 

Similaire à Pgopencl (20)

GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
fpga1 - What is.pptx
fpga1 - What is.pptxfpga1 - What is.pptx
fpga1 - What is.pptx
 
Machine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsMachine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUs
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrj
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Case Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded ProcessorsCase Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded Processors
 
Computação acelerada – a era das ap us roberto brandão, ciência
Computação acelerada – a era das ap us   roberto brandão,  ciênciaComputação acelerada – a era das ap us   roberto brandão,  ciência
Computação acelerada – a era das ap us roberto brandão, ciência
 
GPU enablement for data science on OpenShift | DevNation Tech Talk
GPU enablement for data science on OpenShift | DevNation Tech TalkGPU enablement for data science on OpenShift | DevNation Tech Talk
GPU enablement for data science on OpenShift | DevNation Tech Talk
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 

Pgopencl

  • 1. Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
  • 2. Bio Tim Child • 35 years experience of software development • Formerly • VP Oracle Corporation • VP BEA Systems Inc. • VP Informix • Leader at Illustra, Autodesk, Navteq, Intuit, … • 30+ years experience in 3D, CAD, GIS and DBMS
  • 3. Terminology Term Description Procedure Language Language for SQL Procedures (e.g. PgPLSQL, Perl, TCL, Java, … ) GPU Graphics Processing Unit (highly specialized CPU for graphics) GPGPU General Purpose GPU (non-graphics programming on a GPU) CUDA Nvidia’s GPU programming environment APU Accelerated Processing Unit (AMD’s Hybrid CPU & GPU chip) ISO C99 Modern standard version of the C language OpenCL Open Compute Language OpenMP Open Multi-Processing (parallelizing compilers) SIMD Single Instruction Multiple Data (Vector instructions ) SSE x86, x64 (Intel, AMD) Streaming SIMD Extensions xPU Any Processing Unit device (CPU, GPU, APU) Kernel Functions that execute on a OpenCL Device Work Item Instance of a Kernel Workgroup A group of Work Items FLOP Floating Point Operation (single = SQL real type ) MIC Many Integrated Cores (Intel’s 50+ x86 Core chip architecture)
  • 4. Some Technology Trends Impacting DBMS • Solid State Storage – Reduced Access Time, Lower Power, Increasing in capacity • Virtualization – Server consolidation, Specialized VM’s, lowers direct costs • Cloud Computing – EC2, Azure, … lowers capital requirements • Multi-Core – 2,4,6,8, 12, …. Lots of benefits to multi-threaded applications • xPU (GPU/APU) – GPU >1000 Cores – > 1T FLOP /s @ €2500 – APU = CPU + GPU Chip Hybrids due in Mid 2011 – 2 T FLOP /s for $2.10 per hour (AWS EC2) – Intel MIC “Knights Corner “ > 50 x86 Cores
  • 5. Compute Intensive xPU Database Applications • Bioinformatics • Signal/Audio/Image Processing/Video • Data Mining & Analytics • Searching • Sorting • Spatial Selections and Joins • Map/Reduce • Scientific Computing • Many Others …
  • 6. GPU vs CPU Vendor NVidia ATI Radeon Intel Architecture Fermi Evergreen Nehalem Cores 448 1600 4 Simple Simple Complex Transistors 3.1 B 2.15 B 731 M Clock 1.5 G Hz 851 M Hz 3 G Hz Peak Float 1500 G 2720 G 96 G Performance FLOP / s FLOP / s FLOP / s Peak Double 750 G 544 G 48 G Performance FLOP / s FLOP / s FLOP / s Memory ~ 190 G / s ~ 153 G / s ~ 30 G / s Bandwidth Power 250 W > 250 W 80 W Consumption SIMD / Vector Many Many SSE4+ Instructions
  • 8. Future (Mid 2011) APU Based PC APU (Accelerated Processing Unit) APU Chip CPU CPU ~20 GB/s System RAM North Bridge ~20 GB/s APU’s PCIE ~12 GB/s PCIE ~12 GB/s Adds an Embedded Embedded GPU GPU Discrete 150 GB/s Graphic RAM GPU Source AMD
  • 9. Scalar vs. SIMD Scalar Instruction C=A+B 1 + 2 = 3 SIMD Instruction 1 3 5 7 + Vector C = Vector A + Vector B 2 4 6 8 = 3 7 11 15 OpenCL Vector lengths 2,4,8,16 for char, short, int, float, double
  • 10. Summarizing xPU Trends • Many more xPU Cores in our Future • Compute Environment becoming Hybrid – CPU and GPU’s – Need CPU to give access to GPU power • GPU Capabilities – Lots of cores – Vector/SIMD Instructions – Fast Memory • GPU Futures – Virtual Memory – Multi-tasking / Pre-emption
  • 11. Scaling PostgreSQL Queries on xPU’s Multi-Core CPU Many Core GPU PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL Threads Threads Threads Threads Threads Threads Threads Thread Thread PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL Postgres Threads Threads Threads Thread Thread Process PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL Threads Threads Thread Thread Threads Using More Transistors
  • 12. Parallel Programming Systems Category CUDA OpenMP OpenCL Language C C, Fortran C Cross Platform X √ √ Standard Vendor OpenMP Khronos CPU X √ √ GPU √ X √ Clusters X √ X Compilation / Link Static Static Dynamic
  • 13. What is OpenCL? • OpenCL - Open Compute Language – Subset of C 99 – Open Specification – Proposed by Apple – Many Companies Collaborated on the Specification – Portable, Device Agnostic – Specification maintained by Khronos Group • PgOpenCL – OpenCL as a PostgreSQL Procedural Language
  • 14. System Overview DBMS Server PgOpenCL PgOpenCL Web HTTP Web SQL SQL SQL Browser Server Statement Procedure Procedure PCIe X2 Bus TCP/IP App PostgreSQL GPGPU Server Disk I/O Tables TCP/IP PostgreSQL Client
  • 15. OpenCL Language • A subset of ISO C99 – - But without some C99 features such as standard C99 headers, – function pointers, recursion, variable length arrays, and bit fields • A superset of ISO C99 with additions for: – - Work-items and Workgroups – - Vector types – - Synchronization – - Address space qualifiers • Also includes a large set of built-in functions – - Image manipulation – - Work-item manipulation, – - Specialized math routines, etc.
  • 16. PgOpenCL Components • New PostgreSQL Procedural Language – Language handler • Maps arguments • Calls function • Returns results – Language validator • Creates Function with parameter & syntax checking • Compiles Function to a Binary format • New data types – cl_double4, cl_double8, …. • System Admin Pseudo-Tables – Platform, Device, Run-Time, …
  • 18. PGOpenCL Function Declaration CREATE or REPLACE FUNCTION VectorAdd(IN a float[], IN B float[], OUT c float[]) AS $BODY$ #pragma PGOPENCL Platform : ATI Stream #pragma PGOPENCL Device : CPU __kernel __attribute__((reqd_work_group_size(64, 1, 1))) void VectorAdd( __global const float *a, __global const float *b, __global float *c) { int i = get_global_id(0); c[i] = a[i] + b[i]; } $BODY$ Language PgOpenCL;
  • 19. PgOpenCL Execution Model A Table B Select Table 100’s - 1000’s of to Array Threads (Kernels) xPU VectorAdd(A, B) A + B Returns C = C Copy Unnest Array Copy To Table Table C C C C C C C C C C C C C
  • 20. Using Re-Shaped Tables 100’s - 1000’s of Table of Threads (Kernels) Table of Arrays Arrays A + B = C A C C C C B xPU VectorAdd(A, B) Returns C A C C C C B Copy Copy
  • 21. Today’s GPGPU Challenges • No Pre-emptive Multi-Tasking • No Virtual Memory • Limited Bandwidth to discrete GPGPU – 1 – 8 G/s over PCIe Bus • Hard to Program – New Parallel Algorithms and constructs – “New” C language dialect • Immature Tools – Compilers, IDE, Debuggers, Profilers - early years • Data organization really matters – Types, Structure, and Alignment – SQL needs to Shape the Data • Profiling and Debugging is not easy Solves Well for Problem Sets with the Right Shape!
  • 22. Making a Problem Work for You • Determine % Parallelism Possible for ( i = 0, i < ∞, i++) for ( j = 0; j < ∞; j++ ) for ( k = 0; k < ∞; k++ ) • Arrange data to fit available GPU RAM • Ensure calculation time >> I/O transfer overhead • Learn about Parallel Algorithms and the OpenCL language • Learn new tools • Carefully choose Data Types, Organization and Alignments • Profile and Measure at Every Stage
  • 23. PgOpenCL System Requirements • PostgreSQL 9.x • For GPU’s – AMD ATI OpenCL Stream SDK 2.x – NVidia CUDA 3.x SDK – Recent Macs with O/S 11.6 • For CPU’s (Pentium M or more recent) – AMD ATI OpenCL Stream SDK 2.x – Intel OpenCL SDK Alpha Release (x86) – Recent Macs with O/S 11.6
  • 24. PGOpenCL Status Today 1Q 2011 Prototype Beta 2010 2011 • Wish List • Beta Testers – Existing OpenCL App? – Have a GPU App? • Contributors – Code server side functions? • Sponsors & Supporters – AMD Fusion Fund? – Khronos?
  • 25. PgOpenCL Future Plans • Increase Platform Support • Scatter/Gather Functions • Additional Type Support – Image Types – Sparse Matrices • Run-Time – Asynchronous – Events – Profiling – Debugging
  • 26. Using the Whole Brain APU Chip PgOpenCl PgOpenCl PgOpenCL PgOpenCL CPU CPU CPU Postgres You can’t be in a parallel universe with a single brain! North Bridge ~20 GB/s • Heterogeneous Compute Environments PgOpenCl PgOpenCl • CPU’s, GPU’s, APU’s Embedded PgOpenCl • Expect 100’s – 1000’s of cores PgOpenCl GPU PgOpenCL The Future Is Parallel: What's a Programmer to Do?
  • 27. Summarizing PgOpenCL • Supports Heterogeneous Parallel Compute Environments • CPU’s, GPU’s, APU’s • OpenCL • Portable and high-performance framework –Ideal for computationally intensive algorithms –Access to all compute resources (CPU, APU, GPU) –Well-defined computation/memory model •Efficient parallel programming language –C99 with extensions for task and data parallelism –Rich set of built-in functions •Open standard for heterogeneous parallel computing • PgOpenCL • Integrates PostgreSQL with OpenCL • Provides Easy SQL Access to xPU’s • APU, CPU, GPGPU • Integrates OpenCL • SQL + Web Apps(PHP, Ruby, … )
  • 28. More Information • PGOpenCL • Twitter @3DMashUp • OpenCL • www.khronos.org/opencl/ • www.amd.com/us/products/technologies/stream-technology/opencl/ • http://software.intel.com/en-us/articles/intel-opencl-sdk • http://www.nvidia.com/object/cuda_opencl_new.html • http://developer.apple.com/technologies/mac/snowleopard/opencl.html
  • 29. Q&A • Using Parallel Applications? • Benefits of OpenCL / PgOpenCL? • Want to Collaborate on PgOpenCL?