SlideShare une entreprise Scribd logo
1  sur  43
Télécharger pour lire hors ligne
PG-Strom
GPU Accelerated Asynchronous
Super-Parallel Query Execution

KaiGai Kohei <kaigai@kaigai.gr.jp>
(Tw: @kkaigai)
Self Introduction

• 名前                     海外 浩平 (KaiGai Kohei)
• 所属                     NEC Europe - SAP Global Competence Center
• 仕事                     OSS活用によるイノベーション創出 (20-30%)
                         SAPとのアライアンス、PF製品の拡販 (70-80%)
   • SAPのIn-memory DB “SAP HANA” の認証作業とか
   • CLUSTERPROのSAP認定取得、拡販とか                           特にコレの
                                                      割合がデカい




20130218 - PG-Strom Workshop, Tokyo                                  2
All everyone talks about BIG-DATA




       猫                                    杓子

      熱い視線

                                 BIG DATA
20130218 - PG-Strom Workshop, Tokyo              3
Big-Data Database?




            ¥¥¥¥¥¥
            $$$$$$
            €€€€€€
20130218 - PG-Strom Workshop, Tokyo   4
Homogeneous / Heterogeneous computing


                                                                        KPIs
                 Homogeneous
                 Scale-Up
                                                          •   Computing Performance
                                                          •   Power Consumption
                                                          •   System Cost (HW/SW)
                                      Heterogeneous       •   Variety of Applications
                                      Scale-Up
                                                          •   Vendor Support
                                                          •   Software Development
                                                                        :

                                                      +
    Scale-out
    (not a topic of
     today’s talk)


20130218 - PG-Strom Workshop, Tokyo                                                     5
Design concept of PG-Strom


              The world cheapest
            The most Cost-Effective
              Big-Data Database
• Utilization of open source technology
• Utilization of commodity hardware
  • up-to ?? CPUs
                          まだ、この辺をとやかく
  • up-to ??? GB RAM      言える段階ではない
  • up-to ??? Data size
• Utilization of heterogeneous computing with GPU
20130218 - PG-Strom Workshop, Tokyo                 6
Characteristics of GPU (1/2)

                                                    Nvidia       AMD             Intel
                                                    Kepler       GCN             SandyBridge

                                      Model         Tesla K20X   FirePro S9000   Xeon E5-2690
                                                    (Q4/2012)    (Q3/2012)       (Q1/2012)
                                      Number of     7.1billion   4.3billion      2.26billion
                                      Transistors
                                      Number of     2688         1792            16
                                      Cores         Simple       Simple          Functional
                                      Core clock    732MHz       925MHz          2.9GHz
                                      Peak FLOPS    3.95Tflops   3.23TFlops      185.6GFlops
                                      Memory        6GB, GDDR5   6GB, GDDR5      384GB/socket,
                                      Size / TYPE                                DDR3
                                      Memory        ~192GB/s     ~264GB/s        ~51.2GB/s
                                      Bandwidth
                                      Power         ~235W        ~225W           ~135W
                                      Consumption
                                      Price         $3199?       $2499?          $2061

20130218 - PG-Strom Workshop, Tokyo                                                            7
Characteristics of GPU (2/2)

  Example)

  Zi = Xi + Yi             (0 <= i <= n)


   X0       X1        X2              Xn

   +         +        +               +
   Y0       Y1        Y2              Yn

                                   
   Z0        Z1       Z2              Zn



               Assign a particular “core”


                                            Nvidia’s GeForce GTX 680 Block Diagram
20130218 - PG-Strom Workshop, Tokyo                                                  8
Play with GPU (1/3)
         Memory                                CPU                             GPU
                                                                                      計算負荷
                                                                                     GPUの仕事
                                                                        GPGPU
        on-host            DDR3-1600
                                                                        (non-integrated)
         buffer           (~51.2GB/s)

                                                                                DDR5
              通常のI/O負荷                                                          192.2GB/s
                                         IO HUB
               CPUの仕事                                                              GPU
                                                                   on-device
                                                                                    code
             HDD                                                    buffer
                                        HBA                                      device DRAM


                           SAS 2.0 (600MB/s)         PCI-E 3.0 x16 (~32GB/s)

 Asynchronous Execution of CPU, GPU and PCI-E
 Minimization of data transfer between host and device
20130218 - PG-Strom Workshop, Tokyo                                                            9
Play with GPU (2/3)
   Host code example
void sqrt_float4(int n, float v[])
{
  /* Acquire device memory and data transfer (host -> device) */
  dev_v = clCreateBuffer(cxt, CL_MEM_READ_WRITE,
                         sizeof(float) * n, NULL, &rv);
  /* Enqueue data transfer host to device */
  clEnqueueWriteBuffer(cmdq, dev_x, CL_TRUE, 0, NULL,
                       v, 0, NULL, NULL);
  /* Set arguments of kernel code */
  clSetKernelArg(kernel, 0, sizeof(int), (void *)&n);
  clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&dev_v);
  /* Enqueue invocation of device kernel */
  clEnqueueNDRangeKernel(cmdq, kernel, 1, NULL, &g_itemsz, &l_itemsz,
                         0, NULL, NULL);
  /* Enqueue data transfer device to host */
  clEnqueueReadBuffer(cmdq, dev_x, CL_TRUE, 0, NULL,
                      v, 0, NULL, NULL);
  /* Release device memory */
  clReleaseMemObject(dev_x)
}


20130218 - PG-Strom Workshop, Tokyo                                     10
Play with GPU (3/3)
    Device code example
__kernel void dev_sqrt_float(int length, float x[])
{
  int i = get_global_id(0);

    if (i < length)
      x[i] = sqrt(x[i]);
}


    Host code to load kernel
/* Load source code of the program */
program = clCreateProgramWithSource(cxt, 1,
                        (const char *)&kernel_source,
                        (const size_t *)&kernel_source_len, &rv);
/* Run-time build of the program */
rv = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
/* Create a device kernel object */
kernel = clCreateKernel(program, “dev_sqrt_float”, &rv);


20130218 - PG-Strom Workshop, Tokyo                                 11
Comparison - CPU vs CPU + GPU

     CPU                   CPU + GPU                    • Advantage
 Storage                 Storage 
                                                         • シンプルな演算の
 Host buffer              Host buffer                        超並列処理
                                                         •   非同期実行&
                               DMA: host  device
                                                             パイプライン処理
                          Storage         Parallel     • Disadvantage
                          Host buffer     Calculation
                                                         • ホストデバイス間の
  Loop &                                                     DMA転送コスト
                               DMA: host  device
 Calculation
                          Host buffer                    •   コードの複雑化
                           output

                                        depends on
                                         workload
Host buffer
 output


20130218 - PG-Strom Workshop, Tokyo                                      12
Architecture
                     of
                 PG-Strom


20130218 - PG-Strom Workshop, Tokyo   13
PG-Strom’s Asynchronous Execution model
    vanilla PostgreSQL                                                          PostgreSQL with PG-Strom

                CPU                                                            CPU                      GPU

                                                                                               Asynchronous memory
                                                                                               transfer and execution


                                      Iteration of scan tuples and
                                         evaluation of qualifiers




                                                                        Synchronization


                                                                     Larger “chunk” to scan
                                                                                                   Earlier than
                                                                      the database at once
                                                                                                 “Only CPU” scan


                : Scan tuples on shared-buffers
                : Execution of the qualifiers
                                                                                                                    Page
20130218 - PG-Strom Workshop, Tokyo
                                                                                                                     14
Re-definition of SQL/MED (1/2)

• SQL/MED (Management of External Data)
   • External data source performing as if regular tables
   • Not only “management”, but external computing resources also
                                                                                                        Exec
                                                                           Regular                                    Exec
                                                                                           storage
                                                Query Executor              Table
                                Query Planner
                 Query Parser




                                                                           Foreign             MySQL
                                                                            Table               FDW

  SQL
 Query                                                                     Foreign             Oracle
                                                                                               FDW                    Exec
                                                                            Table

                                                                           Foreign        PG-Strom
                                                                            Table           FDW

                                                                                                               Exec
                                                                                     Regular
                                                                 storage
                                                                                      Table


20130218 - PG-Strom Workshop, Tokyo                                                                                      15
Re-definition of SQL/MED (2/2)
                     Query
                     Parser
                                      SQL/MED API            construction of
  Query                                                       remote SQL
   Tree                                                         remote
                                                                  SQL
                     Query                      FDW
                    Planner                    Planner

                                                                                Remote pgsql
                                                             connection open

                                                              remote query
                    Query                       FDW
                   Executor                   Executor
                                                                 result set

  Result                                                     connection close
   Set

20130218 - PG-Strom Workshop, Tokyo
                                          Extension module                               16
PG-Strom as SQL/MED driver
                     Query
                     Parser                           ① 条件句から、GPU用
                                                     Kernel Codeを自動生成
  Query
   Tree                                                     WHERE log(x) < 10

                                                                         .....
                     Query                 FDW                            ....
                    Planner               Planner                        .....


                                               ② Kernel codeの                    ④ GPU Kernelの
                                                 JIT-compile                      非同期実行

                                                                chunk
                    Query                   FDW
                                                                buffer
                   Executor               Executor

                                                                         ③ Shadow Table
  Result
                                                                          からのロード
   Set

20130218 - PG-Strom Workshop, Tokyo
                                      PG-Strom module    shadow tables
                                                                                          17
Overall architecture
                                                          World of CPU
            regular                shadow
             tables                 tables



                     shared_buffer                   chunk_buffer                    World of GPU
                                                      chunk      GPU
                                                       data      code
                                                                                       GPU Device
                                                                                    chunk Memory
           SeqScan,                    PG-Strom               request               data
             etc...                                           handler JIT                Super
                                                                        compile         Parallel
                                   ForeignScan
                                                               Event                   Execution
Result                                                        monitor                  GPU Kernel
                    Query Executor                                                   GPU Function
                                                          PG-Strom                  kernel
                  PostgreSQL Backend                     GPU Server

                                        Postmaster
                                                                          background worker
                                                                                                    18
 20130218 - PG-Strom Workshop, Tokyo
So what, How fast is it?
  postgres=# SELECT COUNT(*) FROM rtbl
             WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
   count
  --------
   100467
  (1 row)
  Time: 7668.684 ms
  postgres=# SELECT COUNT(*) FROM ftbl
             WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
   count
  --------
   100467
  (1 row)                 Accelerated!
  Time: 857.298 ms
   CPU: Xeon E5-2670 (2.60GHz), GPU: NVidia GeForce GT640, RAM: 384GB
   Both of regular rtbl and PG-Strom ftbl contain 20milion rows with same value
20130218 - PG-Strom Workshop, Tokyo                                            19
Key Technologies

• Automatic GPU code generation & JIT compile
• Column-oriented data structure
• Asynchronous Execution




20130218 - PG-Strom Workshop, Tokyo             20
Automatic “pseudo” code generation

          SELECT * FROM ftbl WHERE
          c like ‘%xyz%’ AND sqrt((x-256)^2+(y-100)^2) < 10;

     contains unsupported
     operators / functions                                    Translation to
                                                              pseudo code
                                      xreg10 = $(ftbl.x)
                                      xreg12 = 256.000000::double
Pseudo-code based implementation
will be replaced by native code and   xreg8 = (xreg10 - xreg12)
JIT-compile approach soon.            xreg10 = 2.000000::double
                                      xreg6 = pow(xreg8, xreg10)
                                      xreg12 = $(ftbl.y)
                                      xreg14 = 128.000000::double
                                               :

20130218 - PG-Strom Workshop, Tokyo                                            21
Automatic native code generation - WIP

          SELECT * FROM ftbl WHERE
          c like ‘%xyz%’ AND sqrt((x-256)^2+(y-100)^2) < 10;

OpenCL run-time builds
native GPU binary
                                      __kernel void
                                      pgstrom_qual(int nitems, bool result[],
                                                    float x[], float y[])
                                      {
                                          int index = get_global_id(0);

                                          if (sqrt(pow(x[i] - 256, 2) +
                                                   pow(y[i] - 100, 2)) < 10)
                                              result[i] = true;
                                          else
                                              result[i] = false;
                                      }

20130218 - PG-Strom Workshop, Tokyo                                             22
Save bandwidth & shared-buffer usage
   E.g) SELECT name, tel, email, address FROM address_book
          WHERE sqrt((pos_x – 24.5)^2 + (pos_y – 52.3)^2) < 10;
    No sense to fetch columns being not in use
           CPU                        GPU                 CPU              GPU




   Synchronization                                   Synchronization

     : Scan tuples on the shared-buffers
                                                Save the bandwidth of PCI-E bus
     : Execution of the qualifiers
     : Columns being not used the qualifiers    Save the shared-buffer usage
20130218 - PG-Strom Workshop, Tokyo                                              23
Column-oriented data structure (1/3)

                                                     (shadow) TABLE “public.ft.rowid”
                                                     rowid nitems                 isnull
      FOREIGN TABLE ft
                                                      4000      2000        {0,0,0,1,0,0,…}
    int    float   text
     X       Y      Z                                 6000      2000        {0,0,0,0,0,0,…}
                                                        :         :                  :
                                             (shadow) TABLE “public.ft.z.cs”
                                                     14000       400        {0,0,1,0,0,0,…}
                                     rowid nitems        isnull              values
                                      4000      15     {0,0,…}       { ‘hello’, ‘world’, … }
                                      4015      20     {0,0,…} { ‘aaa’, ‘bbb’, ‘ccc’, … }
                                (shadow) TABLE “public.ft.y.cs”
                        rowid nitems : isnull :            : values              :
                        4000      25014275
                                         {0,0,…} { 1.38, 6.45, 2.15, …‘yyy’, ‘zzz’, …}
                                                25     {0,0,…}       {‘xxx’, }
                        4250      250    {0,1,…} { 4.32, 5.46, 3.14, … }
                 (shadow): TABLE “public.ft.a.cs”
                                   :         :                    :
           rowid nitems 14200
                          isnull 100         values
                                         {0,0,…} {19, 29, 39, 49, 59, …}
            4000   500   {0,0,…} {10, 20, 30, 40, 50, …}
            4500   500   {0,1,…}     {11, 0, 31, 41, 51, …}
              :      :      :                   :
           14200   200   {0,0,…} {19, 29, 39, 49, 59, …}
20130218 - PG-Strom Workshop, Tokyo                                                      24
Column-oriented data structure (2/3)

 postgres=# CREATE FOREIGN TABLE example
              (a int, b text) SERVER pg_strom;
 CREATE FOREIGN TABLE

 postgres=# SELECT * FROM pgstrom_shadow_relations;
   oid |        relname        | relkind | relsize
 -------+----------------------+---------+-----------
  16446 | public.example.rowid | r       |          0
  16449 | public.example.idx   | i       |      8192
  16450 | public.example.a.cs | r        |          0
  16453 | public.example.a.idx | i       |      8192
  16454 | public.example.b.cs | r        |          0
  16457 | public.example.b.idx | i       |      8192
  16462 | public.example.seq   | S       |      8192
 (9 rows)

 postgres=# SELECT * FROM pg_strom."public.example.a.cs" ;
  rowid | nitems | isnull | values
 -------+--------+--------+--------
 (0 rows)


20130218 - PG-Strom Workshop, Tokyo                          25
Column-oriented data structure (3/3)
② Calculation                                                         opcode               Pseudo Code




                                                 PgStromChunkBuffer
                       ① Transfer                                     rowmap

                                                                      value   a[]     <not used>

                                                                      value   b[]
                      ③ Write-Back
                                                                      value   c[]

                                                                      value   d[]     <not used>


• Less bandwidth consumption                                            Table: my_schema.ft1.b.cs
   of PCI-Express bus                                                 10100       {2.4, 5.6, 4.95, … }
                                                                      10300     {10.23, 7.54, 5.43, … }
• Less usage of buffer-cache
                                                                                  Table: my_schema.ft1.c.cs
• Suitable for data                                                             10100      {‘2010-10-21’, …}
   compression                                                                  10200      {‘2011-01-23’, …}
                                                                                10300      {‘2011-08-17’, …}

PGconf.EU 2012 / PGStrom - GPU Accelerated Asynchronous Execution Module                                       26
Asynchronous Execution (1/2)
           IterateForeignScan
                                                        Yes                                   free_chunk_list
        No more rows
      on current chunk?
                                                                 current
                                                                  chunk                                           Job Queue
            No


                                                                                           next
                           If no chunks are ready yet


                                                              Load chunk from
                                                                                          chunk
                                                               shadow tables


                                                                                GPU
                                                                                code

                                                                                       shadow tables

                                                                                                                GPU Management
                                                                 current                                            Server
          Return next
                                                                  chunk
         TupleTableSlot
                                                                                           ready_chunk_list
20130218 - PG-Strom Workshop, Tokyo                                                                                              27
Asynchronous Execution (2/2) - in the future
           IterateForeignScan
                                 Yes              free_chunk_list
                                                                             Asynchronous       Asynchronous
        No more rows                                                          Data Load          Calculation
      on current chunk?
                                      current
                                       chunk
            No
                                                                         next                  Job Queue




                                                        Job Queue
                                                                        chunk
                                         next
                                        chunk


                                           GPU                      shadow tables
                                           code
                                                        Parallel I/O Server
                                                         Parallel I/O Server
                                                          Parallel I/O Server


                                                                                            GPU Management
                                      current                                                   Server
          Return next
                                       chunk
         TupleTableSlot
                                                            ready_chunk_list
20130218 - PG-Strom Workshop, Tokyo                                                                        28
Eco-System
                   in
               PostgreSQL
              Development


20130218 - PG-Strom Workshop, Tokyo   29
PostgreSQL developer’s community




                                    PostgreSQL
                                    developer’s
                                    community



                                  contribution,           software,
                                    feedback,          documentation,
                                   donation, ...       knowledge, ...
                                               Service
                                          infrastructure,
                                              support,
20130218 - PG-Strom Workshop, Tokyo
                                          consulting, ...               30
PostgreSQL development cycle
                   2011                            2012                         2013
v9.2
cycle
                                                     v9.2 Release
               CommitFest 1st~4th
v9.3                  PGconf2011 &
cycle               developer meeting
                                                          CommitFest 1st~4th
                                                              PGconf2012 &
                                                            developer meeting


                                                   v9.3 development schedule
                                                   • 17th-May developer meeting
                                                   • 15th-Jun CommitFest:1st
                                                   • 15th-Sep CommitFest:2nd
                                                   • 15th-Nov CommitFest:3rd
    PostgreSQL developer meeting (17th-May-2012)
                                                   • 15th-Jan CommitFest:4th
20130218 - PG-Strom Workshop, Tokyo                                                    31
PostgreSQL CommitFest




20130218 - PG-Strom Workshop, Tokyo   32
Key features towards upcoming v9.3 (1/3)

• Background Worker
   • It enables extensions to manage own background worker process
   • Pre-requisite of PG-Strom’s GPU control server
   • KaiGai implemented 1st version, then Alvaro revised and committed

                      Shared Resources (DB cluster, shared mem, IPC, ...)


                                              Built-in              Extra
                                            background             daemon
                    PostgreSQL
                     PostgreSQL
                      PostgreSQL
                       PostgreSQL
                    Backend
                     Backend
                      Backend              (autovacuum,          Own main()
                       Backend              bgwriter...)

                                                                            manage
                                                                 Extension
                                         postmaster

20130218 - PG-Strom Workshop, Tokyo                                                  33
Key features towards upcoming v9.3 (2/3)

• Writable-FDW
   •   It allows FDW-drivers to modify external data source via foreign-table.
   •   Several new APIs shall be added
   •   Helpful for PG-Strom to modify shadow-tables using standard DML
   •   KaiGai submitted patch, then it is “ready-for-committer” status now

   SELECT
                                                         SQL Executor
                                           SQL Planner
                              SQL Parser




   INSERT
                                                                            FDW
                                                                            driver
   UPDATE

   DELETE
                                                                        New API       External
                                                                                     Data Source

20130218 - PG-Strom Workshop, Tokyo                                                                34
Key features towards upcoming v9.3 (3/3)

• Writable-FDW (Pseudo-column support)
   • It required to identify a particular remote-row to be written.
   • “rowid” shall be carried from scan-stage to modify-stage as
      a value of pseudo-column.
   • Pseudo-column concept is also available to push-down complex
      calculation into external computing resource.

       SELECT X, Y, (X-Y)^2 from ftable;
                    
       SELECT X, Y, Pcol_1 from ftable;

      Just reference to               (SELECT X, Y, (X-Y)^2 AS Pcol_1
    the calculated result               from remote_data_source)
     in the remote side
                                                      Remote Query
20130218 - PG-Strom Workshop, Tokyo                                     35
Direction
                    of
               The Future
              Development


20130218 - PG-Strom Workshop, Tokyo   36
Move to OpenCL - WIP

• OpenCL support, instead of CUDA
   • multiplatform support
   • built-in JIT compiler

                   OpenCL Source
                     (CString)

    clCreateProgramWithSource()       ○   ○
                      cl_program

              clCreateKernel()        ○   ×
                       cl_kernel
                                      ○   ×
       clEnqueueNDRangeKernel

20130218 - PG-Strom Workshop, Tokyo           37
Variable Length Data Support - WIP

• Data layout on chunk-buffer is revising, to accept variable-length data.
• Older format assumed fixed-number of items per chunk.
• Newer format assumes fixed-size chunk; consumed from head/tail




                                                                                          to consume
                                                                                            Direction
          for fixed-length values X                       for fixed-length variable A


                                                  for index of variable-length value B
                                                         offset: 123
          for fixed-length values Y



                                          to consume
                                            Direction
                                                                        text: ‘hello world’
                                                        for contents of variable-lengths
          for fixed-length values Z


       Older chunk-buffer layout                        Newer chunk-buffer layout
20130218 - PG-Strom Workshop, Tokyo                                                           38
Procedural Language Support

• This idea allows users to describe complicated logic
   as procedural language to be executed on GPU.
• Expected usage: image processing, genome matching, ...

 CREATE FUNCTION genome_similarity(text,text) RETURNS float AS
 $$
     varlena *genome1 = ARG1;
     varlena *genome2 = ARG2;
         :
      <something complicated logic>
         :
     return similarity;
 $$ LANGUAGE pg_strom;

 SELECT id, label FROM genome_db
     WHERE genome_similarity(data, ‘ATGCAGGT....’) > 0.9;

20130218 - PG-Strom Workshop, Tokyo                         39
You getting involved

I’d like to know ...
• How PG-Strom run on
     real-world dataset and workload
• How PG-Strom should get evolved
• Which region and problem will fit
                                      
All the co-development / co-evaluation
projects are always welcome!
20130218 - PG-Strom Workshop, Tokyo       40
Summary

• Characteristics of GPU & OpenCL
  • Inflexible instructions but much higher parallelism
  • Cheap and small power consumption per computing capability
• PG-Strom - towards most cost-effective database
  • Utilization of GPU to off-load CPU jobs
  • Automatic code generation and JIT compile
  • Asynchronous execution
  • Column-oriented data structure
• Upcoming development
  • Move to OpenCL rather then CUDA
  • Support for variable length values
  • Support for procedural language
• Your involvement will lead future evolution!
20130218 - PG-Strom Workshop, Tokyo                              41
Source

• Source code
   • https://github.com/kaigai/pg_strom
• Wikipage
   • http://wiki.postgresql.org/wiki/PGStrom
      (need maintenance...)




20130218 - PG-Strom Workshop, Tokyo            42
Any Questions?



20130218 - PG-Strom Workshop, Tokyo   43

Contenu connexe

Tendances

PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)NTT DATA Technology & Innovation
 
Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )Mydbops
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Giuseppe Paterno'
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_APIKohei KaiGai
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!NTT DATA Technology & Innovation
 
[2018] MySQL 이중화 진화기
[2018] MySQL 이중화 진화기[2018] MySQL 이중화 진화기
[2018] MySQL 이중화 진화기NHN FORWARD
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceMariaDB plc
 
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)NTT DATA Technology & Innovation
 
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.NAVER D2
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQLMydbops
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep InternalEXEM
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...Mydbops
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuningelliando dias
 
pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)NTT DATA Technology & Innovation
 
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Daniel Hochman
 

Tendances (20)

Query logging with proxysql
Query logging with proxysqlQuery logging with proxysql
Query logging with proxysql
 
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
 
Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
NTT DATA と PostgreSQL が挑んだ総力戦
NTT DATA と PostgreSQL が挑んだ総力戦NTT DATA と PostgreSQL が挑んだ総力戦
NTT DATA と PostgreSQL が挑んだ総力戦
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!祝!PostgreSQLレプリケーション10周年!徹底紹介!!
祝!PostgreSQLレプリケーション10周年!徹底紹介!!
 
[2018] MySQL 이중화 진화기
[2018] MySQL 이중화 진화기[2018] MySQL 이중화 진화기
[2018] MySQL 이중화 진화기
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performance
 
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
 
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQL
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuning
 
pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)
pg_standbyの今後について(第19回PostgreSQLアンカンファレンス@オンライン 発表資料)
 
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...
 

Similaire à PG-Strom

PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrKohei KaiGai
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxssuser30e7d2
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storageKohei KaiGai
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecturemohamedragabslideshare
 
GPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU ArchitecturesGPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU Architecturesinside-BigData.com
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~Kohei KaiGai
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudUniva, an Altair Company
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA UnconferenceKohei KaiGai
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...Kohei KaiGai
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeireimec
 

Similaire à PG-Strom (20)

PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated Asyncr
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
 
GPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU ArchitecturesGPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU Architectures
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Pgopencl
PgopenclPgopencl
Pgopencl
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
 
Exploring Gpgpu Workloads
Exploring Gpgpu WorkloadsExploring Gpgpu Workloads
Exploring Gpgpu Workloads
 
GPU
GPUGPU
GPU
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeire
 

Plus de Kohei KaiGai

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_HistoryKohei KaiGai
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrowKohei KaiGai
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_countKohei KaiGai
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0Kohei KaiGai
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCacheKohei KaiGai
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_IndexKohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGISKohei KaiGai
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGISKohei KaiGai
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_ProcessingKohei KaiGai
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_OnlineKohei KaiGai
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdwKohei KaiGai
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_FdwKohei KaiGai
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_TokyoKohei KaiGai
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.JapanKohei KaiGai
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_BetaKohei KaiGai
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStromKohei KaiGai
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGaiKohei KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStromKohei KaiGai
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdwKohei KaiGai
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_FdwKohei KaiGai
 

Plus de Kohei KaiGai (20)

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw
 

Dernier

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Dernier (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

PG-Strom

  • 1. PG-Strom GPU Accelerated Asynchronous Super-Parallel Query Execution KaiGai Kohei <kaigai@kaigai.gr.jp> (Tw: @kkaigai)
  • 2. Self Introduction • 名前 海外 浩平 (KaiGai Kohei) • 所属 NEC Europe - SAP Global Competence Center • 仕事 OSS活用によるイノベーション創出 (20-30%) SAPとのアライアンス、PF製品の拡販 (70-80%) • SAPのIn-memory DB “SAP HANA” の認証作業とか • CLUSTERPROのSAP認定取得、拡販とか 特にコレの 割合がデカい 20130218 - PG-Strom Workshop, Tokyo 2
  • 3. All everyone talks about BIG-DATA 猫 杓子 熱い視線 BIG DATA 20130218 - PG-Strom Workshop, Tokyo 3
  • 4. Big-Data Database? ¥¥¥¥¥¥ $$$$$$ €€€€€€ 20130218 - PG-Strom Workshop, Tokyo 4
  • 5. Homogeneous / Heterogeneous computing KPIs Homogeneous Scale-Up • Computing Performance • Power Consumption • System Cost (HW/SW) Heterogeneous • Variety of Applications Scale-Up • Vendor Support • Software Development : + Scale-out (not a topic of today’s talk) 20130218 - PG-Strom Workshop, Tokyo 5
  • 6. Design concept of PG-Strom The world cheapest The most Cost-Effective Big-Data Database • Utilization of open source technology • Utilization of commodity hardware • up-to ?? CPUs まだ、この辺をとやかく • up-to ??? GB RAM 言える段階ではない • up-to ??? Data size • Utilization of heterogeneous computing with GPU 20130218 - PG-Strom Workshop, Tokyo 6
  • 7. Characteristics of GPU (1/2) Nvidia AMD Intel Kepler GCN SandyBridge Model Tesla K20X FirePro S9000 Xeon E5-2690 (Q4/2012) (Q3/2012) (Q1/2012) Number of 7.1billion 4.3billion 2.26billion Transistors Number of 2688 1792 16 Cores Simple Simple Functional Core clock 732MHz 925MHz 2.9GHz Peak FLOPS 3.95Tflops 3.23TFlops 185.6GFlops Memory 6GB, GDDR5 6GB, GDDR5 384GB/socket, Size / TYPE DDR3 Memory ~192GB/s ~264GB/s ~51.2GB/s Bandwidth Power ~235W ~225W ~135W Consumption Price $3199? $2499? $2061 20130218 - PG-Strom Workshop, Tokyo 7
  • 8. Characteristics of GPU (2/2) Example) Zi = Xi + Yi (0 <= i <= n) X0 X1 X2 Xn + + + + Y0 Y1 Y2 Yn     Z0 Z1 Z2 Zn Assign a particular “core” Nvidia’s GeForce GTX 680 Block Diagram 20130218 - PG-Strom Workshop, Tokyo 8
  • 9. Play with GPU (1/3) Memory CPU GPU 計算負荷 GPUの仕事 GPGPU on-host DDR3-1600 (non-integrated) buffer (~51.2GB/s) DDR5 通常のI/O負荷 192.2GB/s IO HUB  CPUの仕事 GPU on-device code HDD buffer HBA device DRAM SAS 2.0 (600MB/s) PCI-E 3.0 x16 (~32GB/s)  Asynchronous Execution of CPU, GPU and PCI-E  Minimization of data transfer between host and device 20130218 - PG-Strom Workshop, Tokyo 9
  • 10. Play with GPU (2/3) Host code example void sqrt_float4(int n, float v[]) { /* Acquire device memory and data transfer (host -> device) */ dev_v = clCreateBuffer(cxt, CL_MEM_READ_WRITE, sizeof(float) * n, NULL, &rv); /* Enqueue data transfer host to device */ clEnqueueWriteBuffer(cmdq, dev_x, CL_TRUE, 0, NULL, v, 0, NULL, NULL); /* Set arguments of kernel code */ clSetKernelArg(kernel, 0, sizeof(int), (void *)&n); clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&dev_v); /* Enqueue invocation of device kernel */ clEnqueueNDRangeKernel(cmdq, kernel, 1, NULL, &g_itemsz, &l_itemsz, 0, NULL, NULL); /* Enqueue data transfer device to host */ clEnqueueReadBuffer(cmdq, dev_x, CL_TRUE, 0, NULL, v, 0, NULL, NULL); /* Release device memory */ clReleaseMemObject(dev_x) } 20130218 - PG-Strom Workshop, Tokyo 10
  • 11. Play with GPU (3/3) Device code example __kernel void dev_sqrt_float(int length, float x[]) { int i = get_global_id(0); if (i < length) x[i] = sqrt(x[i]); } Host code to load kernel /* Load source code of the program */ program = clCreateProgramWithSource(cxt, 1, (const char *)&kernel_source, (const size_t *)&kernel_source_len, &rv); /* Run-time build of the program */ rv = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL); /* Create a device kernel object */ kernel = clCreateKernel(program, “dev_sqrt_float”, &rv); 20130218 - PG-Strom Workshop, Tokyo 11
  • 12. Comparison - CPU vs CPU + GPU CPU CPU + GPU • Advantage Storage  Storage  • シンプルな演算の Host buffer Host buffer 超並列処理 • 非同期実行& DMA: host  device パイプライン処理 Storage  Parallel • Disadvantage Host buffer Calculation • ホストデバイス間の Loop & DMA転送コスト DMA: host  device Calculation Host buffer • コードの複雑化  output depends on workload Host buffer  output 20130218 - PG-Strom Workshop, Tokyo 12
  • 13. Architecture of PG-Strom 20130218 - PG-Strom Workshop, Tokyo 13
  • 14. PG-Strom’s Asynchronous Execution model vanilla PostgreSQL PostgreSQL with PG-Strom CPU CPU GPU Asynchronous memory transfer and execution Iteration of scan tuples and evaluation of qualifiers Synchronization Larger “chunk” to scan Earlier than the database at once “Only CPU” scan : Scan tuples on shared-buffers : Execution of the qualifiers Page 20130218 - PG-Strom Workshop, Tokyo 14
  • 15. Re-definition of SQL/MED (1/2) • SQL/MED (Management of External Data) • External data source performing as if regular tables • Not only “management”, but external computing resources also Exec Regular Exec storage Query Executor Table Query Planner Query Parser Foreign MySQL Table FDW SQL Query Foreign Oracle FDW Exec Table Foreign PG-Strom Table FDW Exec Regular storage Table 20130218 - PG-Strom Workshop, Tokyo 15
  • 16. Re-definition of SQL/MED (2/2) Query Parser SQL/MED API construction of Query remote SQL Tree remote SQL Query FDW Planner Planner Remote pgsql connection open remote query Query FDW Executor Executor result set Result connection close Set 20130218 - PG-Strom Workshop, Tokyo Extension module 16
  • 17. PG-Strom as SQL/MED driver Query Parser ① 条件句から、GPU用 Kernel Codeを自動生成 Query Tree WHERE log(x) < 10 ..... Query FDW .... Planner Planner ..... ② Kernel codeの ④ GPU Kernelの JIT-compile 非同期実行 chunk Query FDW buffer Executor Executor ③ Shadow Table Result からのロード Set 20130218 - PG-Strom Workshop, Tokyo PG-Strom module shadow tables 17
  • 18. Overall architecture World of CPU regular shadow tables tables shared_buffer chunk_buffer World of GPU chunk GPU data code GPU Device chunk Memory SeqScan, PG-Strom request data etc... handler JIT Super compile Parallel ForeignScan Event Execution Result monitor GPU Kernel Query Executor GPU Function PG-Strom kernel PostgreSQL Backend GPU Server Postmaster background worker 18 20130218 - PG-Strom Workshop, Tokyo
  • 19. So what, How fast is it? postgres=# SELECT COUNT(*) FROM rtbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40; count -------- 100467 (1 row) Time: 7668.684 ms postgres=# SELECT COUNT(*) FROM ftbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40; count -------- 100467 (1 row) Accelerated! Time: 857.298 ms  CPU: Xeon E5-2670 (2.60GHz), GPU: NVidia GeForce GT640, RAM: 384GB  Both of regular rtbl and PG-Strom ftbl contain 20milion rows with same value 20130218 - PG-Strom Workshop, Tokyo 19
  • 20. Key Technologies • Automatic GPU code generation & JIT compile • Column-oriented data structure • Asynchronous Execution 20130218 - PG-Strom Workshop, Tokyo 20
  • 21. Automatic “pseudo” code generation SELECT * FROM ftbl WHERE c like ‘%xyz%’ AND sqrt((x-256)^2+(y-100)^2) < 10; contains unsupported operators / functions Translation to pseudo code xreg10 = $(ftbl.x) xreg12 = 256.000000::double Pseudo-code based implementation will be replaced by native code and xreg8 = (xreg10 - xreg12) JIT-compile approach soon. xreg10 = 2.000000::double xreg6 = pow(xreg8, xreg10) xreg12 = $(ftbl.y) xreg14 = 128.000000::double : 20130218 - PG-Strom Workshop, Tokyo 21
  • 22. Automatic native code generation - WIP SELECT * FROM ftbl WHERE c like ‘%xyz%’ AND sqrt((x-256)^2+(y-100)^2) < 10; OpenCL run-time builds native GPU binary __kernel void pgstrom_qual(int nitems, bool result[], float x[], float y[]) { int index = get_global_id(0); if (sqrt(pow(x[i] - 256, 2) + pow(y[i] - 100, 2)) < 10) result[i] = true; else result[i] = false; } 20130218 - PG-Strom Workshop, Tokyo 22
  • 23. Save bandwidth & shared-buffer usage E.g) SELECT name, tel, email, address FROM address_book WHERE sqrt((pos_x – 24.5)^2 + (pos_y – 52.3)^2) < 10;  No sense to fetch columns being not in use CPU GPU CPU GPU Synchronization Synchronization : Scan tuples on the shared-buffers  Save the bandwidth of PCI-E bus : Execution of the qualifiers : Columns being not used the qualifiers  Save the shared-buffer usage 20130218 - PG-Strom Workshop, Tokyo 23
  • 24. Column-oriented data structure (1/3) (shadow) TABLE “public.ft.rowid” rowid nitems isnull FOREIGN TABLE ft 4000 2000 {0,0,0,1,0,0,…} int float text X Y Z 6000 2000 {0,0,0,0,0,0,…} : : : (shadow) TABLE “public.ft.z.cs” 14000 400 {0,0,1,0,0,0,…} rowid nitems isnull values 4000 15 {0,0,…} { ‘hello’, ‘world’, … } 4015 20 {0,0,…} { ‘aaa’, ‘bbb’, ‘ccc’, … } (shadow) TABLE “public.ft.y.cs” rowid nitems : isnull : : values : 4000 25014275 {0,0,…} { 1.38, 6.45, 2.15, …‘yyy’, ‘zzz’, …} 25 {0,0,…} {‘xxx’, } 4250 250 {0,1,…} { 4.32, 5.46, 3.14, … } (shadow): TABLE “public.ft.a.cs” : : : rowid nitems 14200 isnull 100 values {0,0,…} {19, 29, 39, 49, 59, …} 4000 500 {0,0,…} {10, 20, 30, 40, 50, …} 4500 500 {0,1,…} {11, 0, 31, 41, 51, …} : : : : 14200 200 {0,0,…} {19, 29, 39, 49, 59, …} 20130218 - PG-Strom Workshop, Tokyo 24
  • 25. Column-oriented data structure (2/3) postgres=# CREATE FOREIGN TABLE example (a int, b text) SERVER pg_strom; CREATE FOREIGN TABLE postgres=# SELECT * FROM pgstrom_shadow_relations; oid | relname | relkind | relsize -------+----------------------+---------+----------- 16446 | public.example.rowid | r | 0 16449 | public.example.idx | i | 8192 16450 | public.example.a.cs | r | 0 16453 | public.example.a.idx | i | 8192 16454 | public.example.b.cs | r | 0 16457 | public.example.b.idx | i | 8192 16462 | public.example.seq | S | 8192 (9 rows) postgres=# SELECT * FROM pg_strom."public.example.a.cs" ; rowid | nitems | isnull | values -------+--------+--------+-------- (0 rows) 20130218 - PG-Strom Workshop, Tokyo 25
  • 26. Column-oriented data structure (3/3) ② Calculation opcode Pseudo Code PgStromChunkBuffer ① Transfer rowmap value a[] <not used> value b[] ③ Write-Back value c[] value d[] <not used> • Less bandwidth consumption Table: my_schema.ft1.b.cs of PCI-Express bus 10100 {2.4, 5.6, 4.95, … } 10300 {10.23, 7.54, 5.43, … } • Less usage of buffer-cache Table: my_schema.ft1.c.cs • Suitable for data 10100 {‘2010-10-21’, …} compression 10200 {‘2011-01-23’, …} 10300 {‘2011-08-17’, …} PGconf.EU 2012 / PGStrom - GPU Accelerated Asynchronous Execution Module 26
  • 27. Asynchronous Execution (1/2) IterateForeignScan Yes free_chunk_list No more rows on current chunk? current chunk Job Queue No next If no chunks are ready yet Load chunk from chunk shadow tables GPU code shadow tables GPU Management current Server Return next chunk TupleTableSlot ready_chunk_list 20130218 - PG-Strom Workshop, Tokyo 27
  • 28. Asynchronous Execution (2/2) - in the future IterateForeignScan Yes free_chunk_list Asynchronous Asynchronous No more rows Data Load Calculation on current chunk? current chunk No next Job Queue Job Queue chunk next chunk GPU shadow tables code Parallel I/O Server Parallel I/O Server Parallel I/O Server GPU Management current Server Return next chunk TupleTableSlot ready_chunk_list 20130218 - PG-Strom Workshop, Tokyo 28
  • 29. Eco-System in PostgreSQL Development 20130218 - PG-Strom Workshop, Tokyo 29
  • 30. PostgreSQL developer’s community PostgreSQL developer’s community contribution, software, feedback, documentation, donation, ... knowledge, ... Service infrastructure, support, 20130218 - PG-Strom Workshop, Tokyo consulting, ... 30
  • 31. PostgreSQL development cycle 2011 2012 2013 v9.2 cycle v9.2 Release CommitFest 1st~4th v9.3 PGconf2011 & cycle developer meeting CommitFest 1st~4th PGconf2012 & developer meeting v9.3 development schedule • 17th-May developer meeting • 15th-Jun CommitFest:1st • 15th-Sep CommitFest:2nd • 15th-Nov CommitFest:3rd PostgreSQL developer meeting (17th-May-2012) • 15th-Jan CommitFest:4th 20130218 - PG-Strom Workshop, Tokyo 31
  • 32. PostgreSQL CommitFest 20130218 - PG-Strom Workshop, Tokyo 32
  • 33. Key features towards upcoming v9.3 (1/3) • Background Worker • It enables extensions to manage own background worker process • Pre-requisite of PG-Strom’s GPU control server • KaiGai implemented 1st version, then Alvaro revised and committed Shared Resources (DB cluster, shared mem, IPC, ...) Built-in Extra background daemon PostgreSQL PostgreSQL PostgreSQL PostgreSQL Backend Backend Backend (autovacuum, Own main() Backend bgwriter...) manage Extension postmaster 20130218 - PG-Strom Workshop, Tokyo 33
  • 34. Key features towards upcoming v9.3 (2/3) • Writable-FDW • It allows FDW-drivers to modify external data source via foreign-table. • Several new APIs shall be added • Helpful for PG-Strom to modify shadow-tables using standard DML • KaiGai submitted patch, then it is “ready-for-committer” status now SELECT SQL Executor SQL Planner SQL Parser INSERT FDW driver UPDATE DELETE New API External Data Source 20130218 - PG-Strom Workshop, Tokyo 34
  • 35. Key features towards upcoming v9.3 (3/3) • Writable-FDW (Pseudo-column support) • It required to identify a particular remote-row to be written. • “rowid” shall be carried from scan-stage to modify-stage as a value of pseudo-column. • Pseudo-column concept is also available to push-down complex calculation into external computing resource. SELECT X, Y, (X-Y)^2 from ftable;  SELECT X, Y, Pcol_1 from ftable; Just reference to (SELECT X, Y, (X-Y)^2 AS Pcol_1 the calculated result from remote_data_source) in the remote side Remote Query 20130218 - PG-Strom Workshop, Tokyo 35
  • 36. Direction of The Future Development 20130218 - PG-Strom Workshop, Tokyo 36
  • 37. Move to OpenCL - WIP • OpenCL support, instead of CUDA • multiplatform support • built-in JIT compiler OpenCL Source (CString) clCreateProgramWithSource() ○ ○ cl_program clCreateKernel() ○ × cl_kernel ○ × clEnqueueNDRangeKernel 20130218 - PG-Strom Workshop, Tokyo 37
  • 38. Variable Length Data Support - WIP • Data layout on chunk-buffer is revising, to accept variable-length data. • Older format assumed fixed-number of items per chunk. • Newer format assumes fixed-size chunk; consumed from head/tail to consume Direction for fixed-length values X for fixed-length variable A for index of variable-length value B offset: 123 for fixed-length values Y to consume Direction text: ‘hello world’ for contents of variable-lengths for fixed-length values Z Older chunk-buffer layout Newer chunk-buffer layout 20130218 - PG-Strom Workshop, Tokyo 38
  • 39. Procedural Language Support • This idea allows users to describe complicated logic as procedural language to be executed on GPU. • Expected usage: image processing, genome matching, ... CREATE FUNCTION genome_similarity(text,text) RETURNS float AS $$ varlena *genome1 = ARG1; varlena *genome2 = ARG2; : <something complicated logic> : return similarity; $$ LANGUAGE pg_strom; SELECT id, label FROM genome_db WHERE genome_similarity(data, ‘ATGCAGGT....’) > 0.9; 20130218 - PG-Strom Workshop, Tokyo 39
  • 40. You getting involved I’d like to know ... • How PG-Strom run on real-world dataset and workload • How PG-Strom should get evolved • Which region and problem will fit  All the co-development / co-evaluation projects are always welcome! 20130218 - PG-Strom Workshop, Tokyo 40
  • 41. Summary • Characteristics of GPU & OpenCL • Inflexible instructions but much higher parallelism • Cheap and small power consumption per computing capability • PG-Strom - towards most cost-effective database • Utilization of GPU to off-load CPU jobs • Automatic code generation and JIT compile • Asynchronous execution • Column-oriented data structure • Upcoming development • Move to OpenCL rather then CUDA • Support for variable length values • Support for procedural language • Your involvement will lead future evolution! 20130218 - PG-Strom Workshop, Tokyo 41
  • 42. Source • Source code • https://github.com/kaigai/pg_strom • Wikipage • http://wiki.postgresql.org/wiki/PGStrom (need maintenance...) 20130218 - PG-Strom Workshop, Tokyo 42
  • 43. Any Questions? 20130218 - PG-Strom Workshop, Tokyo 43