ERSA, Las Vegas, Nevada, July 2011

Accelerating Real-time processing of the ATST Adaptive Optics System using Coarse-grained Parallel Hardware Architectures
Vivek Venugopal (vivek@vivekvenugopal.net)
National Solar Observatory, Sunspot, New Mexico

Advanced Technology Solar Telescope

Adaptive Optics system

[Figure: adaptive optics loop. Labeled elements: uncorrected light, tip/tilt mirror, deformable mirror (DM), beamsplitter, corrected light, Shack-Hartmann lenslet array, CCD camera, and processors. Labeled signals: DM drive signal and tilt drive signal.]

Sub-apertures

HOAO Real-time system

[Figure: block diagram of the real-time pipeline.
Inputs (top row): dark field, flat field, reference image, offscale slope tolerance, slope offsets, reconstruction matrix, actuator offsets, actuator gains.
Main chain (left to right): WFS camera, multiply (X), cross-correlation slope computation, offscale slope detection, multiply (X), matrix multiply, actuator servos, deformable mirror.
Other blocks: average slope, tip/tilt servos, tip/tilt mirror, servo parameters, data collection, Zernike offload process.]

• 1750 sub-apertures and 1900 actuators

Camera data format

[Figure: camera frame split into two halves, each labeled "camera data half, 960 x 480 pixels".]

• Camera data consists of two halves of 960x480 pixels
• Each half of the camera data is sent to an FPGA over 12 channels

Scenario 1: FPGA-DSP

[Figure: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; each FPGA has 12 output channels to the DSPs. Label: 96 DSPs.]

• Pixel unpacking task - FPGA
• Processing - DSPs

Scenario 2: FPGA-DSP

[Figure: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; each FPGA has 12 output channels to the DSPs. Label: 48 DSPs.]

• Pixel unpacking, dark and flat correction - FPGA
• Cross-correlation and reconstruction matrix processing - DSPs

Dark and flat correction

[Figure: per-pixel datapath, repeated for the lanes labeled pixel0, pixel1, ..., pixel16. In each lane the 8-bit dark_pixel is subtracted from the 10-bit pixel, the 10-bit difference is multiplied by the 8-bit flat_pixel to give an 18-bit flat_product, and an accumulator produces flat_acc.]

• Dark pixel and flat pixel stored in RAM
• Flat corrected product is concatenated and written to FIFO
• Flat accumulated value can be used to update the reference image

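
The per-pixel arithmetic of this datapath can be sketched in Python as a rough illustration. The bit widths (10-bit pixel, 8-bit dark and flat values, 18-bit product) come from the slide; the function names, the clamping of negative differences to zero, and the range checks are assumptions made for this sketch and are not stated in the slides.

```python
PIXEL_BITS = 10   # from the slide: 10-bit raw pixel
DARK_BITS = 8     # from the slide: 8-bit dark_pixel
FLAT_BITS = 8     # from the slide: 8-bit flat_pixel
PRODUCT_BITS = PIXEL_BITS + FLAT_BITS  # 18-bit flat_product


def dark_flat_correct(pixel, dark, flat):
    """Return (pixel - dark) * flat as an unsigned 18-bit value."""
    if not 0 <= pixel < (1 << PIXEL_BITS):
        raise ValueError("pixel does not fit in 10 bits")
    if not 0 <= dark < (1 << DARK_BITS):
        raise ValueError("dark value does not fit in 8 bits")
    if not 0 <= flat < (1 << FLAT_BITS):
        raise ValueError("flat value does not fit in 8 bits")
    # Assumption: a negative difference is clamped to zero.
    diff = max(pixel - dark, 0)
    product = diff * flat
    # The largest case, 1023 * 255, still fits in 18 bits.
    assert product < (1 << PRODUCT_BITS)
    return product


def correct_lanes(pixels, darks, flats):
    """Apply the correction to several pixel lanes, one result per lane."""
    return [dark_flat_correct(p, d, f) for p, d, f in zip(pixels, darks, flats)]
```

For example, `correct_lanes([100, 50], [20, 10], [3, 2])` applies the correction to two lanes independently, which mirrors the replicated lanes in the figure.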
Pixel unpacking & Dark and flat correction

[Figure: FPGA pipeline for 1/2 camera (12 channels). Each of channel 1 to channel 12 has its own chain: Receiver → Data unpack → FIFO → Dark-flat correction/accumulator → PCIe system bus. Shared blocks: synchronizer/counters, dark and flat value RAM, reference image RAM.
Widths labeled on the slide: 16 into Data unpack, 160 out of Data unpack and out of the FIFO, 256 at the dark and flat value RAM, 128 and 288 at each Dark-flat correction/accumulator.
Timing labels: 206.8 ns near the unpack stage, 20 ns near the correction stage.
Clock labels: clock period = 9.42 ns, clock rate = 106.15 MHz (under the unpack stage); clock period = 5 ns, clock rate = 200 MHz (under the correction stage).]

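
The unpack stage can be illustrated with a short Python sketch. The 16-bit input and 160-bit output widths are taken from the slide. Reading them as ten 16-bit words that hold sixteen 10-bit pixels is an assumption, as are the little-endian bit order and the names `unpack_block` and `pack_block`.

```python
WORD_BITS = 16    # from the slide: 16-bit receiver output
PIXEL_BITS = 10   # from the slide: 10-bit pixels
BLOCK_BITS = 160  # from the slide: 160-bit unpacked word


def unpack_block(words):
    """Split ten 16-bit words into sixteen 10-bit pixels.

    Assumption: the first word and the first pixel occupy the lowest bits.
    """
    if len(words) != BLOCK_BITS // WORD_BITS:
        raise ValueError("expected 10 words per 160-bit block")
    block = 0
    for i, word in enumerate(words):
        if not 0 <= word < (1 << WORD_BITS):
            raise ValueError("word does not fit in 16 bits")
        block |= word << (i * WORD_BITS)
    mask = (1 << PIXEL_BITS) - 1
    count = BLOCK_BITS // PIXEL_BITS
    return [(block >> (k * PIXEL_BITS)) & mask for k in range(count)]


def pack_block(pixels):
    """Inverse of unpack_block, used here only to build test input."""
    if len(pixels) != BLOCK_BITS // PIXEL_BITS:
        raise ValueError("expected 16 pixels per 160-bit block")
    block = 0
    for k, pixel in enumerate(pixels):
        if not 0 <= pixel < (1 << PIXEL_BITS):
            raise ValueError("pixel does not fit in 10 bits")
        block |= pixel << (k * PIXEL_BITS)
    mask = (1 << WORD_BITS) - 1
    count = BLOCK_BITS // WORD_BITS
    return [(block >> (i * WORD_BITS)) & mask for i in range(count)]
```

Packing and then unpacking a block returns the original sixteen pixel values, which is a convenient check that the two widths are consistent.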
Scenario 3: FPGA-GPU or FPGA-CPU

[Figure: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; both FPGAs connect to the GPU/CPU over the PCI-e bus.]

• Pixel unpacking, dark and flat correction - FPGA
• Cross-correlation and reconstruction matrix processing - GPU or CPU

Nvidia Tesla C2050 GPU

[Figure: streaming multiprocessor layout (Multiprocessor 1 of 14): instruction cache, two warp schedulers with dispatch units, register file, two groups of 16 cores, 16 load/store units, 4 SFUs, interconnection network, 64 KB shared memory/L1 cache, uniform cache.]

• Nvidia Tesla C2050: 14 streaming multiprocessors with 32 cores each (SIMD), clocked at 1.15 GHz
• 3 GB on-board RAM
• Kernel-based execution
• 1.03 TFLOPS single precision (14 x 32 cores x 2 FLOPs per cycle x 1.15 GHz)
• 515.2 GFLOPS double precision

Reference: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf

Process mapping and partitioning

[Figure: dataflow for one sub-aperture. Inputs: raw pixels 20x20, dark pixels 20x20, flat pixels 20x20, reference pixels 20x20. FPGA: dark correction → flat correction. GPU: 2D cross-correlation → find maximum → x and y interpolation.]

Correlation routines

1. FFT correlation
[Figure: the flat corrected image and the reference image each pass through an FFT; the results are combined by complex conjugate multiplication and passed through an IFFT.]

2. 7x7 correlation
[Figure: the original reference image (26x26 pixels) is divided into Region 1, Region 2, ..., Region 49, each a precomputed reference of 20x20 pixels. Caption: Precomputed Reference pixels 20x20 (49 regions).]

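
A minimal Python sketch of the 7x7 correlation follows. The sizes (26x26 reference, 20x20 regions, 49 shifts) come from the slide. The function names, the row-major ordering of the regions, and the use of a plain sum of products as the correlation measure are assumptions.

```python
def precompute_regions(reference, size, shifts):
    """Cut shifts*shifts windows of size x size out of the reference image.

    With a 26x26 reference, size=20 and shifts=7 this yields 49 regions.
    Assumption: regions are ordered row by row (dy outer, dx inner).
    """
    regions = []
    for dy in range(shifts):
        for dx in range(shifts):
            window = [row[dx:dx + size] for row in reference[dy:dy + size]]
            regions.append(window)
    return regions


def correlate(image, regions, shifts):
    """Return a shifts x shifts grid of sum-of-products values."""
    out = [[0] * shifts for _ in range(shifts)]
    for k, region in enumerate(regions):
        total = 0
        for image_row, ref_row in zip(image, region):
            for a, b in zip(image_row, ref_row):
                total += a * b
        out[k // shifts][k % shifts] = total
    return out
```

Because the regions depend only on the reference image, they can be computed once and reused for every incoming frame, which is presumably why the slide labels them as precomputed.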
find_max and interpolation routines
• Find the maximum value and its index
• Find x and y shifts using the interpolation equations

\[
\begin{aligned}
\mathrm{num}_x &= \mathrm{max\_value} - \mathrm{out}(y_s,\; x_s - 1) \\
\mathrm{den}_x &= 2\,\mathrm{max\_value} - \mathrm{out}(y_s,\; x_s - 1) - \mathrm{out}(y_s,\; x_s + 1) \\
x &= (x_s - 0.5) + \frac{\mathrm{num}_x}{\mathrm{den}_x} \\
\mathrm{num}_y &= \mathrm{max\_value} - \mathrm{out}(y_s - 1,\; x_s) \\
\mathrm{den}_y &= 2\,\mathrm{max\_value} - \mathrm{out}(y_s - 1,\; x_s) - \mathrm{out}(y_s + 1,\; x_s) \\
y &= (y_s - 0.5) + \frac{\mathrm{num}_y}{\mathrm{den}_y}
\end{aligned}
\]

Here $x_s$ is the shifted x index and $y_s$ is the shifted y index.

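
The two steps on this slide can be sketched in Python as follows. The formulas are those on the slide. Treating the shifted indices as the row and column of the peak in the array `out`, rejecting peaks on the border, and falling back to the integer index when a denominator is zero are assumptions of this sketch.

```python
def find_max(out):
    """Return (max_value, y_index, x_index) of a 2D list."""
    best_value, best_y, best_x = out[0][0], 0, 0
    for y, row in enumerate(out):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_y, best_x = value, y, x
    return best_value, best_y, best_x


def interpolate_peak(out):
    """Return the sub-pixel (x, y) position of the correlation peak."""
    max_value, y, x = find_max(out)
    rows, cols = len(out), len(out[0])
    # Assumption: the peak must have neighbours on all four sides.
    if not (0 < y < rows - 1 and 0 < x < cols - 1):
        raise ValueError("peak lies on the border")

    num_x = max_value - out[y][x - 1]
    den_x = 2 * max_value - out[y][x - 1] - out[y][x + 1]
    num_y = max_value - out[y - 1][x]
    den_y = 2 * max_value - out[y - 1][x] - out[y + 1][x]

    # Assumption: a flat neighbourhood (zero denominator) keeps the index.
    x_pos = (x - 0.5) + num_x / den_x if den_x != 0 else float(x)
    y_pos = (y - 0.5) + num_y / den_y if den_y != 0 else float(y)
    return x_pos, y_pos
```

These equations are a three-point parabolic fit along each axis, so for a correlation surface that is exactly quadratic around the peak the fractional position is recovered exactly.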
GPU results

[Figure: two bar charts of time in us versus number of images, comparing the Tesla C1060 and Tesla C2050.
FFT correlation (axis 0 to 2200 us): bar labels 1188, 1510, 1619, 1889.
7x7 correlation (axis 0 to 400 us): bar labels 313, 307, 301 and 278, 279, 281 at 1, 50, and 584 images.]

Note: Lower time indicates better performance

Reconstruction routine

[Figure: the x and y shifts for the 1750 sub-aperture images form a 3500-element vector, which is multiplied by the 1900x3500 reconstruction matrix to give accumulated values for the 1900 actuators.]

[Figure: bar chart of time in us per device on a logarithmic axis (1 to 100000 us). Legend order: Tesla C1060, Tesla C2050, DSP, CPU. Bar labels, left to right: 964, 956, 229, 46769.]

• 1750 sub-aperture x and y shifts
• 1900 x 3500 reconstruction matrix

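
The reconstruction step is a matrix-vector product, sketched below in plain Python. The dimensions (1750 x shifts, 1750 y shifts, a 1900x3500 matrix, 1900 outputs) come from the slide. Stacking all x shifts before all y shifts, and the function names, are assumptions.

```python
def stack_shifts(x_shifts, y_shifts):
    """Build the slope vector from the per-sub-aperture shifts.

    Assumption: all x shifts come first, followed by all y shifts.
    """
    if len(x_shifts) != len(y_shifts):
        raise ValueError("need one x and one y shift per sub-aperture")
    return list(x_shifts) + list(y_shifts)


def reconstruct(matrix, slopes):
    """Multiply the reconstruction matrix by the slope vector.

    Each matrix row corresponds to one actuator and holds one weight
    per slope, so the result has one accumulated value per actuator.
    """
    for row in matrix:
        if len(row) != len(slopes):
            raise ValueError("row length must match the slope vector")
    return [sum(w * s for w, s in zip(row, slopes)) for row in matrix]
```

At full size this is 1900 dot products of length 3500 per frame, and each row can be computed independently of the others.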
Scenario 4: FPGA-GPU or FPGA-CPU

[Figure: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; both FPGAs connect to the GPU/CPU over the PCI-e bus.]

• Pixel unpacking, dark and flat correction, cross-correlation - FPGA
• Reconstruction matrix processing - GPU or CPU

Cross-correlation

[Figure: the 18-bit flatcorr_value (flat_product0) is multiplied in parallel by the 8-bit reference pixels ref_pixel0 to ref_pixel48, which are read as one 392-bit ref_pixel word. Each multiplier gives a 26-bit product, xcorr_product0 to xcorr_product48, and the products together form the 1274-bit xcorr_value_per pixel.]

• Configure 400x392 (49x8 bits/pixel) RAM bank (RAM0-RAM19) with pre-computed reference pixels
• Multiply each pixel with corresponding reference pixel

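
The width arithmetic on this slide (49 x 8 = 392 bits in, 49 x 26 = 1274 bits out) can be checked with a small Python sketch that uses integers as bit vectors. The widths are from the slide. The field order (ref_pixel0 in the lowest bits) and the function names are assumptions.

```python
REF_COUNT = 49     # from the slide: 49 reference pixels per image pixel
REF_BITS = 8       # from the slide: 8 bits per reference pixel
VALUE_BITS = 18    # from the slide: 18-bit flat corrected value
PRODUCT_BITS = 26  # from the slide: 26-bit product


def split_ref_word(ref_word):
    """Split a 392-bit word into 49 reference pixels (lowest bits first)."""
    mask = (1 << REF_BITS) - 1
    return [(ref_word >> (k * REF_BITS)) & mask for k in range(REF_COUNT)]


def xcorr_per_pixel(flatcorr_value, ref_word):
    """Multiply one pixel by its 49 reference pixels.

    Returns the list of products and the same products packed into a
    single integer of at most 49 * 26 = 1274 bits.
    """
    if not 0 <= flatcorr_value < (1 << VALUE_BITS):
        raise ValueError("flatcorr_value does not fit in 18 bits")
    products = [flatcorr_value * r for r in split_ref_word(ref_word)]
    packed = 0
    for k, product in enumerate(products):
        packed |= product << (k * PRODUCT_BITS)
    return products, packed
```

An 18-bit value times an 8-bit value always fits in 26 bits, so the packed fields cannot overlap.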
Sub-aperture format

[Table: sub-aperture index (0 to 23) for each bit position 15 to 0, listed per channel for channels 1 to 12.]

[Figure: three subap_accumulator blocks, one for channels #1,#2,#7,#8, one for channels #3,#4,#9,#10, and one for channels #5,#6,#11,#12. Each block takes the 1274-bit inputs xcorr_pixel0 to xcorr_pixel15 and produces the 1715-bit outputs subap0_acc to subap23_acc.]

• Sub-aperture regions in 480 columns x 1 row per channel
• Accumulate pixels per sub-aperture in each channel

Timing

[Timing diagram. Signals, top to bottom: Rxdata from transceiver; unpacked data written to FIFO; unpacked data read from FIFO; dark-flat output; input to xcorr_pixel module; output from xcorr_pixel; output from sub-aperture accumulator per channel. Intervals marked, in order of appearance: 123.73 ns, 40 ns, 95 ns, 15 ns, 40 ns, 20 ns, 16 ns, 91 ns.]

•   Each data packet is available from the FIFO after 95 ns
•   95 ns * 5 packets * 10 rows = 4.75 us to read the data from the FIFO
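
The arithmetic in the bullet above can be checked with a short sketch. The variable and function names are illustrative, and reading "5 packets * 10 rows" as five packets per row, read strictly one after another, is an assumption.

```python
# Values taken from the slide; names are illustrative, not from the design.
FIFO_READ_LATENCY_NS = 95   # each data packet is available after 95 ns
PACKETS_PER_ROW = 5         # assumed meaning of "5 packets"
ROWS = 10


def fifo_read_time_ns(latency_ns, packets, rows):
    """Total read time, assuming packets are read strictly sequentially."""
    return latency_ns * packets * rows


total_ns = fifo_read_time_ns(FIFO_READ_LATENCY_NS, PACKETS_PER_ROW, ROWS)
total_us = total_ns / 1000  # 1 us = 1000 ns

print(f"{total_ns} ns = {total_us} us")
```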
                                                                               21
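Timing with find_max & interpolation

[Block diagram: the sub-aperture accumulated value (1715 bits) enters find_max, which outputs max_value (35 bits) and max_index (6 bits); 2*max_value (36 bits) is also formed. An index calculation block derives max_index - 1, max_index + 1, max_index - 7 and max_index + 7. An index decoder uses these to select x_shift1, x_shift2, y_shift1 and y_shift2 (35 bits each) from the 1715-bit sub-aperture value. An x and y shift calc block outputs x and y (32 bits each), and a shifted index calculation block outputs shifted_x - 0.5 and shifted_y - 0.5 (32 bits each).]

[Timing diagram. Signals: output from sub-aperture accumulation across 6 channels; xy shifts. Intervals marked: 5.84 us, 0.060 us, 0.060 us.]
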
                                                                                                                         22
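
A software sketch of the find_max and interpolation stage may help when reading the diagram. It rests on assumptions: the neighbour offsets of 1 and 7 and the 1715-bit width (49 x 35 bits) suggest a 7 x 7 correlation map stored row by row, and the refinement used here is the standard three-point parabolic fit, which is not confirmed to match the fixed-point arithmetic on the slide. All function names are hypothetical.

```python
MAP_SIZE = 7  # assumed: 7 x 7 map, 49 values x 35 bits = 1715 bits


def find_max(corr):
    """Return (max_value, max_index) of a flattened correlation map."""
    max_index = max(range(len(corr)), key=corr.__getitem__)
    return corr[max_index], max_index


def parabolic_offset(before, peak, after):
    """Vertex offset of the parabola through three equally spaced samples."""
    denom = before - 2 * peak + after
    if denom == 0:
        return 0.0  # flat neighbourhood: no refinement possible
    return 0.5 * (before - after) / denom


def subpixel_peak(corr, size=MAP_SIZE):
    """Locate the correlation peak with sub-pixel precision as (x, y)."""
    max_value, max_index = find_max(corr)
    row, col = divmod(max_index, size)
    dx = dy = 0.0
    # Neighbours at max_index -/+ 1 (x) and -/+ size (y) exist only for
    # an interior peak; a peak on the border is left unrefined here.
    if 0 < col < size - 1:
        dx = parabolic_offset(corr[max_index - 1], max_value,
                              corr[max_index + 1])
    if 0 < row < size - 1:
        dy = parabolic_offset(corr[max_index - size], max_value,
                              corr[max_index + size])
    return col + dx, row + dy


# Synthetic map: a paraboloid peaking at x = 3.25, y = 2.75.
corr = [1000.0 - 10.0 * ((c - 3.25) ** 2 + (r - 2.75) ** 2)
        for r in range(MAP_SIZE) for c in range(MAP_SIZE)]
x, y = subpixel_peak(corr)
print(round(x, 3), round(y, 3))
```

Because the synthetic map is an exact paraboloid, the fit should recover the peak position up to floating-point error; real correlation maps are only approximately parabolic near the maximum.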
GPU vs FPGA vs DSP

[Timeline diagram. Time marks along the top: 100 us, 225 us, 300.93 us. Stages shown: camera readout, data transfer through PCIe x16, then C2050 GPU 1, C2050 GPU 2 and C2050 GPU 3. The FPGA and DSP rows are drawn below, and a second camera readout, PCIe x16 transfer and C2050 GPU 1 sequence follows.]

•   C2050 GPU throughput = 525.93 us
•   FPGA throughput = 280 us
•   96 DSPs throughput = 495 us

                                                                                                 23
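
The figures on the GPU vs FPGA vs DSP slide can be compared with a few lines of arithmetic. This sketch assumes the 225 us mark belongs to the PCIe x16 transfer and the 300.93 us mark to the GPU stage; that reading is consistent with the reported 525.93 us GPU figure but is not stated on the slide.

```python
# Figures as read from the slide, in microseconds.
PCIE_TRANSFER_US = 225.0   # assumed to label the PCIe x16 transfer
GPU_INTERVAL_US = 300.93   # assumed to label the GPU stage

reported_us = {
    "C2050 GPU": 525.93,
    "FPGA": 280.0,
    "96 DSPs": 495.0,
}

gpu_sum_us = PCIE_TRANSFER_US + GPU_INTERVAL_US
fastest = min(reported_us, key=reported_us.get)

print(f"PCIe + GPU = {gpu_sum_us:.2f} us")
for name, t_us in sorted(reported_us.items(), key=lambda kv: kv[1]):
    ratio = t_us / reported_us[fastest]
    print(f"{name:10s} {t_us:7.2f} us  ({ratio:.2f}x the {fastest} figure)")
```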
Conclusions




                                   GPU                              FPGA



•    DSP: excellent performance but not cost-effective
•    GPU: fast SIMD architectures - suitable for certain tasks
•    FPGA: MIMD architectures, custom I/O, meets latency and throughput
     constraints
Slide idea: David Pellerin, Impulse Accelerated Technology                 24
Future work

       Resources                Virtex-6 XC6VLX550T    Virtex-7 XC7V2000T
       Slice logic resources    549,888                1,954,560
       I/O pins                 840                    850
       GTX transceivers         36                     36


•   Investigate performance improvement after partitioning 3 channels
    between FPGAs. Total of 5 FPGAs for processing each half of
    camera data
•   Throughput sustained even if the processes are partitioned over
    multiple FPGAs
•   Promising because of increased logic density in Virtex-7 FPGAs
                                                                        25
Discussion

                            Questions




Email: vivek@vivekvenugopal.net                26
Top level design

[Block diagram.
 Control: the xcorr state machine (xcorr_sm) drives channel_cycle_count, subap_row_count, addr_decoder_ce, xcorr_pixel_ce, subap_acc_ce, subap_acc_12ch_ce and flat_fifo_rd.
 Reference image path: refim_fetch_addr_decoder (address decoder) -> RAM bank (RAM0-RAM19) -> refim_in (392 bits) x16.
 Per-channel data path, shown for channel1_top and channel12_top: FCFPGA -> data unpack -> dark_flat_acc_top -> Flatcorr_FIFO -> xcorr_pixel_channel, which contains xcorr_pixel (1274 bits) x16. Bus widths of 160, 288 and 288 are marked along this path.
 Accumulation: ch1278_subap_accumulator and ch561112_subap_accumulator each output subap_acc_out (1715 bits) x24; these feed 24subap_12ch_accumulator, whose output is subap_acc_out (1715 bits) x24.]

                                                                                                                                                                                                                                                  27
Synthesis estimates for Virtex-6 FPGA
•   Dark and flat correction only: 288 of 687,360 resources used (1%)
•   Correlation for a single channel, up to the sub-aperture
    accumulator within the channel (without the final 12-channel
    accumulation): 2,578 of 687,360 resources used (1%)

Device utilization summary:
    Slice Logic Utilization:
    Number of Slice Registers:     992448 out of 687360   144% (*)
    Number of Slice LUTs:         1126081 out of 343680   327% (*)
    Number used as Logic:         1125853 out of 343680   327% (*)
    Number used as Memory:            228 out of  99200
    Number used as SRL:                37
                                                                          28
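
The percentages in the device utilization summary can be reproduced as follows. This is a minimal sketch: the dictionary keys and the helper name are illustrative, and reading the "(*)" marker as "more than 100% of the device" is an assumption based on which rows carry it.

```python
# (used, available) pairs copied from the device utilization summary.
summary = {
    "Slice Registers": (992448, 687360),
    "Slice LUTs": (1126081, 343680),
    "LUTs used as Logic": (1125853, 343680),
    "LUTs used as Memory": (228, 99200),
}


def utilization_percent(used, available):
    """Integer percentage, truncated, which matches the reported figures."""
    return 100 * used // available


# Resources that exceed what the device provides.
overmapped = [name for name, (used, avail) in summary.items()
              if used > avail]

for name, (used, avail) in summary.items():
    flag = " (*)" if name in overmapped else ""
    print(f"{name:20s} {used:8d} of {avail:7d} "
          f"{utilization_percent(used, avail):4d}%{flag}")
```

Truncation rather than rounding is used because 1126081 / 343680 is about 327.65%, which the summary reports as 327%.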
Virtex-6 FPGA Board

[Board block diagram, labelled "DiniFPGA board". The remaining labels are not recoverable from the extracted text.]

                                                                                                                                                                                                                                                                                                                              29

Accelerating Real-time processing of the ATST Adaptive Optics System using Coarse-grained Parallel Hardware Architectures

  • 1. ERSA, Las Vegas, Nevada, July 2011 Accelerating Real-time processing of the ATST Adaptive Optics System using Coarse-grained Parallel Hardware Architectures Vivek Venugopal (vivek@vivekvenugopal.net) National Solar Observatory, Sunspot, New Mexico
  • 2. Advanced Technology Solar Telescope
  • 3. Adaptive Optics system [Diagram labels: uncorrected light, tip/tilt mirror, deformable mirror (DM), beamsplitter, corrected light, Shack-Hartmann lenslet array, CCD camera, processors, tilt drive signal, DM drive signal]
  • 5. HOAO Real-time system [Block diagram. Blocks: WFS camera, dark-field and flat-field stages (marked X), cross-correlation, offscale slope detection, matrix multiply, actuator servos, deformable mirror; average slope, tip/tilt servos, tip/tilt mirror; servo computation parameters; data collection process; Zernike offload. Inputs: dark field, flat field, reference image, offscale slope tolerance, slope offsets, reconstruction matrix, actuator gains, actuator offsets] • 1750 sub-apertures and 1900 actuators
  • 6. Camera data format [Diagram: two camera data halves, each 960 x 480 pixels] • Camera data consists of two halves of 960x480 pixels • Each half of the camera data is sent to an FPGA using 12 channels
  • 7. Scenario 1: FPGA-DSP [Diagram: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; each FPGA drives 12 channels into a pool of 96 DSPs] • Pixel unpacking task - FPGA • Processing - DSPs
  • 8. Scenario 2: FPGA-DSP [Diagram: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; each FPGA drives 12 channels into a pool of 48 DSPs] • Pixel unpacking, dark and flat correction - FPGA • Cross-correlation and reconstruction matrix processing - DSPs
  • 9. Dark and flat correction • Dark pixel and flat pixel stored in RAM • Flat corrected product is concatenated and written to FIFO • Flat accumulated value can be used to update the reference image [Diagram: for each unpacked pixel, the 8-bit dark_pixel is subtracted from the 10-bit pixel and the 10-bit difference is multiplied by the 8-bit flat_pixel to give an 18-bit flat_product; an accumulator per pixel lane produces flat_acc]
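The per-pixel arithmetic on this slide can be sketched in software. The following is a minimal illustration, not the FPGA design: the function names, the clamping of negative differences to zero, and the low-lane-first packing order are assumptions made for the example. Only the bit widths (10-bit pixel, 8-bit dark and flat values, 18-bit product) come from the slide.

```python
PRODUCT_BITS = 18  # a 10-bit difference times an 8-bit flat value fits in 18 bits


def dark_flat_correct(pixels, dark, flat):
    """Subtract the dark value and multiply by the flat value, per pixel."""
    products = []
    for p, d, f in zip(pixels, dark, flat):
        diff = max(p - d, 0)       # assumption: negative differences clamp to zero
        products.append(diff * f)  # 18-bit flat-corrected product
    return products


def concatenate(products, width=PRODUCT_BITS):
    """Pack the products into one wide word, lane 0 in the lowest bits (assumed order)."""
    word = 0
    mask = (1 << width) - 1
    for lane, value in enumerate(products):
        word |= (value & mask) << (lane * width)
    return word


products = dark_flat_correct([512, 100, 1023], [12, 120, 0], [2, 3, 255])
fifo_word = concatenate(products)
```

With 16 pixel lanes the packed word is 16 x 18 = 288 bits wide, which matches the 288-bit bus width labelled on the following slide.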
  • 10. Pixel unpacking & Dark and flat correction [Diagram: each of the 12 channels (1/2 camera) passes through Receiver, FIFO, Data unpack and Dark-flat correction/accumulator; shared blocks are the synchronizer/counters, the dark and flat value RAM, the reference image RAM and the PCIe system bus; bus widths labelled 16, 128, 160, 256 and 288 bits; intervals labelled 206.8 ns and 20 ns] • Two clock domains: clock period = 9.42 ns (clock rate = 106.15 MHz) and clock period = 5 ns (clock rate = 200 MHz)
  • 11. Scenario 3: FPGA-GPU or FPGA-CPU [Diagram: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; the FPGAs connect to the GPU/CPU over the PCI-e bus] • Pixel unpacking, dark and flat correction - FPGA • Cross-correlation and reconstruction matrix processing - GPU or CPU
  • 12. Nvidia Tesla C2050 GPU • 14 streaming multiprocessors with 32 cores each (SIMD), clocked at 1.15 GHz • 3 GB on-board RAM • Kernel-based execution • 1.03 TFLOPS single precision • 515.2 GFLOPS double precision [Diagram: Multiprocessor 1 to Multiprocessor 14, each with an instruction cache, two warp schedulers with dispatch units, a register file, cores, 16 load/store units, 4 SFUs, an interconnection network, 64 KB shared memory/L1 cache and a uniform cache] Reference: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
  • 13. Process mapping and partitioning [Diagram: FPGA: raw pixels (20x20), dark correction with dark pixels (20x20), flat correction with flat pixels (20x20); GPU: 2D cross-correlation with reference pixels (20x20), find maximum, x and y interpolation]
  • 14. Correlation routines • 1. FFT correlation: the FFT of the flat corrected image is multiplied by the complex conjugate of the precomputed reference FFT (from the original reference image), followed by an IFFT • 2. 7x7 correlation: the 26x26 pixel reference image is split into precomputed reference regions, Region 1 to Region 49 (20x20 pixels each); precomputed reference pixels 20x20 (49 regions)
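The 7x7 correlation can be illustrated with a direct (non-FFT) sketch: a 26x26 reference contains 7 x 7 = 49 distinct 20x20 windows, and the image is multiplied and summed against each one. The function name and the plain-list image representation are assumptions for illustration; the real system precomputes the 49 reference regions rather than slicing them on the fly.

```python
def correlate_7x7(image, reference):
    """Correlate a square image against every same-size window of the reference.

    For a 20x20 image and a 26x26 reference this yields a 7x7 output,
    one value per reference region.
    """
    size = len(image)
    shifts = len(reference) - size + 1  # 26 - 20 + 1 = 7
    out = [[0] * shifts for _ in range(shifts)]
    for dy in range(shifts):
        for dx in range(shifts):
            total = 0
            for r in range(size):
                img_row = image[r]
                ref_row = reference[dy + r]
                for c in range(size):
                    total += img_row[c] * ref_row[dx + c]
            out[dy][dx] = total
    return out


# Synthetic data: a single bright reference pixel, and an image cut from
# the reference at window offset (dy=4, dx=2).
reference = [[0] * 26 for _ in range(26)]
reference[13][11] = 1
image = [row[2:22] for row in reference[4:24]]
out = correlate_7x7(image, reference)
```

In this synthetic case the only nonzero output is `out[4][2]`, the window offset the image was cut from.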
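  • 15. find_max and interpolation routines • Find the maximum value and its index • Find x and y shifts using the interpolation equations, where out is the correlation output:
    \[
    \begin{aligned}
    \text{num}_x &= \text{max value} - \text{out}(\text{shifted y index},\ \text{shifted x index} - 1) \\
    \text{den}_x &= 2 \cdot \text{max value} - \text{out}(\text{shifted y index},\ \text{shifted x index} - 1) - \text{out}(\text{shifted y index},\ \text{shifted x index} + 1) \\
    x &= (\text{shifted x index} - 0.5) + \frac{\text{num}_x}{\text{den}_x} \\
    \text{num}_y &= \text{max value} - \text{out}(\text{shifted y index} - 1,\ \text{shifted x index}) \\
    \text{den}_y &= 2 \cdot \text{max value} - \text{out}(\text{shifted y index} - 1,\ \text{shifted x index}) - \text{out}(\text{shifted y index} + 1,\ \text{shifted x index}) \\
    y &= (\text{shifted y index} - 0.5) + \frac{\text{num}_y}{\text{den}_y}
    \end{aligned}
    \]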
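A small software sketch of these two routines follows. It treats the shifted x and y indices as the column and row of the maximum in the correlation output, which is an assumption, and it rejects peaks on the border because the equations need both neighbours. The function names are illustrative.

```python
def find_max(out):
    """Return (max_value, y_index, x_index) of a 2D correlation output."""
    best_value, best_y, best_x = out[0][0], 0, 0
    for y, row in enumerate(out):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_y, best_x = value, y, x
    return best_value, best_y, best_x


def interpolate(out, y, x):
    """Apply the interpolation equations around the peak at row y, column x."""
    if not (0 < y < len(out) - 1 and 0 < x < len(out[0]) - 1):
        raise ValueError("peak on the border: both neighbours are required")
    max_value = out[y][x]
    num_x = max_value - out[y][x - 1]
    den_x = 2 * max_value - out[y][x - 1] - out[y][x + 1]
    num_y = max_value - out[y - 1][x]
    den_y = 2 * max_value - out[y - 1][x] - out[y + 1][x]
    # A zero denominator (flat neighbourhood) is not handled in this sketch.
    return (x - 0.5) + num_x / den_x, (y - 0.5) + num_y / den_y


# Synthetic 7x7 output: a paraboloid with its true peak at x = 3.25, y = 2.75.
out = [[100 - (x - 3.25) ** 2 - (y - 2.75) ** 2 for x in range(7)] for y in range(7)]
max_value, y_index, x_index = find_max(out)
x_shift, y_shift = interpolate(out, y_index, x_index)
```

For an exactly parabolic peak the equations recover the sub-pixel position, since (index - 0.5) + num/den is algebraically the three-point parabola vertex.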
  • 16. GPU results [Two bar charts comparing Tesla C1060 and Tesla C2050, time in us against number of images (ticks 1, 50, 584). FFT correlation chart (axis 0 to 2200 us): data labels 1889, 1619, 1510, 1188. 7x7 correlation chart (axis 0 to 400 us): data labels 313, 307, 301, 278, 279, 281] Note: Least time indicates better performance
  • 17. Reconstruction routine • x and y shifts for 1750 sub-aperture images form a 3500-element vector • The 1900 x 3500 reconstruction matrix multiplies this vector • The result is the accumulated values for 1900 actuators [Bar chart, log scale from 1 to 100000: time in us per device (Tesla C1060, Tesla C2050, DSP, CPU); data labels 46769, 964, 956, 229]
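The reconstruction step is a matrix-vector product. The sketch below uses tiny dimensions (2 actuators, 2 sub-apertures) in place of 1900 and 1750; the function name and the ordering of the shift vector (all x shifts, then all y shifts) are assumptions for illustration.

```python
def reconstruct(matrix, x_shifts, y_shifts):
    """Multiply the reconstruction matrix by the stacked x and y shift vector.

    matrix has one row per actuator and 2 * len(x_shifts) columns.
    """
    shifts = list(x_shifts) + list(y_shifts)  # assumed layout: x shifts, then y shifts
    if any(len(row) != len(shifts) for row in matrix):
        raise ValueError("matrix width must equal the number of shifts")
    return [sum(m * s for m, s in zip(row, shifts)) for row in matrix]


# 2 actuators x (2 sub-apertures * 2 axes) stands in for 1900 x 3500.
matrix = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
actuators = reconstruct(matrix, x_shifts=[2.0, 4.0], y_shifts=[6.0, 8.0])
```

At full size this is 1900 x 3500 = 6,650,000 multiply-accumulate operations per frame.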
  • 18. Scenario 4: FPGA-GPU or FPGA-CPU [Diagram: each camera data half feeds one FPGA (FPGA 1, FPGA 2) over 12 optical fiber channels; the FPGAs connect to the GPU/CPU over the PCI-e bus] • Pixel unpacking, dark and flat correction, cross-correlation - FPGA • Reconstruction matrix processing - GPU or CPU
  • 19. Cross-correlation • Configure a 400x392 (49x8 bits/pixel) RAM bank (RAM0-RAM19) with pre-computed reference pixels • Multiply each pixel with the corresponding reference pixel [Diagram: the 18-bit flat_product0 (flatcorr_value) is multiplied by each of the 8-bit reference pixels ref_pixel0 to ref_pixel48 (392-bit ref_pixel word) to give the 26-bit products xcorr_product0 to xcorr_product48, which form the 1274-bit xcorr_value_per pixel]
  • 20. Sub-aperture format • Sub-aperture regions in 480 columns x 1 row per channel • Accumulate pixels per sub-aperture in each channel [Diagram: grid of sub-aperture indices (0 to 23) per channel number; one subap_accumulator for each channel group (#1,#2,#7,#8; #3,#4,#9,#10; #5,#6,#11,#12) takes the 1274-bit inputs xcorr_pixel0 to xcorr_pixel15 and produces the 1715-bit outputs subap0_acc to subap23_acc]
  • 21. Timing [Timing diagram stages: Rx data from transceiver; unpacked data written to FIFO; unpacked data read from FIFO; dark-flat output; input to xcorr_pixel module; output from xcorr_pixel; output from sub-aperture accumulator per channel. Interval labels: 123.73 ns, 40 ns, 95 ns, 15 ns, 40 ns, 20 ns, 16 ns, 91 ns] • Each data packet is available from the FIFO after 95 ns • 95 ns * 5 packets * 10 rows = 4.75 us to read the data from the FIFO
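The FIFO read-time figure on this slide is a straight product of three quantities, reproduced here as a quick check. The constant names are illustrative; the values are the ones stated on the slide.

```python
PACKET_PERIOD_NS = 95  # each data packet is available from the FIFO after 95 ns
PACKETS_PER_ROW = 5    # packets per row, as stated on the slide
ROWS = 10              # rows, as stated on the slide

fifo_read_ns = PACKET_PERIOD_NS * PACKETS_PER_ROW * ROWS
fifo_read_us = fifo_read_ns / 1000.0

print(f"FIFO read time: {fifo_read_ns} ns = {fifo_read_us} us")
```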
  • 22. Timing with find_max & interpolation [Diagram: the 1715-bit sub-aperture accumulated value (35-bit subap values) feeds find_max, which outputs max_value (35 bits), 2*max_value (36 bits) and a 6-bit max_index; an index decoder selects the neighbours at max_index - 1 and max_index + 1 (x_shift1, x_shift2) and at max_index - 7 and max_index + 7 (y_shift1, y_shift2); a shifted index calculation supplies shifted_x - 0.5 and shifted_y - 0.5; the x and y shift calculation outputs 32-bit x and y values] • Output from sub-aperture accumulation across 6 channels: 5.84 us • Two further intervals of 0.060 us each lead to the xy shifts
  • 23. GPU vs FPGA vs DSP [Timeline diagram: camera readout (100 us), data transfer through PCIe x16 (225 us) and C2050 GPU processing (300.93 us), shown for C2050 GPU 1, GPU 2 and GPU 3] • GPU throughput = 525.93 us • FPGA throughput = 280 us • DSP (96 DSPs) throughput = 495 us
  • 24. Conclusions • DSP: excellent performance but not cost-effective • GPU: fast SIMD architectures - suitable for certain tasks • FPGA: MIMD architectures, custom I/O, meets latency and throughput constraints. Slide idea: David Pellerin, Impulse Accelerated Technology
  • 25. Future work
      Resources               Virtex-6 XC6VLX550T   Virtex-7 XC7V2000T
      Slice logic resources   549,888               1,954,560
      I/O pins                840                   850
      GTX transceivers        36                    36
    • Investigate performance improvement after partitioning 3 channels between FPGAs. Total of 5 FPGAs for processing each half of camera data • Throughput sustained even if the processes are partitioned over multiple FPGAs • Promising because of increased logic density in Virtex-7 FPGAs
  • 26. Discussion and questions. Email: vivek@vivekvenugopal.net
  • 27. Top level design [Block diagram: each channel top module (channel1_top to channel12_top) contains FCFPGA, data unpack, dark_flat_acc_top, Flatcorr_FIFO and xcorr_pixel_channel (xcorr_pixel), with bus widths 160 and 288 bits; shared blocks are the address decoder, the RAM bank (RAM0-RAM19) and the xcorr state machine (xcorr_sm); control signals channel_cycle_count, subap_row_count, refim_fetch_addr_d, addr_decoder_ce, xcorr_pixel_ce, subap_acc_ce, subap_acc_12ch_ce and flat_fifo_rd; refim_in is (392 bits) x16, the xcorr_pixel output is (1274 bits) x16; the ch1278_subap_accumulator and ch561112_subap_accumulator each output subap_acc_out (1715 bits) x24 into the 24subap_12ch_accumulator]
  • 28. Synthesis estimates for Virtex-6 FPGA • Implement dark, flat correction only : resources used 288 out of 687,360 (1%) • Implement the correlation for single channel up to the sub-aperture accumulator within the channel (without the final 12 channel accumulation) : resources used 2,578 out of 687,360 (1%) Device utilization summary: Slice Logic Utilization: Number of Slice Registers: 992448 out of 687360 144% (*) Number of Slice LUTs: 1126081 out of 343680 327% (*) Number used as Logic: 1125853 out of 343680 327% (*) Number used as Memory: 228 out of 99200 Number used as SRL: 37 28
  • 29. Virtex-6 FPGA Board /-5 #78$")*'+,-.+ 01Y57 _/]^d^ 01Y57] 90':0& 01Y570 /01 #2. % !"#$%&&'($")*'+,-.+ %&)-L !"#$%&&'($")*'+,-.+ %&)-L !"#$%&&'($")*'+,-.+ 23 %$V%$$V%$$$ 1@S /.++0/1.&$23445*-+6 0OJA@ /.++0/1.&$23445*-+6 0OJA@ /.++0/1.&$23445*-+6 /014 Ded&*7 9&*7-`c? ) #2. !" !" !" !" !" !" 01Y575 /565788 01Y57F 9:;<=>;? # # _/]^dD #2. #2. 01Y57. !"#$ !"#$ !"#$ ;;<=$>?;@!! # #2. # #2. ;;<=$>?;@!! 23#A$!')6 ! D C 23#A$!')6 *) "$ 9@2AB? %+$ %%#X&)*7-`c %+$ %&'()*+, %&'()*+, %&'()*+, >?@A&B<!"#$ &# -./0123-.4,52 *) -./0123-.4,52 "$ -./0123-.4,52 -`c 03;RM;H>S 6.4752 6.4752 6.4752 /SHB@;A=c;3 Y$ &$ #$ -.551236.0852 -.551236.0852 -.551236.0852 9/=*+&"? 9&e`c 9!!785:; 9!!785:; 9!!785:; B27'$$7-`c? &$ #$ %%#X&)*7-`c #$ #2. ) # #2. #2. &# -`c 03;RM;H>S !"#$<= #$ #$ /SHB@;A=c;3 Y% %&'()*+5 #$ %& %" 9/=*+&"? 9&e`c >?@A&B<!"#$ #$ #$ #$ '#( '# !* !* &$ &$ '+ '+ #2. #2. B2'$$7-`c? #$ %& ,-. %%#X&)*7-`c ">F) # #2. # #2. &# ;;<=$>?;@!! -`c 03;RM;H>S !"#$ !"#$ !"#$ ;;<=$>?;@!! 23#A$!')6 /SHB@;A=c;3 Y& # #2. # #2. > E $ 23#A$!')6 9/=*+&"? 9&e`c %&$ %$$ %+$ %+$ B27'$$7-`c? %&'()*+, %&'()*+, %&'()*+, -./0123-.4,52 -./0123-.4,52 -./0123-.4,52 %&$ %$$ %&*7-`c 6.4752 6.4752 6.4752 %*$7-`c -.551236.0852 &$ -.551236.0852 #$ -.551236.0852 W/ &*$7-`c 9!!785:; 9!!785:; 9!!785:; +%&X*7-`c -Y67O2>U &$ #$ Y$ #$ ) ) #2. -117.MA #2. ">F) W/ #$ 9#)@7; 0<+<GH@)I #78$")*'+,-.+ %&)-L #78$")*'+,-.+ %&)-L 90':0& 0D5/` 90':0& 0OJA@ %$V%$$V%$$$ %$V%$$V%$$$ ^Y-88 LJA;6 1@S ^b#* JH'K)GG<J%8L/11 "# BCD!$)$E3 ;;<C C7DEF/7G@;H7IJ=3;:K7LMB7>JH7L;73MH ^/&+& 01_ 01_ 777A=HNO;P;H:;:7JB73;:M>;:7Q3;RM;H>S F-59#[? + %&)-L _/.7&X$ 1_ 1_ _/. /18 9+[? .22B70D5/` /565 18; #2. PPT71J>U;B78VW763JHA>;=<;3A #POJH;A ) &*"-L 77777"X*7YLVA7I;37>@JHH;O /565788 ^6 18;79Y],%? ,5,F70D5/` 77777L=:=3;>B=2HJO79%+7YLVA7ZJ[? 9@2AB? .22B &[ 187]a1^]// DiniFPGA board CM%,!,">F) !"#$%&'()*+), -./.0 29