SlideShare une entreprise Scribd logo
1  sur  17
AMBER 12
    GPU Support Revision 12.1
          12/21/2012




1                       AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
Benefits of GPU AMBER Accelerated Computing
     Faster than CPU only systems in all tests

     Most major compute intensive aspects of classical MD ported

     Large performance boost with marginal price increase

     Energy usage cut by more than half

     GPUs scale well within a node and over multiple nodes

     K20 GPU is our fastest and lowest power high performance GPU yet

       Try GPU accelerated AMBER for free – www.nvidia.com/GPUTestDrive
2                                                        AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
Protein Folding Simulation With AMBER
                     Accelerated By GPUs
                                                4.7x Faster




                             80 ns/day            367 ns/day

                             4 core CPU   4 core CPU + Tesla M2070 GPU

Data courtesy of AMBER.org
Kepler - Our Fastest Family of GPUs Yet
                        30.00
                                                        Factor IX                                                   Running AMBER 12 GPU Support Revision 12.1
                                                                                                      25.39
                        25.00                                                                                       The blue node contains Dual E5-2687W CPUs
                                                                                    22.44                           (8 Cores per CPU).
                                                                                                     7.4x           The green nodes contain Dual E5-2687W CPUs (8
                        20.00                                    18.90                                              Cores per CPU) and either 1x NVIDIA M2090, 1x K10
    Nanoseconds / Day




                                                                                                                    or 1x K20 for the GPU
                                                                                   6.6x
                        15.00

                                                11.85           5.6x
                        10.00


                                               3.5x
                         5.00
                                   3.42



                         0.00
                                                                                                                                               Factor IX
                                1 CPU Node   1 CPU Node +   1 CPU Node + K10   1 CPU Node + K20 1 CPU Node + K20X
                                                M2090


                        GPU speedup/throughput increased from 3.5x (with M2090) to 7.4x (with K20X)
                        when compared to a CPU only node
4                                                                                                                   AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
K10 Accelerates Simulations of All Sizes
                               30
                                                                                                                               Running AMBER 12 GPU Support Revision 12.1

                                                                                                                               The blue node contains Dual E5-2687W CPUs
                               25                                                                                   24.00      (8 Cores per CPU).
Speedup Compared to CPU Only




                                                                                                                               The green nodes contain Dual E5-2687W CPUs (8
                                                                                                        19.98
                               20                                                                                              Cores per CPU) and 1x NVIDIA K10 GPU



                               15



                               10


                                                               5.50         5.53          5.04
                               5
                                                     2.00

                               0
                                         CPU        TRPcage   JAC NVE   Factor IX NVE Cellulose NVE   Myoglobin   Nucleosome
                                    All Molecules     GB        PME         PME            PME          GB            GB


                                             Gain 24x performance by adding just 1 GPU
                                                                                                                                           Nucleosome
                                             when compared to dual CPU performance
Run AMBER 28x Faster With Tesla K20 GPUs
                                   30.00
                                                                                                                            28.00
                                                                                                                                           Running AMBER 12 GPU Support Revision 12.1
                                                                                                              25.56                        SPFP with CUDA 4.2.9 ECC Off
                                   25.00
                                                                                                                                           The blue node contains 2x Intel E5-2687W CPUs
    Speedup Compared to CPU Only




                                                                                                                                           (8 Cores per CPU)
                                   20.00
                                                                                                                                           Each green nodes contains 2x Intel E5-2687W
                                                                                                                                           CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPUs

                                   15.00



                                   10.00
                                                                                                  7.28
                                                                        6.50         6.56

                                    5.00
                                                           2.66
                                              1.00
                                    0.00
                                             CPU All    TRPcage GB JAC NVE PME Factor IX NVE Cellulose NVE Myoglobin GB   Nucleosome
                                            Molecules                              PME            PME                         GB


                                           Gain 28x throughput/performance by adding just one K20 GPU
                                                                                                                                                              Nucleosome
                                           when compared to dual CPU performance

6                                                                                                                                      AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
K20X Accelerates Simulations of All Sizes
                                   35
                                                                                                                        31.30         Running AMBER 12 GPU Support Revision 12.1
                                   30                                                                       28.59
                                                                                                                                      The blue node contains Dual E5-2687W CPUs
                                                                                                                                      (8 Cores per CPU).
    Speedup Compared to CPU Only




                                   25
                                                                                                                                      The green nodes contain Dual E5-2687W CPUs (8
                                                                                                                                      Cores per CPU) and 1x NVIDIA K20X GPU
                                   20


                                   15


                                   10                                                         8.30
                                                                   7.15         7.43

                                    5
                                                         2.79


                                    0
                                             CPU        TRPcage   JAC NVE   Factor IX NVE Cellulose NVE   Myoglobin   Nucleosome
                                        All Molecules     GB        PME         PME            PME          GB            GB



                                         Gain 31x performance by adding just one K20X GPU
                                                                                                                                                          Nucleosome
                                         when compared to dual CPU performance

7                                                                                                                                  AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
K10 Strong Scaling over Nodes
                                    Cellulose 408K Atoms (NPT)                         Running AMBER 12 with CUDA 4.2 ECC Off
                    6                                                                  The blue nodes contains 2x Intel X5670
                                                                                       CPUs (6 Cores per CPU)

                    5                                                                  The green nodes contains 2x Intel X5670
                                                                                       CPUs (6 Cores per CPU) plus 2x NVIDIA
                                                                                       K10 GPUs
                    4
Nanoseconds / Day




                                                                     2.4x
                    3
                                                                            CPU Only
                                                 3.6x                       With GPU
                    2

                             5.1x
                    1

                                                                                                  Cellulose
                    0
                         1                       2               4
                                          Number of Nodes


                         GPUs significantly outperform CPUs while scaling over multiple nodes
Kepler – Universally Faster
                                9
                                                                                                  Running AMBER 12 GPU Support Revision 12.1
                                8                                                                 The CPU Only node contains Dual E5-2687W CPUs
                                                                                                  (8 Cores per CPU).
Speedups Compared to CPU Only




                                7
                                                                                                  The Kepler nodes contain Dual E5-2687W CPUs (8
                                6                                                                 Cores per CPU) and 1x NVIDIA K10, K20, or K20X
                                                                                                  GPUs
                                5
                                                                                      JAC

                                4                                                     Factor IX
                                                                                      Cellulose
                                3

                                2

                                1

                                0
                                    CPU Only    CPU + K10    CPU + K20   CPU + K20X                             Cellulose


                                    The Kepler GPUs accelerated all simulations, up to 8x
K10 Extreme Performance
                                                                        Running AMBER 12 GPU Support Revision 12.1
                                      JAC 23K Atoms (NVE)
                    120                                                 The blue node contains Dual E5-2687W CPUs
                                                                        (8 Cores per CPU).

                                                               97.99    The green node contain Dual E5-2687W CPUs (8
                    100
                                                                        Cores per CPU) and 2x NVIDIA K10 GPUs
Nanoseconds / Day




                     80


                     60


                     40


                     20
                                   12.47


                      0
                                   1 Node                     1 Node
                                                                                        DHFR

                          Gain 7.8X performance by adding just 2 GPUs
                            when compared to dual CPU performance
K20 Extreme Performance
                                            DHRF JAC 23K Atoms (NVE)                          Running AMBER 12 GPU Support Revision 12.1
                                                                                              SPFP with CUDA 4.2.9 ECC Off
                         120

                                                                                              The blue node contains 2x Intel E5-2687W CPUs
                                                                          95.59               (8 Cores per CPU)
                         100

                                                                                              Each green node contains 2x Intel E5-2687W
                                                                                              CPUs (8 Cores per CPU) plus 2x NVIDIA K20 GPU
     Nanoseconds / Day




                          80


                          60


                          40


                          20                 12.47


                           0
                                            1 Node                       1 Node
                                                                                                                    DHFR

                               Gain > 7.5X throughput/performance by adding just 2 K20 GPUs
                                          when compared to dual CPU performance

11                                                                                        AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
Replace 8 Nodes with 1 K20 GPU
     90.00                                                                 35000
                                               $32,000.00
                                                                                       Running AMBER 12 GPU Support Revision 12.1
                                 81.09                                                 SPFP with CUDA 4.2.9 ECC Off
     80.00
                                                                           30000
                                                                                       The eight (8) blue nodes each contain 2x Intel
     70.00                                                                             E5-2687W CPUs (8 Cores, $2000 per CPU)
                     65.00
                                                                           25000
                                                                                       Each green node contains 2x Intel E5-2687W
     60.00
                                                                                       CPUs (8 Cores per CPU) plus 1x NVIDIA K20
                                                                                       GPU ($2500 per GPU)
     50.00                                                                 20000


     40.00                                                                 15000

     30.00
                                                                           10000
     20.00                                                     $6,500.00

                                                                           5000
     10.00


      0.00                                                                 0
                      Nanoseconds/Day                   Cost

                                                                                                           DHFR
             Cut down simulation costs to ¼ and gain higher performance

12                                                                                 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
Replace 7 Nodes with 1 K10 GPU
                          Performance on JAC NVE                       Cost                     Running AMBER 12 GPU Support Revision 12.1
                                                                                                SPFP with CUDA 4.2.9 ECC Off
                         80                               35000
                                                                                                The eight (8) blue nodes each contain 2x Intel
                         70                               30000                                 E5-2687W CPUs (8 Cores, $2000 per CPU)

                         60
                                                                                                The green node contains 2x Intel E5-2687W
                                                          25000                                 CPUs (8 Cores per CPU) plus 1x NVIDIA K10
     Nanoseconds / Day




                                                                                                GPU ($3000 per GPU)
                         50
                                                          20000
                         40
                                                          15000
                         30

                                                          10000
                         20


                         10                                5000


                          0                                   0
                                CPU Only    GPU Enabled           CPU Only    GPU Enabled

                                                                                                                    DHFR
                         Cut down simulation costs to ¼ and increase performance by 70%

13                                                                                          AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
Extra CPUs decrease Performance
                                   Cellulose NVE                               Running AMBER 12 GPU Support Revision 12.1

                    8                                                          The orange bars contains one E5-2687W CPUs
                                                                               (8 Cores per CPU).
                    7
                                                                               The blue bars contain Dual E5-2687W CPUs (8
                    6                                                          Cores per CPU)
Nanoseconds / Day




                    5

                    4                                             1 E5-2687W
                                                                  2 E5-2687W
                    3

                    2

                    1

                    0                                                                        Cellulose
                        CPU Only             CPU with dual K20s




When used with GPUs, dual CPU sockets perform worse than single CPU sockets.
Kepler - Greener Science
                                                                                               Running AMBER 12 GPU Support Revision 12.1
                              Energy used in simulating 1 ns of DHFR JAC
                       2500                                                                    The blue node contains Dual E5-2687W CPUs
                                                                                               (150W each, 8 Cores per CPU).

                                                                                               The green nodes contain Dual E5-2687W CPUs (8
                       2000                                                                    Cores per CPU) and 1x NVIDIA K10, K20, or K20X
                                                                Lower is better                GPUs (235W each).
Energy Expended (kJ)




                       1500


                                                                                                        Energy Expended
                       1000
                                                                                                        = Power x Time

                        500



                          0
                               CPU Only      CPU + K10     CPU + K20              CPU + K20X


                              The GPU Accelerated systems use 65-75% less energy
Recommended GPU Node Configuration for
         AMBER Computational Chemistry
                   Workstation or Single Node Configuration
                      # of CPU sockets                                    2
                    Cores per CPU socket                  4+ (1 CPU core drives 1 GPU)
                      CPU speed (Ghz)                                 2.66+
                System memory per node (GB)                              16
                                                              Kepler K10, K20, K20X
                           GPUs
                                                           Fermi M2090, M2075, C2075
                                                                         1-2
                  # of GPUs per CPU socket                 (4 GPUs on 1 socket is good
                                                            to do 4 fast serial GPU runs)
                 GPU memory preference (GB)                               6
                   GPU to CPU connection                      PCIe 2.0 16x or higher

                       Server storage                                  2 TB

                    Network configuration                    Infiniband QDR or better

16   Scale to multiple nodes with same single node configuration     AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
Benefits of GPU AMBER Accelerated Computing
     Faster than CPU only systems in all tests

     Most major compute intensive aspects of classical MD ported

     Large performance boost with marginal price increase

     Energy usage cut by more than half

     GPUs scale well within a node and over multiple nodes

     K20 GPU is our fastest and lowest power high performance GPU yet

        Try GPU accelerated AMBER for free – www.nvidia.com/GPUTestDrive
17                                                        AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012

Contenu connexe

Tendances

Pcs hd7850_sales_kit
Pcs  hd7850_sales_kitPcs  hd7850_sales_kit
Pcs hd7850_sales_kitPowerColor
 
Hd7950 sales kit
Hd7950 sales kitHd7950 sales kit
Hd7950 sales kitPowerColor
 
Powersaving with VT and IPMI
Powersaving with VT and IPMIPowersaving with VT and IPMI
Powersaving with VT and IPMIFrank Martin
 
Turbo duo hd7790 sales kit
Turbo duo hd7790 sales kitTurbo duo hd7790 sales kit
Turbo duo hd7790 sales kitPowerColor
 
Top500 List June 2012
Top500 List June 2012Top500 List June 2012
Top500 List June 2012top500
 
PowerColor PCS+ Vortex II sales kit
PowerColor PCS+ Vortex II sales kitPowerColor PCS+ Vortex II sales kit
PowerColor PCS+ Vortex II sales kitPowerColor
 
intel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performanceintel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performanceDESMOND YUEN
 
Sun fire x2100 m2 and x2200 m2 technical presentation
Sun fire x2100 m2 and x2200 m2 technical presentationSun fire x2100 m2 and x2200 m2 technical presentation
Sun fire x2100 m2 and x2200 m2 technical presentationxKinAnx
 
CUDA by Example : Atomics : Notes
CUDA by Example : Atomics : NotesCUDA by Example : Atomics : Notes
CUDA by Example : Atomics : NotesSubhajit Sahu
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_PlaceKohei KaiGai
 

Tendances (20)

Mateo valero p2
Mateo valero p2Mateo valero p2
Mateo valero p2
 
Ron perrot
Ron perrotRon perrot
Ron perrot
 
Mateo valero p1
Mateo valero p1Mateo valero p1
Mateo valero p1
 
Pcs hd7850_sales_kit
Pcs  hd7850_sales_kitPcs  hd7850_sales_kit
Pcs hd7850_sales_kit
 
Hd7950 sales kit
Hd7950 sales kitHd7950 sales kit
Hd7950 sales kit
 
Powersaving with VT and IPMI
Powersaving with VT and IPMIPowersaving with VT and IPMI
Powersaving with VT and IPMI
 
Turbo duo hd7790 sales kit
Turbo duo hd7790 sales kitTurbo duo hd7790 sales kit
Turbo duo hd7790 sales kit
 
Top500 List June 2012
Top500 List June 2012Top500 List June 2012
Top500 List June 2012
 
PowerColor PCS+ Vortex II sales kit
PowerColor PCS+ Vortex II sales kitPowerColor PCS+ Vortex II sales kit
PowerColor PCS+ Vortex II sales kit
 
Parallel Vision by GPGPU/CUDA
Parallel Vision by GPGPU/CUDAParallel Vision by GPGPU/CUDA
Parallel Vision by GPGPU/CUDA
 
Core I7
Core I7Core I7
Core I7
 
intel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performanceintel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performance
 
Google warehouse scale computer
Google warehouse scale computerGoogle warehouse scale computer
Google warehouse scale computer
 
54603 vsp vs300_fl5_ccah
54603 vsp vs300_fl5_ccah54603 vsp vs300_fl5_ccah
54603 vsp vs300_fl5_ccah
 
54647 01 vsp_vs300_fh4_dcaj
54647 01 vsp_vs300_fh4_dcaj54647 01 vsp_vs300_fh4_dcaj
54647 01 vsp_vs300_fh4_dcaj
 
Sun fire x2100 m2 and x2200 m2 technical presentation
Sun fire x2100 m2 and x2200 m2 technical presentationSun fire x2100 m2 and x2200 m2 technical presentation
Sun fire x2100 m2 and x2200 m2 technical presentation
 
54645 01 vsp_vs300_fh5_dcah
54645 01 vsp_vs300_fh5_dcah54645 01 vsp_vs300_fh5_dcah
54645 01 vsp_vs300_fh5_dcah
 
CUDA by Example : Atomics : Notes
CUDA by Example : Atomics : NotesCUDA by Example : Atomics : Notes
CUDA by Example : Atomics : Notes
 
55506 vsp vs300_av_ccaj
55506 vsp vs300_av_ccaj55506 vsp vs300_av_ccaj
55506 vsp vs300_av_ccaj
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
 

Similaire à AMBER Molecular Dynamics on GPU

Symposium on HPC Applications – IIT Kanpur
Symposium on HPC Applications – IIT KanpurSymposium on HPC Applications – IIT Kanpur
Symposium on HPC Applications – IIT KanpurRishi Pathak
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?Shinnosuke Furuya
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievVolodymyr Saviak
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrKohei KaiGai
 
N A G P A R I S280101
N A G P A R I S280101N A G P A R I S280101
N A G P A R I S280101John Holden
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univainside-BigData.com
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Gao Boyang
 
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdfNVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdfMuhammadAbdullah311866
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Brendan Gregg
 
Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015
Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015 Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015
Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015 NVIDIA
 
Building the World's Largest GPU
Building the World's Largest GPUBuilding the World's Largest GPU
Building the World's Largest GPURenee Yao
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardNVIDIA
 

Similaire à AMBER Molecular Dynamics on GPU (20)

Symposium on HPC Applications – IIT Kanpur
Symposium on HPC Applications – IIT KanpurSymposium on HPC Applications – IIT Kanpur
Symposium on HPC Applications – IIT Kanpur
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?
 
Example Application of GPU
Example Application of GPUExample Application of GPU
Example Application of GPU
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 Kiev
 
Nvidia tesla-k80-overview
Nvidia tesla-k80-overviewNvidia tesla-k80-overview
Nvidia tesla-k80-overview
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated Asyncr
 
N A G P A R I S280101
N A G P A R I S280101N A G P A R I S280101
N A G P A R I S280101
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
 
Latest HPC News from NVIDIA
Latest HPC News from NVIDIALatest HPC News from NVIDIA
Latest HPC News from NVIDIA
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1
 
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdfNVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
 
Fujitsu PRIMERGY RX200 S7
Fujitsu PRIMERGY RX200 S7Fujitsu PRIMERGY RX200 S7
Fujitsu PRIMERGY RX200 S7
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015
Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015 Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015
Visual Computing: The Road Ahead, NVIDIA CEO Jen-Hsun Huang at CES 2015
 
Building the World's Largest GPU
Building the World's Largest GPUBuilding the World's Largest GPU
Building the World's Largest GPU
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path Forward
 
Nvidia Cuda Apps Jun27 11
Nvidia Cuda Apps Jun27 11Nvidia Cuda Apps Jun27 11
Nvidia Cuda Apps Jun27 11
 
#Riverflow2 d gpu tests 2019
#Riverflow2 d gpu tests 2019#Riverflow2 d gpu tests 2019
#Riverflow2 d gpu tests 2019
 
SDC Server Sao Jose
SDC Server Sao JoseSDC Server Sao Jose
SDC Server Sao Jose
 

Dernier

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Dernier (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

AMBER Molecular Dynamics on GPU

  • 1. AMBER 12 GPU Support Revision 12.1 12/21/2012 1 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 2. Benefits of GPU AMBER Accelerated Computing Faster than CPU only systems in all tests Most major compute intensive aspects of classical MD ported Large performance boost with marginal price increase Energy usage cut by more than half GPUs scale well within a node and over multiple nodes K20 GPU is our fastest and lowest power high performance GPU yet Try GPU accelerated AMBER for free – www.nvidia.com/GPUTestDrive 2 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 3. Protein Folding Simulation With AMBER Accelerated By GPUs 4.7x Faster 80 ns/day 367 ns/day 4 core CPU 4 core CPU + Tesla M2070 GPU Data courtesy of AMBER.org
  • 4. Kepler - Our Fastest Family of GPUs Yet 30.00 Factor IX Running AMBER 12 GPU Support Revision 12.1 25.39 25.00 The blue node contains Dual E5-2687W CPUs 22.44 (8 Cores per CPU). 7.4x The green nodes contain Dual E5-2687W CPUs (8 20.00 18.90 Cores per CPU) and either 1x NVIDIA M2090, 1x K10 Nanoseconds / Day or 1x K20 for the GPU 6.6x 15.00 11.85 5.6x 10.00 3.5x 5.00 3.42 0.00 Factor IX 1 CPU Node 1 CPU Node + 1 CPU Node + K10 1 CPU Node + K20 1 CPU Node + K20X M2090 GPU speedup/throughput increased from 3.5x (with M2090) to 7.4x (with K20X) when compared to a CPU only node 4 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 5. K10 Accelerates Simulations of All Sizes 30 Running AMBER 12 GPU Support Revision 12.1 The blue node contains Dual E5-2687W CPUs 25 24.00 (8 Cores per CPU). Speedup Compared to CPU Only The green nodes contain Dual E5-2687W CPUs (8 19.98 20 Cores per CPU) and 1x NVIDIA K10 GPU 15 10 5.50 5.53 5.04 5 2.00 0 CPU TRPcage JAC NVE Factor IX NVE Cellulose NVE Myoglobin Nucleosome All Molecules GB PME PME PME GB GB Gain 24x performance by adding just 1 GPU Nucleosome when compared to dual CPU performance
  • 6. Run AMBER 28x Faster With Tesla K20 GPUs 30.00 28.00 Running AMBER 12 GPU Support Revision 12.1 25.56 SPFP with CUDA 4.2.9 ECC Off 25.00 The blue node contains 2x Intel E5-2687W CPUs Speedup Compared to CPU Only (8 Cores per CPU) 20.00 Each green nodes contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPUs 15.00 10.00 7.28 6.50 6.56 5.00 2.66 1.00 0.00 CPU All TRPcage GB JAC NVE PME Factor IX NVE Cellulose NVE Myoglobin GB Nucleosome Molecules PME PME GB Gain 28x throughput/performance by adding just one K20 GPU Nucleosome when compared to dual CPU performance 6 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 7. K20X Accelerates Simulations of All Sizes 35 31.30 Running AMBER 12 GPU Support Revision 12.1 30 28.59 The blue node contains Dual E5-2687W CPUs (8 Cores per CPU). Speedup Compared to CPU Only 25 The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1x NVIDIA K20X GPU 20 15 10 8.30 7.15 7.43 5 2.79 0 CPU TRPcage JAC NVE Factor IX NVE Cellulose NVE Myoglobin Nucleosome All Molecules GB PME PME PME GB GB Gain 31x performance by adding just one K20X GPU Nucleosome when compared to dual CPU performance 7 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 8. K10 Strong Scaling over Nodes Cellulose 408K Atoms (NPT) Running AMBER 12 with CUDA 4.2 ECC Off 6 The blue nodes contains 2x Intel X5670 CPUs (6 Cores per CPU) 5 The green nodes contains 2x Intel X5670 CPUs (6 Cores per CPU) plus 2x NVIDIA K10 GPUs 4 Nanoseconds / Day 2.4x 3 CPU Only 3.6x With GPU 2 5.1x 1 Cellulose 0 1 2 4 Number of Nodes GPUs significantly outperform CPUs while scaling over multiple nodes
  • 9. Kepler – Universally Faster 9 Running AMBER 12 GPU Support Revision 12.1 8 The CPU Only node contains Dual E5-2687W CPUs (8 Cores per CPU). Speedups Compared to CPU Only 7 The Kepler nodes contain Dual E5-2687W CPUs (8 6 Cores per CPU) and 1x NVIDIA K10, K20, or K20X GPUs 5 JAC 4 Factor IX Cellulose 3 2 1 0 CPU Only CPU + K10 CPU + K20 CPU + K20X Cellulose The Kepler GPUs accelerated all simulations, up to 8x
  • 10. K10 Extreme Performance Running AMBER 12 GPU Support Revision 12.1 JAC 23K Atoms (NVE) 120 The blue node contains Dual E5-2687W CPUs (8 Cores per CPU). 97.99 The green node contain Dual E5-2687W CPUs (8 100 Cores per CPU) and 2x NVIDIA K10 GPUs Nanoseconds / Day 80 60 40 20 12.47 0 1 Node 1 Node DHFR Gain 7.8X performance by adding just 2 GPUs when compared to dual CPU performance
  • 11. K20 Extreme Performance DHRF JAC 23K Atoms (NVE) Running AMBER 12 GPU Support Revision 12.1 SPFP with CUDA 4.2.9 ECC Off 120 The blue node contains 2x Intel E5-2687W CPUs 95.59 (8 Cores per CPU) 100 Each green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 2x NVIDIA K20 GPU Nanoseconds / Day 80 60 40 20 12.47 0 1 Node 1 Node DHFR Gain > 7.5X throughput/performance by adding just 2 K20 GPUs when compared to dual CPU performance 11 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 12. Replace 8 Nodes with 1 K20 GPU 90.00 35000 $32,000.00 Running AMBER 12 GPU Support Revision 12.1 81.09 SPFP with CUDA 4.2.9 ECC Off 80.00 30000 The eight (8) blue nodes each contain 2x Intel 70.00 E5-2687W CPUs (8 Cores, $2000 per CPU) 65.00 25000 Each green node contains 2x Intel E5-2687W 60.00 CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPU ($2500 per GPU) 50.00 20000 40.00 15000 30.00 10000 20.00 $6,500.00 5000 10.00 0.00 0 Nanoseconds/Day Cost DHFR Cut down simulation costs to ¼ and gain higher performance 12 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 13. Replace 7 Nodes with 1 K10 GPU Performance on JAC NVE Cost Running AMBER 12 GPU Support Revision 12.1 SPFP with CUDA 4.2.9 ECC Off 80 35000 The eight (8) blue nodes each contain 2x Intel 70 30000 E5-2687W CPUs (8 Cores, $2000 per CPU) 60 The green node contains 2x Intel E5-2687W 25000 CPUs (8 Cores per CPU) plus 1x NVIDIA K10 Nanoseconds / Day GPU ($3000 per GPU) 50 20000 40 15000 30 10000 20 10 5000 0 0 CPU Only GPU Enabled CPU Only GPU Enabled DHFR Cut down simulation costs to ¼ and increase performance by 70% 13 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 14. Extra CPUs decrease Performance Cellulose NVE Running AMBER 12 GPU Support Revision 12.1 8 The orange bars contains one E5-2687W CPUs (8 Cores per CPU). 7 The blue bars contain Dual E5-2687W CPUs (8 6 Cores per CPU) Nanoseconds / Day 5 4 1 E5-2687W 2 E5-2687W 3 2 1 0 Cellulose CPU Only CPU with dual K20s When used with GPUs, dual CPU sockets perform worse than single CPU sockets.
  • 15. Kepler - Greener Science Running AMBER 12 GPU Support Revision 12.1 Energy used in simulating 1 ns of DHFR JAC 2500 The blue node contains Dual E5-2687W CPUs (150W each, 8 Cores per CPU). The green nodes contain Dual E5-2687W CPUs (8 2000 Cores per CPU) and 1x NVIDIA K10, K20, or K20X Lower is better GPUs (235W each). Energy Expended (kJ) 1500 Energy Expended 1000 = Power x Time 500 0 CPU Only CPU + K10 CPU + K20 CPU + K20X The GPU Accelerated systems use 65-75% less energy
  • 16. Recommended GPU Node Configuration for AMBER Computational Chemistry Workstation or Single Node Configuration # of CPU sockets 2 Cores per CPU socket 4+ (1 CPU core drives 1 GPU) CPU speed (Ghz) 2.66+ System memory per node (GB) 16 Kepler K10, K20, K20X GPUs Fermi M2090, M2075, C2075 1-2 # of GPUs per CPU socket (4 GPUs on 1 socket is good to do 4 fast serial GPU runs) GPU memory preference (GB) 6 GPU to CPU connection PCIe 2.0 16x or higher Server storage 2 TB Network configuration Infiniband QDR or better 16 Scale to multiple nodes with same single node configuration AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 17. Benefits of GPU AMBER Accelerated Computing Faster than CPU only systems in all tests Most major compute intensive aspects of classical MD ported Large performance boost with marginal price increase Energy usage cut by more than half GPUs scale well within a node and over multiple nodes K20 GPU is our fastest and lowest power high performance GPU yet Try GPU accelerated AMBER for free – www.nvidia.com/GPUTestDrive 17 AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012

Notes de l'éditeur

  1. ns/dayDual E5-2687W CPUs 3.4Dual E5-2687W CPUs + M2090 11.9Dual E5-2687W CPUs + K10 18.9Dual E5-2687W CPUs + K20 22.4Dual E5-2687W CPUs + K20X 25.39
  2. cpu ns/day gpu ns/dayTrpcage 210 420Jacnve 12.47 68.6Factor 9 3.42 18.9Cellulose .74 3.73Myoglobin 6.12 122.3Nucleosome .1 2.4
  3. cpu ns/day gpu ns/dayTRPcage GB 210.32 559.32JAC NVE PME 12.47 81.09Factor IX NVE PME 3.42 22.44Cellulose NVE PME 0.74 5.39Myoglobin GB 6.12 156.45Nucleosome GB 0.10 2.80SPFP ECC off
  4. cpu ns/day gpu ns/dayTrpcage 210 585Jacnve 12.47 89.13Factor 9 3.42 25.4Cellulose .74 6.14Myoglobin 6.12 175.77Nucleosome .1 3.13
  5. Nodes cpu ns/day gpu ns/day1 .65 3.312 1.14 4.134 2.01 4.8
  6. Energytdpsec/nsenergy (kJ)2e5-2687 300 6928 207822687+K10 5351259673 22687+K20 535 1065 569 22687+K20X 535 969 518
  7. 1 CPU node (dual CPUs) = 12.47 ns/day1 CPU+ GPU node (dual CPUs and GPUs) = 95.59 ns/day
  8. Costs:CPU: 8 nodes * 2 CPU/node * $2000/cpuGPU: 2 CPUs + 1 GPUPerformance:CPU: 40.44 ns/day from gordon supercomputerGPU: 68.6 ns/day
  9. Perflab:no gpuk10k20k20x2k102k202k20xcell 1 cpu0.374.445.46.166.376.937.67cell 2 cpu0.744.345.396.146.46.787.5
  10. Energytdpsec/nsenergy (kJ)2e5-2687 300 6928 207822687+K10 5351259673 22687+K20 535 1065 569 22687+K20X 535 969 518