HPC milestones
Michal Klimeš
Experts @ HPC
 • Structural Mechanics: Implicit
 • Structural Mechanics: Explicit
 • Computational Fluid Dynamics
 • Electro-Magnetics
 • Computational Chemistry: Quantum Mechanics
 • Computational Chemistry: Molecular Dynamics
 • Computational Biology
 • Seismic Processing
 • Reservoir Simulation
 • Rendering / Ray Tracing
 • Climate / Weather / Ocean Simulation
 • Data Analytics
Competency = Real HPC + Big Storage
From TOP500
There are no small things
OpenFOAM® Performance with SGI MPI
Performance: SGI MPT <--> OpenMPI
Automotive Interior Climate Model, 19M cells

# Cores   SGI MPT Speedup   OpenMPI Speedup   MPT/OpenMPI Ratio
64        1.02              1.00              1.02
128       1.73              1.59              1.08
192       2.29              2.01              1.14
256       2.72              2.01              1.35

OpenFOAM with SGI MPI with up to 35% better performance
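The ratio series in the chart above can be re-derived from the two speedup series. The sketch below is a minimal illustration in Python, using only the values read off the chart; the names `speedups` and `mpt_over_openmpi` are made up for this example.

```python
# Speedups read from the chart: cores -> (SGI MPT speedup, OpenMPI speedup).
speedups = {
    64: (1.02, 1.00),
    128: (1.73, 1.59),
    192: (2.29, 2.01),
    256: (2.72, 2.01),
}


def mpt_over_openmpi(table):
    """Return {cores: MPT speedup / OpenMPI speedup}."""
    return {cores: mpt / ompi for cores, (mpt, ompi) in table.items()}


ratios = mpt_over_openmpi(speedups)
for cores, ratio in sorted(ratios.items()):
    # (ratio - 1) is the relative advantage of SGI MPT at that core count.
    print(f"{cores:>4} cores: ratio {ratio:.2f}, advantage {(ratio - 1) * 100:.0f}%")
```

The computed ratios agree with the plotted ones to within rounding, and the 256-core point is where the "up to 35%" figure comes from.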
What is the “average” power consumption?

Workload   Power      % of Linpack kW
Linpack    30.5kW*    reference
STREAM     22.1kW*    72.5%
GUPS       23.3kW*    76.4%
Fluent     22.4kW*    73.4%
Idle       15.9kW*    52.1%

Average power consumption heavily depends on
 • the application and its data profile
 • the level of code optimization (+ libraries + MPI optimization)
 • the ability of the job scheduler to utilize the system
 • the bottlenecks in the I/O subsystem and in the OS
* Measured on ICE 8200 system with 128x 2.66GHz Quad-Core Intel® Xeon® Processor 5300 series (1 Rack)
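As a rough illustration of why the "average" depends on the workload, the sketch below recomputes the percentages from the measured kW values and then averages power over a job mix. The mix itself (`mix`) is purely hypothetical and not from the slides; only the kW figures are.

```python
LINPACK_KW = 30.5  # measured Linpack power, used as the reference
measured_kw = {"STREAM": 22.1, "GUPS": 23.3, "Fluent": 22.4, "Idle": 15.9}


def percent_of_linpack(kw, linpack_kw=LINPACK_KW):
    """Power as a percentage of the Linpack figure, rounded to one decimal."""
    return round(100.0 * kw / linpack_kw, 1)


def average_power(time_share, power_kw):
    """Time-weighted average power for a mix of workloads."""
    total_share = sum(time_share.values())
    weighted = sum(power_kw[name] * share for name, share in time_share.items())
    return weighted / total_share


for name, kw in measured_kw.items():
    print(f"{name}: {kw} kW = {percent_of_linpack(kw)}% of Linpack")

# Hypothetical mix: 60% of the time in Fluent, 20% in STREAM-like work, 20% idle.
mix = {"Fluent": 0.6, "STREAM": 0.2, "Idle": 0.2}
print(f"average for this mix: {average_power(mix, measured_kw):.2f} kW")
```

A different mix, or a scheduler that leaves the rack idle more often, shifts the average accordingly.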
Where is performance?




                        Accelerated
Real Memory Bandwidth Requirements
Measurements at LRZ on SGI Altix 4700
[Figure label: 1 B/s : 1 Flops]
Source: Matthias Brehm (LRZ) in inSide, vol 4 No 2
SGI HPC Servers and Supercomputers

Scale-Out
 • Rackable™: 1U, 2U, 3U, 4U & XE; Build-to-Order Leader
 • CloudRack™: Tray Cluster Architecture (for Internet DC)
 • Altix® ICE: Blade Cluster Architecture; Scalability Leader
Scale-Up
 • Altix® UV: Shared-Memory Architecture; Virtualization & Many-Core Leader
SGI UV2
4th Generation SMP System
        • The most flexible system!
SGI UV Shared Memory Architecture

Commodity Clusters (InfiniBand or Gigabit Ethernet)
 [Diagram: many systems, each with ~64GB of memory and its own OS]
 • Each system has own memory and OS
 • Nodes communicate over commodity interconnect
 • Inefficient cross-node communication creates bottlenecks
 • Coding required for parallel code execution

SGI UV Platform (SGI NUMAlink Interconnect)
 [Diagram: one system with global shared memory to 16TB and one OS]
 • All nodes operate on one large shared memory space
 • Eliminates data passing between nodes
 • Big data sets fit entirely in memory
 • Less memory per node required
 • Simpler to program
 • High performance, low cost, easy to deploy
The UV2 Advantage

 • Long 15 year heritage: same principles as Altix 4700, … but
 • Intel Sandy Bridge Xeon Multi-Core Processors
 • Large scalable Shared Memory System
    • Up to 4096 Cores and 64TB per Partition
    • Up to 2048 Cores, 4096 Threads and 32TB per Partition
    • Multi-partition Systems with up to 16384 Sockets, 2PB in multiple Partitions
    • MPI, UPC Acceleration by Hardware Offload
    • Cross Partition Communication
 • In 2012 without competition
 • By help of proven SGI ccNUMA Architecture
SGI UV2 Interconnect with Global Addressing

 • NUMAlink® routers connect nodes to Multi-rack UV systems
 • HUB snoops Socket QPI and accelerates remote access
 • HUB Offloads Programming models: MPI, UPC, (CoArray not yet)

[Diagram: a High Radix Router connects four Altix UV Blades over NUMAlink; each blade has one HUB and two CPUs, each CPU with 64GB]
512GB globally addressable Memory
UV Foundation:
GAM + Communications Offload

[Diagram labels: [S] Intel CPU; PI; [V] GRU; [P] AMU; TLB; NI; MI; NUMAlink to Other Nodes]

GSM – cc = GAM

GSM
 • Partition Memory ( OS )
    - Max. 2KC 16TB
GAM
 • PGAS Memory ( X-Partition )
 • Communications Offload ( GRU + AMU )
    - Accelerate PGAS Codes
    - Accelerate MPI Codes ( MOE v.v. TOE )

GAM : Globally Addressable Memory  8PB ( 53b )
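The 8PB figure is consistent with a 53-bit byte address, if "53b" is read as 53 address bits and PB/TB as binary units (both readings are assumptions here, not stated on the slide). A quick check, with illustrative helper names:

```python
PIB = 1 << 50  # one pebibyte in bytes
TIB = 1 << 40  # one tebibyte in bytes


def addressable_bytes(address_bits):
    """Bytes reachable with a byte-granular address of the given width."""
    return 1 << address_bits


def bits_needed(size_bytes):
    """Smallest address width that can index every byte of size_bytes."""
    return (size_bytes - 1).bit_length()


print(addressable_bytes(53) // PIB, "PiB reachable with 53 address bits")
print(bits_needed(16 * TIB), "address bits cover a 16 TiB partition")
```

Under the same reading, the 16TB partition limit corresponds to a 44-bit range inside the larger globally addressable space.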



UV1 vs. UV2

[Diagrams: UV1 and UV2 node block diagrams]

                UV1                   UV2
Socket          NHM-EX                SNB-EX-B & SNB-EP
                WSM-EX                IVB-EX-B & IVB-EP
                QPI 1.0               QPI 1.1
Glue            H + H + R             H + H + R
                3 separate Chips      into 1 Chip
                90nm                  40 nm
                (D) Directory DIMM    No Directory DIMM
                (S) Snoop DRAM        No Snoop DRAM
                                      Better AMOs
Interconnect    NL5                   NL6
                6.25 GT/s
                8B/10B encoding       Higher Payload
                4 x 12 lanes          16 x 4 lanes
                Cu only               Cu & Optical
                7m max                20m max
UV MPI Barrier




Additional Performance Acceleration
Barrier Latency <1usec (4096 thread)

 • Altix UV offers up to 3X improvement in MPI reduction processes.
 • Barrier latency is dramatically better (80x) than competing platforms
 • HPCC benchmarks show substantial improvement possible with MPI Offload Engine (MOE)

[Figure: HPCC Benchmarks, "UV with MOE" compared with "UV, MOE disabled"]
Source: SGI Engineering projections
UV2000 16 Socket 8 Blade IRU

Notes
 • IRU: 10U high by 19” wide by 27” deep
 • 8 blades – 8 Harps & 16 sockets – per IRU
 • 1 or 2 CMCs in rear of IRU
 • 3 UV1 12V Power Supplies
 • Nine 12V cooling fans N+1
 • Two signal backplanes
 • 16 NL channels cabled in air plenum to connect the right and left backplane

[Diagram labels: Front; CMC; Signal BP; Power BP]
SGI UV2 Node Architecture and Numalink 6

[Diagram: two Sandy Bridge-EP or EX sockets, each with PCIe Gen3 x16 and 4 DDR3 channels (2DPC, 1600MHz), attach over QPI 1.1 (8GT/s, 32GB/s) to the UV2-HUB; the HUB drives 16 x4 NL6 channels at 12.5GT/s: 12 IRU internal links plus IRU external links on the NL0-plane and NL1-plane]

 • NO memory Buffers as in UV1
 • Same per socket performance as in cluster
 • 40 PCIe lanes per socket

Numalink 6
 • 12.5GT/s – or – 6.7GB/s net bidirectional bandwidth per link
 • 16 NL6 links aggregate Bandwidth out of blade: 107.2 GB/s
 • 12 NL6 internal links to backplane – aggregate: 80.4 GB/s
 • 4 NL6 external links to routers – 26.8 GB/s
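The aggregate figures follow directly from the per-link bandwidth. A small sketch of the arithmetic, assuming every NL6 link delivers the quoted 6.7GB/s net bidirectional bandwidth (function and constant names are illustrative):

```python
PER_LINK_GBS = 6.7  # net bidirectional bandwidth per NL6 link, from the slide


def aggregate_gbs(links, per_link=PER_LINK_GBS):
    """Aggregate bandwidth in GB/s for a number of identical links."""
    return round(links * per_link, 1)


internal_links = 12  # IRU internal links to the backplane
external_links = 4   # external links to routers
all_links = internal_links + external_links

print("out of blade:", aggregate_gbs(all_links), "GB/s")
print("to backplane:", aggregate_gbs(internal_links), "GB/s")
print("to routers:  ", aggregate_gbs(external_links), "GB/s")
```

The three results reproduce the 107.2, 80.4 and 26.8 GB/s figures, which is also why the link counts on this slide are 16, 12 and 4.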


Numalink 6 Routers
 • 6 NL6 ports

Numalink cable
UV2 Topology
System Topology

[Diagram labels: IRU; Blade]

 • Hypercube
 • Max 2 hops between blades
UV 2 Feature Advances



Feature           UV1                UV2
System scale      2048c/4096t        4096c/4096t
Memory/SSI        16TB               64TB
Interconnect      NUMAlink 5         NL 6 (2.5X data rate)
NL fabric Scale   32K sockets        32K+ sockets
Processor         Nehalem EX         Sandy Bridge
Sockets/rack      64 (large 24”)     64 (standard 19”)
Reliability       Enterprise Class   Enterprise Class


MIC Architecture
 • X86 compatible
 • 1.3TF/s Double Precision peak
 • 340GB/s bandwidth
SGI ICE X …
Fifth Generation ICE System
         • The world’s fastest supercomputer just got faster!
         • Flexible to fit your workload
SGI® ICE: Firsts and Onlies
• First over 1PF peak InfiniBand pure compute connected CPU cluster
• World's fastest distributed memory system
• World’s fastest and most scalable computational fluid dynamics system
• First and only vendor to support multiple fabric level topologies + flexibility at the node, switch and fabric level + application benchmarking expertise for same
• First and only vendor capable of live, large-scale compute capacity integration


©2011 SGI
Dialing Up The Density!

              SGI ICE 8400    SGI ICE X D-Rack    SGI ICE X M-Rack
Nodes         64N             72N                 72 x 2 = 144N
Sockets       128             144                 288
Rack width    30”             24”                 28”
SGI ICE X Enclosure Design Building Block Increments of Two Blade Enclosures - “One Enclosure Pair”

Features per Enclosure Pair:
• 36 blade slots
• Four fabric switch slots
• Integrated management

[Diagram labels: Rear View; 21U “Building Block”; 19” rack mount; Separable Power Shelf; dimensions 17.7, 16.59 (9.5U), 1.75 (1U)]
SGI ICE X Compute Blade
IP-113 (Dakota) for “D-Rack”

FDR Mezzanine Card Options

Main Features:
• Supports single or dual plane FDR InfiniBand
• Supports two future Intel® Xeon® processor E5 family CPUs
• Supports up to eight DDR3 DIMMs per socket @ 1600 MT/s
• Houses up to two 2.5” SATA drives for local swap/scratch usage
• Utilizes traditional heat sinks
SGI ICE X Compute Blade
IP-115 (Gemini Twin) for “M-Rack”

Main Features:
• Supports single plane FDR InfiniBand
• Supports four future Intel® Xeon® processor E5 family CPUs
• Two dual socket nodes
• Supports four DDR3 DIMMs per socket @ 1600 MT/s
• Houses up to two 2.5” SATA drives for local swap/scratch usage
    • One per node
• Utilizes traditional heat sinks and cold sinks (liquid)




On-Socket Water-Cooling Detail
Used for IP-115 Gemini “twin” blades; replaces the traditional air-cooled heat sinks on the CPUs to enable highest watt SKU support
• Resides between the pair of node boards in each blade slot (“M-Rack” deployment)
• Enables highest watt SKU support (e.g., 130W TDPs)
• Utilizes a liquid-to-water heat exchanger that provisions the required quantity of flow to the M-Racks for cooling
Notable Features of a “Cell”
D-Cell and M-Cell

• “Closed-Loop Airflow” Environment
• Supports Warm Water Cooling
• Contains Large, “Unified” Cooling Racks for Efficiency

[Diagram labels: One Cooling Rack; One Compute Rack; One Complete Cell]

Common Topologies

 • All-to-All
 • Fat Tree (CLOS Networks)
 • Hypercube
 • Enhanced Hypercube
 • Mesh or Torus (2, 3 or more dimensions)

Supported on SGI ICE 8400 and SGI ICE X
Will support when in OFED
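For readers unfamiliar with the hypercube entry in the list above, here is a minimal sketch of a plain binary hypercube: node IDs are bit strings, neighbours differ in exactly one bit, and the hop count between two nodes is the number of differing bits. This is the textbook construction, not SGI's enhanced hypercube or its routing; the function names are illustrative.

```python
def hypercube_neighbors(node, dimensions):
    """IDs of the nodes directly linked to `node` in a binary hypercube."""
    # Flipping one address bit per dimension yields one neighbour per dimension.
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(a, b):
    """Minimum number of hops between nodes a and b (Hamming distance)."""
    return bin(a ^ b).count("1")


dims = 3  # 2**3 = 8 nodes
print("neighbours of node 0:", hypercube_neighbors(0, dims))
print("hops from 0 to 7:", hop_count(0, 7))
print("worst case in this cube:", max(hop_count(0, n) for n in range(2 ** dims)))
```

In a plain hypercube the worst-case hop count equals the number of dimensions; enhanced variants add links to shorten that path.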



ICE Differentiation: OS Noise Synchronization
    • OS system noise: CPU cycles stolen from a user application by the OS to do periodic or asynchronous work (monitoring, daemons, garbage collection, etc.).
    • Management interface will allow users to select what gets synchronized
    • Performance boost on larger scale systems

[Figure: process timelines on Node 1, Node 2 and Node 3 over time, up to "Barrier Complete"]
 • Unsynchronized OS Noise → Wasted Cycles: each node incurs its System Overhead at a different time, and the other nodes show Wasted Cycles while it does
 • Synchronized OS Noise → Faster Results: all nodes incur their System Overhead at the same time
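The effect in the figure can be reproduced with a toy model. The sketch below assumes a bulk-synchronous job in which every step ends at a barrier, so each step lasts as long as its slowest node; the step count, work time and noise duration are made-up numbers, not SGI measurements.

```python
def run_time(noise_schedule, steps, work, noise):
    """Total run time when every step ends with a barrier.

    noise_schedule[node] is the set of step indices in which that node
    loses `noise` time units to OS activity.
    """
    total = 0
    for step in range(steps):
        # The barrier completes only when the slowest node arrives.
        total += max(
            work + (noise if step in noisy_steps else 0)
            for noisy_steps in noise_schedule
        )
    return total


nodes, steps, work, noise = 3, 3, 10, 2

# Unsynchronized: each node is interrupted in a different step.
unsynchronized = [{node % steps} for node in range(nodes)]
# Synchronized: every node is interrupted in the same step.
synchronized = [{0} for _ in range(nodes)]

print("unsynchronized:", run_time(unsynchronized, steps, work, noise))
print("synchronized:  ", run_time(synchronized, steps, work, noise))
```

Each node loses the same amount of time to the OS in both cases; only the alignment differs, and with more nodes the unsynchronized case delays more and more barriers.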
SGI ICE X

Cool Customers
SGI ICE X: Initial Customers
 • NASA: Increasing their current SGI ICE system, called “Pleiades,” by 35% with multiple racks with future Intel® Xeon® processor E5 family – will have 1.7 petaflops
    • Facilitate new discoveries for Earth Science research projects
    • Modeling and simulation to support flight regimes and new designs for aircraft
    • Engineering risk assessment of crew risk probabilities to support development of launch and commercial crew vehicles for space exploration missions
 • NTNU: 13 SGI ICE X racks @ >275 teraflops; 4 SGI InfiniteStorage 16000 racks @ 1.2 petabytes
    • Accelerate numerical weather predictions
    • Develop atmospheric and oceanographic models for improved weather forecasting

UN Chief Calls for Urgent Action
  on Climate Change
  NASA Advanced Supercomputing Division
  SGI® ICE




Images taken by the Thematic Mapper sensor aboard Landsat 5
Source: USGS Landsat Missions Gallery, U.S. Department of the Interior / U.S. Geological Survey
Cyclones
Cyclone Service Models

 • SGI delivers technical application expertise.
 • SGI delivers commercially available open and 3rd party software via the Internet.
 • SGI offers a platform for developers
 • SGI delivers the system infrastructure.

[Diagram labels: Software (SaaS); SGI Cyclone]
SGI OpenFOAM® Ready for Cyclone
Technical Applications Portal
[Diagram: User Submits Job to the Technical Applications Portal]

 • Customer: iVEC and Curtin University Australia
 • Problem: Solving large scale CFD problems like simulating wind flows in the capital city of Perth.
 • Solution: OpenFOAM scaled better on SGI Cyclone (1024 cores) and was 20x faster than on Amazon EC2.

Source: Dr Andrew King, Department of Mechanical Engineering, Curtin University of Technology, Australia
Balanced design & architecture

Do you attach a caravan to the F1?
Importanța registrelor pentru paciențiImportanța registrelor pentru pacienți
Importanța registrelor pentru paciențiAgora Group
 
CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...
CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...
CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...Agora Group
 
Perspective naționale și internaționale ale informaticii și standardelor medi...
Perspective naționale și internaționale ale informaticii și standardelor medi...Perspective naționale și internaționale ale informaticii și standardelor medi...
Perspective naționale și internaționale ale informaticii și standardelor medi...Agora Group
 
UTI_Dosarul electronic de sanatate
UTI_Dosarul electronic de sanatateUTI_Dosarul electronic de sanatate
UTI_Dosarul electronic de sanatateAgora Group
 
Class IT - Enemy inside the wire
Class IT - Enemy inside the wireClass IT - Enemy inside the wire
Class IT - Enemy inside the wireAgora Group
 
Infologica - auditarea aplicatiilor mobile
Infologica - auditarea aplicatiilor mobileInfologica - auditarea aplicatiilor mobile
Infologica - auditarea aplicatiilor mobileAgora Group
 
Agora Securitate yugo neumorni
Agora Securitate yugo neumorniAgora Securitate yugo neumorni
Agora Securitate yugo neumorniAgora Group
 
Security threats in the LAN
Security threats in the LANSecurity threats in the LAN
Security threats in the LANAgora Group
 

Plus de Agora Group (20)

How to Digitally Transform and Stay Competitive with a Zero-code Digital Busi...
How to Digitally Transform and Stay Competitive with a Zero-code Digital Busi...How to Digitally Transform and Stay Competitive with a Zero-code Digital Busi...
How to Digitally Transform and Stay Competitive with a Zero-code Digital Busi...
 
Microservicii reutilizabile in arhitecturi bazate pe procese
Microservicii reutilizabile in arhitecturi bazate pe proceseMicroservicii reutilizabile in arhitecturi bazate pe procese
Microservicii reutilizabile in arhitecturi bazate pe procese
 
The role of BPM in Paradigms Shift
The role of BPM in Paradigms ShiftThe role of BPM in Paradigms Shift
The role of BPM in Paradigms Shift
 
Prezentare Ensight_BPM-20171004
Prezentare Ensight_BPM-20171004Prezentare Ensight_BPM-20171004
Prezentare Ensight_BPM-20171004
 
Curs OSINT
Curs OSINTCurs OSINT
Curs OSINT
 
Curs Digital Forensics
Curs Digital ForensicsCurs Digital Forensics
Curs Digital Forensics
 
The next generation of Companies management: state of the art in BPM
The next generation of Companies management: state of the art in BPMThe next generation of Companies management: state of the art in BPM
The next generation of Companies management: state of the art in BPM
 
Speed Dialing the Enterprise
Speed Dialing the EnterpriseSpeed Dialing the Enterprise
Speed Dialing the Enterprise
 
ABPMP Romania
ABPMP RomaniaABPMP Romania
ABPMP Romania
 
Arhitectura proceselor în Sistemul Informațional de Sănătate
Arhitectura proceselor în Sistemul Informațional de SănătateArhitectura proceselor în Sistemul Informațional de Sănătate
Arhitectura proceselor în Sistemul Informațional de Sănătate
 
IBM’s Smarter Process Reinvent Business
IBM’s Smarter Process Reinvent BusinessIBM’s Smarter Process Reinvent Business
IBM’s Smarter Process Reinvent Business
 
eHealth 2014_Radu Dop
eHealth 2014_Radu DopeHealth 2014_Radu Dop
eHealth 2014_Radu Dop
 
Importanța registrelor pentru pacienți
Importanța registrelor pentru paciențiImportanța registrelor pentru pacienți
Importanța registrelor pentru pacienți
 
CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...
CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...
CYBERCRIME AND THE HEALTHCARE INDUSTRY: Sistemul de sănătate, noua țintă a at...
 
Perspective naționale și internaționale ale informaticii și standardelor medi...
Perspective naționale și internaționale ale informaticii și standardelor medi...Perspective naționale și internaționale ale informaticii și standardelor medi...
Perspective naționale și internaționale ale informaticii și standardelor medi...
 
UTI_Dosarul electronic de sanatate
UTI_Dosarul electronic de sanatateUTI_Dosarul electronic de sanatate
UTI_Dosarul electronic de sanatate
 
Class IT - Enemy inside the wire
Class IT - Enemy inside the wireClass IT - Enemy inside the wire
Class IT - Enemy inside the wire
 
Infologica - auditarea aplicatiilor mobile
Infologica - auditarea aplicatiilor mobileInfologica - auditarea aplicatiilor mobile
Infologica - auditarea aplicatiilor mobile
 
Agora Securitate yugo neumorni
Agora Securitate yugo neumorniAgora Securitate yugo neumorni
Agora Securitate yugo neumorni
 
Security threats in the LAN
Security threats in the LANSecurity threats in the LAN
Security threats in the LAN
 

Dernier

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Dernier (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

SGI - HPC-29mai2012

  • 2. Experts @ HPC: Structural Mechanics (Implicit); Structural Mechanics (Explicit); Computational Fluid Dynamics; Electro-Magnetics; Computational Chemistry (Quantum Mechanics); Computational Chemistry (Molecular Dynamics); Computational Biology; Seismic Processing; Reservoir Simulation; Rendering / Ray Tracing; Climate / Weather / Ocean Simulation; Data Analytics
  • 3. Competency = Real HPC + Big Storage
  • 5. There are no small things
  • 6. OpenFOAM® Performance with SGI MPI. Performance: SGI MPT <--> OpenMPI. Automotive Interior Climate Model, 19M cells. [Chart: SGI MPT speedup, OpenMPI speedup and MPT/OpenMPI ratio at 64, 128, 192 and 256 cores. MPT speedup: 1.02, 1.73, 2.29, 2.72; OpenMPI speedup: 1.00, 1.59, 2.01, 2.01; MPT/OpenMPI ratio: 1.02, 1.08, 1.14, 1.35.] OpenFOAM with SGI MPI with up to 35% better performance
  • 7. What is the “average” power consumption? Linpack: 30.5kW*. STREAM: 22.1kW* (72.5% of Linpack kW). GUPS: 23.3kW* (76.4% of Linpack kW). Fluent: 22.4kW* (73.4% of Linpack kW). Idle: 15.9kW* (52.1% of Linpack kW). Average power consumption heavily depends on • the application and its data profile • the level of code optimization (+ libraries + MPI optimization) • the ability of the Job Scheduler to utilize the system • the bottlenecks in the I/O subsystem and in the OS. * Measured on ICE 8200 system with 128x 2.66GHz Quad-Core Intel® Xeon® Processor 5300 series (1 Rack)
  • 8. Where is performance? Accelerated
  • 9. Real Memory Bandwidth Requirements. Measurements at LRZ on SGI Altix 4700. Source: Matthias Brehm (LRZ) in inSide, vol 4 No 2. 1 Flop/s : 1 B/s
  • 10. SGI HPC Servers and Supercomputers. Scale-Out: Rackable™ (1U, 2U, 3U, 4U & XE; Build-to-Order Leader); CloudRack™ (Tray Cluster Architecture, for Internet DC); Altix® ICE (Blade Cluster Architecture; Scalability Leader). Scale-Up: Altix® UV (Shared-Memory Architecture; Virtualization & Many-Core Leader)
  • 11. SGI UV2: 4th Generation SMP System • The most flexible system!
  • 12. SGI UV Shared Memory Architecture. Commodity Clusters (InfiniBand or Gigabit Ethernet; ~64GB memory per system, each system with its own OS): • Each system has own memory and OS • Nodes communicate over commodity interconnect • Inefficient cross-node communication creates bottlenecks • Coding required for parallel code execution. SGI UV Platform (SGI NUMAlink Interconnect; global shared memory to 16TB under one OS): • All nodes operate on one large shared memory space • Eliminates data passing between nodes • Big data sets fit entirely in memory • Less memory per node required • Simpler to program • High performance, low cost, easy to deploy
  • 13. The UV2 Advantage. Long 15 year heritage: same principles as Altix 4700, … but Intel Sandy Bridge Xeon Multi-Core Processors. Large scalable Shared Memory System • Up to 4096 Cores and 64TB per Partition • Up to 2048 Cores, 4096 Threads and 32TB per Partition • Multi-partition Systems with up to 16384 Sockets, 2PB in multiple Partitions • MPI, UPC Acceleration by Hardware Offload • Cross Partition Communication. In 2012 without competition, by help of proven SGI ccNuma Architecture
  • 14. SGI UV2 Interconnect with Global Addressing. NUMAlink® routers connect nodes to Multi-rack UV systems. HUB snoops Socket QPI and accelerates remote access. HUB offloads programming models: MPI, UPC (CoArray not yet). [Diagram: a High Radix Router and NUMAlink connecting four Altix UV Blades, each with one HUB and two CPUs with 64GB each.] 512GB globally addressable Memory
  • 15. UV Foundation: GAM + Communications Offload. GSM – cc = GAM. GSM • Partition Memory (OS): max. 2KC, 16TB. GAM • PGAS Memory (X-Partition) • Communications Offload (GRU + AMU): accelerate PGAS codes; accelerate MPI codes (MOE v.v. TOE). GAM: Globally Addressable Memory, 8PB (53b). [Diagram labels: Intel CPU, PI, GRU, TLB, NI, AMU, MI, NUMAlink to other nodes.]
  • 16. UV1 vs. UV2. Socket: NHM-EX, WSM-EX (UV1); SNB-EX-B & SNB-EP, IVB-EX-B & IVB-EP (UV2). QPI 1.0 (UV1); QPI 1.1 (UV2). Glue: 3 separate chips (UV1) into 1 chip (UV2); 90nm (UV1), 40nm (UV2); (D) Directory DIMM (UV1), no Directory DIMM (UV2); (S) Snoop DRAM (UV1), no Snoop DRAM (UV2); better AMOs (UV2). Interconnect: NL5 (UV1), NL6 (UV2); 6.25 GT/s (UV1); 8B/10B encoding (UV1), higher payload (UV2); 4 x 12 lanes (UV1), 16 x 4 lanes (UV2); Cu only (UV1), Cu & optical (UV2); 7m max (UV1), 20m max (UV2). [Diagram: UV1 and UV2 glue logic blocks labeled D, S, H and R.]
  • 18. Additional Performance Acceleration. Barrier latency <1usec (4096 threads). Altix UV offers up to 3X improvement in MPI reduction processes. Barrier latency is dramatically better (80x) than competing platforms. HPCC benchmarks show substantial improvement possible with MPI Offload Engine (MOE). [Chart: HPCC Benchmarks, UV with MOE vs. UV with MOE disabled.] Source: SGI Engineering projections
  • 19. UV2000 16 Socket 8 Blade IRU. Notes • IRU: 10U high by 19” wide by 27” deep • 8 blades – 8 Harps & 16 sockets – per IRU • 1 or 2 CMCs in rear of IRU • 3 UV1 12V Power Supplies • Nine 12V cooling fans, N+1 • Two signal backplanes • 16 NL channels cabled in air plenum to connect the right and left backplane. [Diagram labels: CMC, Signal BP, Power BP, Signal BP, Power backplane, Front.]
  • 20. SGI UV2 Node Architecture and Numalink 6. [Diagram: two Sandy Bridge-EP or -EX sockets, each with PCIe Gen3 x16 and 4 DDR3 channels (2DPC, 1600MHz); QPI 1.1, 8GT/s, 32GB/s; NL6 UV2-HUB with 16 x4 channels at 12.5GT/s; IRU external links and 12 IRU internal links on the NL0-plane and NL1-plane.] NO memory buffers as in UV1. Numalink 6: 12.5GT/s, or 6.7GB/s net bidirectional bandwidth per link. Same per socket performance as in cluster. 16 NL6 links, aggregate bandwidth out of blade: 107.2 GB/s. 12 NL6 internal links to backplane, aggregate: 80.4 GB/s. 4 NL6 external links to routers: 26.8 GB/s. 40 PCIe lanes per socket. Numalink 6 Routers: 16 NL6 ports. Numalink cable
  • 21. UV2 Topology. System Topology: hypercube, max 2 hops between blades. [Diagram labels: IRU, Blade.]
  • 22. UV 2 Feature Advances (Feature: UV1 / UV2). System scale: 2048c/4096t / 4096c/4096t. Memory/SSI: 16TB / 64TB. Interconnect: NUMAlink 5 / NL 6 (2.5X data rate). NL fabric scale: 32K sockets / 32K+ sockets. Processor: Nehalem EX / Sandybridge. Sockets/rack: 64 (large 24”) / 64 (standard 19”). Reliability: Enterprise Class / Enterprise Class
  • 23. MIC Architecture: x86 compatible; 1.3TF/s double precision peak; 340GB/s bandwidth
  • 24. SGI ICE X … Fifth Generation ICE System • The world’s fastest supercomputer just got faster! • Flexible to fit your workload
  • 25. SGI® ICE: Firsts and Onlies • First *over 1PF peak* InfiniBand pure compute connected CPU cluster • World's fastest distributed memory system • World’s fastest and most scalable computational fluid dynamics system • First and only vendor to support multiple fabric level topologies + flexibility at the node, switch and fabric level + application benchmarking expertise for same • First and only vendor capable of live, large-scale compute capacity integration ©2011 SGI
  • 26. Dialing Up The Density! SGI ICE 8400 → SGI ICE X. SGI ICE 8400: 64N (128 Sockets), 30” width. SGI ICE X D-Rack: 72N (144 Sockets), 24” width. SGI ICE X M-Rack: 72 x 2 = 144N (288 Sockets), 28” width
  • 27. SGI ICE X Enclosure Design. Building block increments of two blade enclosures: “One Enclosure Pair”. Features per enclosure pair: • 36 blade slots • Four fabric switch slots • Integrated management. [Diagram, rear view: 21U “Building Block”; dimensions 17.7, 16.59 (9.5U) and 1.75 (1U); separable 19” rack mount Power Shelf.]
  • 28. SGI ICE X Compute Blade IP-113 (Dakota) for “D-Rack”. FDR Mezzanine Card Options. Main Features: • Supports single or dual plane FDR InfiniBand • Supports two future Intel® Xeon® processor E5 family CPUs • Supports up to eight DDR3 DIMMs per socket @ 1600 MT/s • Houses up to two 2.5” SATA drives for local swap/scratch usage • Utilizes traditional heat sinks
  • 29. SGI ICE X Compute Blade IP-115 (Gemini Twin) for “M-Rack”. Main Features: • Supports single plane FDR InfiniBand • Supports four future Intel® Xeon® processor E5 family CPUs (two dual socket nodes) • Supports four DDR3 DIMMs per socket @ 1600 MT/s • Houses up to two 2.5” SATA drives for local swap/scratch usage (one per node) • Utilizes traditional heat sinks and cold sinks (liquid)
  • 30. On-Socket Water-Cooling Detail. Used for IP-115 Gemini “twin” blades; replaces the traditional air-cooled heat sinks on the CPUs to enable highest watt SKU support • Resides between the pair of node boards in each blade slot (“M-Rack” deployment) • Enables highest watt SKU support (e.g., 130W TDPs) • Utilizes a liquid-to-water heat exchanger that provisions the required quantity of flow to the M-Racks for cooling
  • 31. Notable Features of a “Cell”: D-Cell and M-Cell • “Closed-Loop Airflow” Environment • Supports Warm Water Cooling • Contains Large, “Unified” Cooling Racks for Efficiency. [Diagram labels: One Cooling Rack, One Compute Rack, One Complete Cell.]
  • 32. Common Topologies: Hypercube; Enhanced Hypercube; All-to-All; Fat Tree (CLOS Network); Mesh or Torus (2, 3 or more dimensions). [Labels: Supported on SGI ICE 8400 and SGI ICE X; Will support when in OFED.]
  • 33. ICE Differentiation: OS Noise Synchronization • OS system noise: CPU cycles stolen from a user application by the OS to do periodic or asynchronous work (monitoring, daemons, garbage collection, etc.) • Management interface will allow users to select what gets synchronized • Performance boost on larger-scale systems. [Diagram, time on the horizontal axis. Unsynchronized OS Noise → Wasted Cycles: system overhead hits Node 1, Node 2 and Node 3 at different times, so the other nodes waste cycles until the barrier completes. Synchronized OS Noise → Faster Results: system overhead occurs on Node 1, Node 2 and Node 3 at the same time.]
  • 34. SGI ICE X Cool Customers
  • 35. SGI ICE X: Initial Customers • NASA: Increasing their current SGI ICE system, called “Pleiades,” by 35% with multiple racks with future Intel® Xeon® processor E5 family; will have 1.7 petaflops • Facilitate new discoveries for Earth Science research projects • Modeling and simulation to support flight regimes and new designs for aircraft • Engineering risk assessment of crew risk probabilities to support development of launch and commercial crew vehicles for space exploration missions • NTNU: 13 SGI ICE X racks @ >275 teraflops; 4 SGI InfiniteStorage 16000 racks @ 1.2 petabytes • Accelerate numerical weather predictions • Develop atmospheric and oceanographic models for improved weather forecasting
  • 36. UN Chief Calls for Urgent Action on Climate Change. NASA Advanced Supercomputing Division, SGI® ICE. Images taken by the Thematic Mapper sensor aboard Landsat 5. Source: USGS Landsat Missions Gallery, U.S. Department of the Interior / U.S. Geological Survey
  • 38. Cyclone Service Models (SGI Cyclone) • SGI delivers technical application expertise. • Software (SaaS): SGI delivers commercially available open and 3rd party software via the Internet. • SGI offers a platform for developers. • SGI delivers the system infrastructure.
  • 39. SGI OpenFOAM® Ready for Cyclone • Customer: iVEC and Curtin University, Australia • Problem: Solving large scale CFD problems like simulating wind flows in the capital city of Perth. • Solution: OpenFOAM scaled better on SGI Cyclone (1024 cores) and was 20x faster than on Amazon EC2. [Diagram labels: Technical Applications Portal; User Submits Job.] Source: Dr Andrew King, Department of Mechanical Engineering, Curtin University of Technology, Australia
  • 40. Balanced design & architecture: do you attach a caravan to the F1?
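
The percentages on slide 7 appear to be each workload's measured power divided by the Linpack figure. A minimal Python sketch of that arithmetic follows. It assumes the kW values pair with STREAM, GUPS and Fluent in the order they appear in the transcript; the names `POWER_KW` and `percent_of_linpack` are illustrative and do not come from the deck.

```python
# Power draw in kW as quoted on slide 7 (one ICE 8200 rack).
# Assumption: values are paired with workloads in transcript order.
POWER_KW = {
    "Linpack": 30.5,
    "STREAM": 22.1,
    "GUPS": 23.3,
    "Fluent": 22.4,
    "Idle": 15.9,
}


def percent_of_linpack(power_kw):
    """Return each workload's power as a percentage of the Linpack power."""
    reference = power_kw["Linpack"]
    return {name: round(100.0 * kw / reference, 1) for name, kw in power_kw.items()}


if __name__ == "__main__":
    for name, pct in percent_of_linpack(POWER_KW).items():
        print(f"{name:8s} {POWER_KW[name]:5.1f} kW  {pct:5.1f}% of Linpack")
```

Under that pairing the computed values reproduce the slide's 72.5%, 76.4%, 73.4% and 52.1%, which suggests the reading is consistent.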

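The aggregate NUMAlink 6 bandwidths on slide 20 look like simple multiples of the quoted 6.7GB/s per link. The sketch below checks that arithmetic. The link counts of 16, 12 and 4 are an assumed reading of the transcript, which seems to have dropped leading characters, and the function name is hypothetical.

```python
# Net bidirectional bandwidth of one NUMAlink 6 link, as quoted on slide 20.
NL6_LINK_GB_PER_S = 6.7


def aggregate_bandwidth(num_links, per_link=NL6_LINK_GB_PER_S):
    """Aggregate bandwidth in GB/s over num_links links, rounded to 0.1 GB/s."""
    return round(num_links * per_link, 1)


if __name__ == "__main__":
    # Assumed link counts: 16 out of the blade = 12 internal + 4 external.
    for label, links in [
        ("out of blade", 16),
        ("internal, to backplane", 12),
        ("external, to routers", 4),
    ]:
        print(f"{links:2d} NL6 links ({label}): {aggregate_bandwidth(links)} GB/s")
```

With those counts the results match the slide's 107.2, 80.4 and 26.8 GB/s, and the internal and external links add up to the 16 links per blade.
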
Editor's notes

  1. Multiple runs and optimizations have yielded different results Just focus on the graph showing the “relative” comparison of Linpack, idle, and application/benchmark power typical
  2. The world’s fastest supercomputer just got faster! Largest performance boost ever - up to 5x performance density improvement over previous industry-leading generation - with future Intel ® Xeon ® processor E5 family Key design innovations and increased flexibility through enhanced R&amp;D investment The world-renowned SGI quality and performance you love Entirely built on industry-standard hardware and software components, enabling access to the full spectrum of the Linux ecosystem Only system in its class that installs production-ready in hours or days, not weeks or months Flexible to fit your workload Ultimate configuration flexibility in topology/interconnect, power, cooling, CPUs and memory Seamless scalability from tens of teraflops to tens of petaflops Expandability within and across technology generations while maintaining uninterrupted production workflow
  3. The world’s fastest supercomputer just got faster! Largest performance boost ever - up to 5x performance density improvement over previous industry-leading generation - with future Intel ® Xeon ® processor E5 family Key design innovations and increased flexibility through enhanced R&amp;D investment The world-renowned SGI quality and performance you love Entirely built on industry-standard hardware and software components, enabling access to the full spectrum of the Linux ecosystem Only system in its class that installs production-ready in hours or days, not weeks or months Flexible to fit your workload Ultimate configuration flexibility in topology/interconnect, power, cooling, CPUs and memory Seamless scalability from tens of teraflops to tens of petaflops Expandability within and across technology generations while maintaining uninterrupted production workflow
  4. First *over 1PF peak* InfiniBand pure compute connected CPU cluster World&apos;s fastest distributed memory system Top Intel-based overall SPEC_MPIM2007 and SPEC_MPIL2007 performance (base and peak) Top AMD-based SPEC_MPIM2007 and SPEC_MPIL2007 performance (base and peak) World’s fastest and most scalable computational fluid dynamics system SGI ICE 8400 demonstrated unmatched parallel scaling up to 3,072 cores with a rating of 1,333.3 standard benchmark jobs per day Also proved the ability to run ANSYS FLUENT on all 4,092 cores; to date, no other cluster has reported ANSYS FLUENT benchmark results above 2,048 cores The ANSYS FLUENT benchmark performance increase was achieved with the help of SGI MPI PerfBoost First and only vendor to support multiple fabric level topologies + flexibility at the node, switch and fabric level + application benchmarking expertise for same First and only vendor capable of live, large-scale compute capacity integration
  5. Used for IP-115 Gemini “twin” blades; replaces the traditional air-cooled heat sinks on the CPUs to enable highest watt SKU support Resides between the pair of node boards in each blade slot (“M-Rack” deployment) Enables highest watt SKU support (e.g., 130W TDPs) Utilizes a liquid-to-water heat exchanger that provisions the required quantity of flow to the M-Racks for cooling  
  6. “Closed-Loop Airflow” environment: integrated hot aisle containment; no air from within the cell is mixed with the air of the data center in which the cell is installed (versus a hot/cold aisle arrangement, i.e. open-loop airflow, in which the air is mixed). Always water-cooled: supports warm water cooling, with a broad range of acceptable temperatures for additional cost savings; contains an air-to-water heat exchanger; contains a liquid-to-water heat exchanger when cold sinks are deployed. Contains large, “unified” cooling racks for efficiency: compute racks do not have their own cooling at the rack level; decreases power costs associated with cooling; all cooling elements utilize one water source.
  7. Synchronizing OS overhead tasks so that they begin simultaneously on all nodes in the cluster results in significantly fewer wasted cycles over the duration of parallel workloads. The negative effect of unsynchronized OS noise grows steadily worse as node counts rise.
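A rough way to see why unsynchronized OS noise hurts more as node counts rise is the toy model below. It is only a sketch under stated assumptions, not SGI's implementation: the workload is assumed to be barrier-synchronized (every step waits for the slowest node), each node is assumed to be interrupted independently with probability `p` per step, and the function name `expected_step_time` and all parameter values are hypothetical.

```python
def expected_step_time(nodes, compute=1.0, noise=0.1, p=0.01, synchronized=False):
    """Expected duration of one barrier-synchronized step (toy model).

    compute: useful work per step (arbitrary time units, hypothetical value)
    noise:   length of one OS interruption (hypothetical value)
    p:       probability that a given node is interrupted in a step
    """
    if synchronized:
        # All nodes take the interruption at the same moment,
        # so a step is delayed with probability p regardless of node count.
        hit = p
    else:
        # Interruptions are independent, so the step is delayed
        # whenever at least one of the nodes is interrupted.
        hit = 1.0 - (1.0 - p) ** nodes
    return compute + noise * hit


if __name__ == "__main__":
    for n in (1, 16, 256, 4096):
        unsync = expected_step_time(n)
        sync = expected_step_time(n, synchronized=True)
        print(f"{n:5d} nodes: unsynchronized {unsync:.4f}, synchronized {sync:.4f}")
```

In this model the synchronized cost stays constant, while the unsynchronized cost approaches the full interruption length on every step once the node count is large.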
  8. Left: August 1985. Right: August 2010. Iran’s Lake Oroumeih (also spelled Urmia) is the largest lake in the Middle East and the third largest saltwater lake on Earth. But dams on feeder streams, expanded use of ground water, and a decades-long drought have reduced it to 60 percent of the size it was in the 1980s. Light blue tones in the 2010 image represent shallow water and salt deposits. Increased salinity has led to an absence of fish and habitat for migratory waterfowl. At the current rate, the lake will be completely dry by the end of 2013.
  9. Customer Name: iVEC and Dr Andrew King, Department of Mechanical Engineering, Curtin University of Technology, Australia. Challenge: iVEC and the Fluid Dynamics Research Group at Curtin University are working together to solve large-scale CFD problems, such as simulating wind flows in the capital city of Perth. SGI Cyclone Solution: The testing included running OpenFOAM on internal systems, SGI Cyclone and the Amazon EC2 cloud. SGI Cyclone proved to scale better (1,024 cores) and was much faster!