InfiniBand? Problems? Do you care?




Christian Kniep / Jan Wender
science + computing ag
IT services for sophisticated computer environments
Tübingen | München | Berlin | Düsseldorf
Agenda

   This is an interactive session!
   ▪      Who is on the podium?
   ▪      Living Histogram?
   ▪      Getting some statistics
          ▪      Living Histogram
   ▪      Existing Monitoring Solutions
   ▪      Discussion
          ▪      Quick and Dirty Analysis
          ▪      Conclusions




Page 2

BoF InfiniBand | 2012-06-19                 © 2012 science +   computing ag
On the podium




science + computing at a glance

    Founding year             1989

    Locations                 Tübingen
                              München
                              Berlin
                              Düsseldorf

    Employees                 270
    Shareholder               Bull S.A. (100%)
    Revenue 10/11             27 million euros

    Partners                  Daikin Industries, Japan
                              NICE srl, Italy
                              Exa Corporation, USA
                              Platform Computing, Canada



Living Histogram?




                              Brian L. Joiner, International Statistical Review / Revue Internationale de Statistique, Vol. 43, No. 3. (Dec.,1975), pp. 339-340.


Living Histogram

      Size of Fabric
  ▪         <10
  ▪         <50
  ▪         <500
  ▪         >500




Living Histogram

      Switch Structure
  ▪         Switch size
            ▪   single switch
                (mlx4036, qlogic12300)
            ▪   Modular switch
                (mlx5600, qlogic12800)
  ▪         Amount
            ▪   few
            ▪   many




Living Histogram

      Focus
  ▪         Stability
            ➡     maintenance cost
  ▪         High performance
            ➡     extremely optimized




Living Histogram

      Type of Use
  ▪         Cluster Purpose
            ▪   Single Purpose Cluster
            ▪   Multi Purpose Cluster
  ▪         Usage
            ▪   One Job at a time
            ▪   Multiple Jobs




Living Histogram

      Kind/Amount of Problems
  ▪      Impact
         ▪      minor
         ▪      major
  ▪      Amount
         ▪      few
         ▪      many




Living Histogram

      Problem solving
  ▪      Iterative
           ➡      reseat / reboot
  ▪      Analytic
           ➡      dig into the problem
           ➡      try to wipe it out




Monitoring Solutions

stable (but not useful to admins?)

▪       infiniband-diags
        ▪      ibcheckerrors
        ▪      ibdiagpath
▪       plugin to non-IB systems
        ▪      nagios
        ▪      collectl
▪       hardware vendor suites
        ▪      Unified Fabric Manager (Mellanox)
        ▪      InfiniBand Fabric Suites (QLogic)

unstable (individually carved)

▪       wrapper of infiniband-diags
▪       INAM (Ohio State University)
▪       QNIB
▪       .....

not listed stuff

▪       ...
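
A "wrapper" that plugs error counters into a Nagios-style system could look roughly like the sketch below. It is a minimal illustration, not any of the tools listed above: the sample text only imitates a "Name:....value" counter dump, and the thresholds, `parse_counters` and `evaluate` are hypothetical. The exit codes 0 to 3 follow the usual Nagios plugin convention.

```python
import re

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Illustrative counter dump (assumed layout, made-up values).
SAMPLE = """\
SymbolErrorCounter:..............12
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............1
PortRcvErrors:...................0
"""

COUNTER_RE = re.compile(r"^(\w+):\.*(\d+)$")

# Hypothetical thresholds per counter: (warning, critical).
THRESHOLDS = {
    "SymbolErrorCounter": (10, 100),
    "LinkDownedCounter": (1, 5),
}


def parse_counters(text):
    """Turn 'Name:....value' lines into a {name: int} dict."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def evaluate(counters, thresholds=THRESHOLDS):
    """Return (exit_code, message) for the worst threshold violation."""
    status = OK
    details = []
    for name, (warn, crit) in thresholds.items():
        if name not in counters:
            # A missing counter means the check cannot give a verdict.
            return UNKNOWN, f"IB UNKNOWN: {name} not reported"
        value = counters[name]
        if value >= crit:
            status = max(status, CRITICAL)
            details.append(f"{name}={value} (crit>={crit})")
        elif value >= warn:
            status = max(status, WARNING)
            details.append(f"{name}={value} (warn>={warn})")
    message = f"IB {LABELS[status]}"
    if details:
        message += ": " + ", ".join(details)
    return status, message
```

A real plugin would print the message and exit with the returned code; where the counter text comes from (a diagnostic command, a vendor suite export) is left open here on purpose.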




Modular Switches

switchguid=0xac1(ac1)         #   Spine 1
Switch      36 "S-ac1"        #   "A1" enhanced port 0 lid 11 lmc 0
[1]         "S-bc1"[1]        #   "B1" lid 21 4xQDR
[2]         "S-bc2"[1]        #   "B2" lid 22 4xQDR
[3]         "S-bc3"[1]        #   "B3" lid 23 4xQDR

switchguid=0xac2(ac2)         #   Spine 2
Switch      36 "S-ac2"        #   "A2" enhanced port 0 lid 12 lmc 0
[1]         "S-bc1"[2]        #   "B1" lid 21 4xQDR
[2]         "S-bc2"[2]        #   "B2" lid 22 4xQDR
[3]         "S-bc3"[2]        #   "B3" lid 23 4xQDR

switchguid=0xbc1(bc1)         #   Line 1
Switch      36 "S-bc1"        #   "B1" enhanced port 0 lid 21 lmc 0
[1]         "S-ac1"[1]        #   "A1" lid 11 4xQDR
[2]         "S-ac2"[1]        #   "A2" lid 12 4xQDR
[3]         "H-1"[1](f1)      #   "Host1" lid 101 4xQDR

switchguid=0xbc2(bc2)         #   Line 2
Switch      36 "S-bc2"        #   "B2" enhanced port 0 lid 22 lmc 0
[1]         "S-ac1"[2]        #   "A1" lid 11 4xQDR
[2]         "S-ac2"[2]        #   "A2" lid 12 4xQDR
[3]         "H-2"[1](f2)      #   "Host2" lid 102 4xQDR

switchguid=0xbc3(bc3)         #   Line 3
Switch      36 "S-bc3"        #   "B3" enhanced port 0 lid 23 lmc 0
[1]         "S-ac1"[3]        #   "A1" lid 11 4xQDR
[2]         "S-ac2"[3]        #   "A2" lid 12 4xQDR
[3]         "H-3"[1](f3)      #   "Host3" lid 103 4xQDR
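
The listing shows why a modular switch is awkward to monitor: one chassis appears as five separate switch entries. The sketch below parses a shortened copy of the listing and collapses it into a per-chassis view. It is only an illustration under stated assumptions: the `CHASSIS_OF` mapping is supplied by hand (the listing itself does not say which entries share a chassis), and `parse_topology` and `collapse` are hypothetical helper names.

```python
import re

# Shortened copy of the listing above (spines ac1/ac2, line modules bc1..bc3).
SAMPLE = """\
Switch 36 "S-ac1"  # "A1" enhanced port 0 lid 11 lmc 0
[1] "S-bc1"[1]     # "B1" lid 21 4xQDR
[2] "S-bc2"[1]     # "B2" lid 22 4xQDR
[3] "S-bc3"[1]     # "B3" lid 23 4xQDR

Switch 36 "S-ac2"
[1] "S-bc1"[2]
[2] "S-bc2"[2]
[3] "S-bc3"[2]

Switch 36 "S-bc1"
[1] "S-ac1"[1]
[2] "S-ac2"[1]
[3] "H-1"[1](f1)   # "Host1" lid 101 4xQDR

Switch 36 "S-bc2"
[1] "S-ac1"[2]
[2] "S-ac2"[2]
[3] "H-2"[1](f2)

Switch 36 "S-bc3"
[1] "S-ac1"[3]
[2] "S-ac2"[3]
[3] "H-3"[1](f3)
"""

SWITCH_RE = re.compile(r'^Switch\s+(\d+)\s+"([^"]+)"')
LINK_RE = re.compile(r'^\[(\d+)\]\s+"([^"]+)"\[(\d+)\]')

# Assumption: the admin knows which switch entries belong to which chassis.
CHASSIS_OF = {name: "Chassis1"
              for name in ("S-ac1", "S-ac2", "S-bc1", "S-bc2", "S-bc3")}


def parse_topology(text):
    """Return {switch: {local_port: (peer, peer_port)}}."""
    links = {}
    current = None
    for raw in text.splitlines():
        # Tolerate stray '!' characters as they can appear in slide exports.
        line = raw.replace("!", "").strip()
        match = SWITCH_RE.match(line)
        if match:
            current = match.group(2)
            links.setdefault(current, {})
            continue
        match = LINK_RE.match(line)
        if match and current is not None:
            links[current][int(match.group(1))] = (match.group(2),
                                                   int(match.group(3)))
    return links


def collapse(links, chassis_of):
    """Map each chassis to the sorted peers outside it, hiding internal links."""
    external = {}
    for switch, ports in links.items():
        chassis = chassis_of.get(switch, switch)
        for peer, _peer_port in ports.values():
            if chassis_of.get(peer, peer) != chassis:
                external.setdefault(chassis, set()).add(peer)
    return {chassis: sorted(peers) for chassis, peers in external.items()}
```

With the hand-written mapping, `collapse(parse_topology(SAMPLE), CHASSIS_OF)` leaves a single chassis with three hosts attached, which is the view an admin usually wants; with an empty mapping every spine and line module stays a switch of its own.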


[Figure: Chassis1 groups Spine1, Spine2, Line1, Line2 and Line3 into one modular switch; Host1, Host2 and Host3 attach to the line modules.]

Discussion - Quick Analysis

Fabric size
▪      small -> easy as pie?
▪      big   -> critical mass for real analysis?
Switch structure
▪      what is your routing algorithm?
Focus
▪      80:20 rule?
                  performance
                  maintenance
Type of use
▪      willing/forced to share
Problem type / amount
▪      runs smoothly enough
Problem solving
▪      learning curve starts steep
Discussion - Conclusions

Monitoring
▪      what approach?


Do we scare you?
▪      not intending to spread Fear, Uncertainty and Doubt


Our conclusions


Your conclusions




Thank you for your attention and participation!



science + computing ag
www.science-computing.de

Phone: +49 (0)7071 9457-0
Email: info@science-computing.de

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

InfiniBand Problems Monitoring

  • 1. InfiniBand? Problems? Do you care? Christian Kniep / Jan Wender science + computing ag IT services for sophisticated computer environments Tübingen | München | Berlin | Düsseldorf
  • 2. Agenda This is an interactive session! ▪ Who is on the podium? ▪ Living Histogram? ▪ Getting some statistics ▪ Living Histogram ▪ Existing Monitoring Solutions ▪ Discussion ▪ Quick and Dirty Analysis ▪ Conclusions Page 2 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 3. On the podium
  • 4. science + computing at a glance ▪ Founding year: 1989 ▪ Locations: Tübingen, München, Berlin, Düsseldorf ▪ Employees: 270 ▪ Shareholder: Bull S.A. (100%) ▪ Revenue 10/11: 27 million euros ▪ Partners: Daikin Industries (Japan), NICE srl (Italy), Exa Corporation (USA), Platform Computing (Canada)
  • 5. Living Histogram? Brian L. Joiner, International Statistical Review / Revue Internationale de Statistique, Vol. 43, No. 3 (Dec. 1975), pp. 339-340.
  • 6. Living Histogram: Size of Fabric ▪ <10 ▪ <50 ▪ <500 ▪ >500
  • 7. Living Histogram: Switch Structure ▪ Switch size: single switch (mlx4036, qlogic12300) or modular switch (mlx5600, qlogic12800) ▪ Amount: few or many
  • 8. Living Histogram: Focus ▪ Stability ➡ maintenance cost ▪ High performance ➡ extremely optimized
  • 9. Living Histogram: Type of Use ▪ Cluster purpose: single-purpose cluster or multi-purpose cluster ▪ Usage: one job at a time or multiple jobs
  • 10. Living Histogram: Kind/Amount of Problems ▪ Impact: minor or major ▪ Amount: few or many
  • 11. Living Histogram: Problem Solving ▪ Iterative ➡ reseat / reboot ▪ Analytic ➡ dig into the problem ➡ try to wipe it out
  • 12. Monitoring Solutions ▪ stable (but not useful to admins?): infiniband-diags, ibcheckerrors, ibdiagpath; plugins for non-IB systems (nagios, collectl); hardware vendor suites (Unified Fabric Manager (Mellanox), InfiniBand Fabric Suites (QLogic)) ▪ unstable (individually carved): wrapper of infiniband-diags; INAM (Ohio State University); QNIB; ..... ▪ not listed stuff: ...
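The "plugin to non-IB systems" entry on the slide above can be illustrated with a minimal sketch of a Nagios-style check. It only shows how per-port error counters could be mapped to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The counter names follow the InfiniBand PortCounters attribute; the thresholds and the `check_port` helper are assumptions made for illustration and do not come from the talk.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds per counter: (warning, critical).
# These values are illustrative assumptions, not recommendations.
THRESHOLDS = {
    "SymbolErrorCounter": (1, 100),
    "LinkDownedCounter": (1, 10),
    "PortRcvErrors": (1, 100),
}


def check_port(counters, thresholds=THRESHOLDS):
    """Return (exit_code, message) for one port's error counters."""
    if not counters:
        return UNKNOWN, "UNKNOWN - no counters supplied"
    status = OK
    problems = []
    for name, (warn, crit) in thresholds.items():
        value = counters.get(name, 0)
        if value >= crit:
            status = max(status, CRITICAL)
            problems.append(f"{name}={value}")
        elif value >= warn:
            status = max(status, WARNING)
            problems.append(f"{name}={value}")
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    detail = ", ".join(problems) if problems else "all counters below thresholds"
    return status, f"{label} - {detail}"


# Example with made-up counter values for a single port.
code, message = check_port({"SymbolErrorCounter": 3, "LinkDownedCounter": 0})
print(code, message)
```

A real plugin would read the counters from the fabric, print the message, and end with `sys.exit(code)` so that Nagios picks up the status.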
  • 14. Modular Switches
        switchguid=0xac1(ac1)     # Spine 1
        Switch 36 "S-ac1"         # "A1" enhanced port 0 lid 11 lmc 0
        [1] "S-bc1"[1]            # "B1" lid 21 4xQDR
        [2] "S-bc2"[1]            # "B2" lid 22 4xQDR
        [3] "S-bc3"[1]            # "B3" lid 23 4xQDR

        switchguid=0xac2(ac2)     # Spine 2
        Switch 36 "S-ac2"         # "A2" enhanced port 0 lid 12 lmc 0
        [1] "S-bc1"[2]            # "B1" lid 21 4xQDR
        [2] "S-bc2"[2]            # "B2" lid 22 4xQDR
        [3] "S-bc3"[2]            # "B3" lid 23 4xQDR

        switchguid=0xbc1(bc1)     # Line 1
        Switch 36 "S-bc1"         # "B1" enhanced port 0 lid 21 lmc 0
        [1] "S-ac1"[1]            # "A1" lid 11 4xQDR
        [2] "S-ac2"[1]            # "A2" lid 12 4xQDR
        [3] "H-1"[1](f1)          # "Host1" lid 101 4xQDR

        switchguid=0xbc2(bc2)     # Line 2
        Switch 36 "S-bc2"         # "B2" enhanced port 0 lid 22 lmc 0
        [1] "S-ac1"[2]            # "A1" lid 11 4xQDR
        [2] "S-ac2"[2]            # "A2" lid 12 4xQDR
        [3] "H-2"[1](f2)          # "Host2" lid 102 4xQDR

        switchguid=0xbc3(bc3)     # Line 3
        Switch 36 "S-bc3"         # "B3" enhanced port 0 lid 23 lmc 0
        [1] "S-ac1"[3]            # "A1" lid 11 4xQDR
        [2] "S-ac2"[3]            # "A2" lid 12 4xQDR
        [3] "H-3"[1](f3)          # "Host3" lid 103 4xQDR
  • 15. Modular Switches (topology listing as on slide 14) [Diagram labels: Chassis1; Spine1, Spine2; Line1, Line2, Line3; Host1, Host2, Host3]
  • 16. Modular Switches (topology listing as on slide 14) [Diagram labels: as on slide 15, plus a second diagram labeled Chassis1 with Host1, Host2, Host3]
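To make the topology listing on the Modular Switches slides easier to work with, here is a minimal sketch that parses such a listing into a per-switch port map. The regular expressions are written only against the line shapes shown on the slide (`Switch 36 "S-ac1"` and `[1] "S-bc1"[1]`); the function names `parse_topology` and `hosts_behind` are hypothetical, and the sketch is not a general parser for every variant of fabric discovery output.

```python
import re
from collections import defaultdict

# Excerpt of the listing from the slide (Spine 1 and Line 1 only).
LISTING = '''
switchguid=0xac1(ac1)   # Spine 1
Switch 36 "S-ac1"       # "A1" enhanced port 0 lid 11 lmc 0
[1] "S-bc1"[1]          # "B1" lid 21 4xQDR
[2] "S-bc2"[1]          # "B2" lid 22 4xQDR
[3] "S-bc3"[1]          # "B3" lid 23 4xQDR

switchguid=0xbc1(bc1)   # Line 1
Switch 36 "S-bc1"       # "B1" enhanced port 0 lid 21 lmc 0
[1] "S-ac1"[1]          # "A1" lid 11 4xQDR
[2] "S-ac2"[1]          # "A2" lid 12 4xQDR
[3] "H-1"[1](f1)        # "Host1" lid 101 4xQDR
'''

# Node header, e.g.: Switch 36 "S-ac1"
NODE_RE = re.compile(r'^(Switch|Ca)\s+(\d+)\s+"([^"]+)"')
# Port line, e.g.: [3] "H-1"[1](f1)
LINK_RE = re.compile(r'^\[(\d+)\]\s+"([^"]+)"\[(\d+)\]')


def parse_topology(text):
    """Return {node: {local_port: (peer_node, peer_port)}}."""
    links = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        node = NODE_RE.match(line)
        if node:
            current = node.group(3)
            links[current]  # register the node even if it has no ports
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            local_port = int(link.group(1))
            links[current][local_port] = (link.group(2), int(link.group(3)))
    return dict(links)


def hosts_behind(links):
    """Map each switch to the attached nodes whose name starts with 'H-'."""
    return {
        switch: sorted(peer for peer, _ in ports.values() if peer.startswith("H-"))
        for switch, ports in links.items()
    }


topology = parse_topology(LISTING)
print(hosts_behind(topology))
```

With a map like this, the spine and line switches of one chassis could be grouped and presented as a single unit, which is the collapsed view the slide diagrams suggest.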
  • 22. Discussion - Quick Analysis ▪ Fabric size: small -> easy as pie? big -> critical mass for real analysis? ▪ Switch structure: what is your routing algorithm? ▪ Focus: 80:20 rule? (performance / maintenance) ▪ Type of use: willing/forced to share ▪ Problem type / amount: runs smoothly enough ▪ Problem solving: learning curve starts steep
  • 29. Discussion - Conclusions ▪ Monitoring: what approach? ▪ Do we scare you? Not intending to spread Fear, Uncertainty and Doubt ▪ Our conclusions ▪ Your conclusions
  • 34. Thank you for your attention and participation! science + computing ag www.science-computing.de Phone: +49 (0)7071 9457 - 0 Email: info@science-computing.de