So you want to buy a supercomputer?

                James Davenport
Hebron & Medlock Professor of Information Technology

                University of Bath (U.K.)
                   (visiting Waterloo)


                  15 May 2009
        Many thanks to Prof. Guest (Cardiff)
University of Bath




       Good (9th out of 117 in the U.K.: Guardian 12 May 2009)
       Heavily co-op
       Strengths in Science, Engineering, Mathematics
   But small — 538 Faculty
U.K. scene — generalities




       Nationally run (EPSRC etc. ≈ NSERC) major supercomputers
       HECToR (current one) 29th in TOP 500
       Time bid for on competitive grants (virtual money)
       Hence you need a ‘track record’
   Basically, Mark 4 v 25: “to him that hath shall be given”.
U.K. scene — recent developments


       EPSRC etc. (≈ NSERC) now allow depreciation on
       computing resources to be charged to grants
       (Previously, you had to buy your own machine and run it)
       Government announce Science Research Investment Fund
       (£500M/year)
       (largely buildings, but equipment not excluded)
       Bath share about £5M/year
  N.B. “year” = H.M. Treasury Year
   Brainwave: if I purchase a supercomputer, then I can depreciate it,
   and have money to buy a new one.
Recent UK spend, excluding machine rooms etc.
   [Figure: comparison of UK HPC costs, 2009 (compareukhpccost2009.JPG); source: http://wrgrid.group.shef.ac.uk/temp/hpc/compareukhpccost2009.JPG]
Machine Rooms — a major problem




       Cardiff £1.6M on machine, £1.4M on converting machine
              room and (high-quality) air conditioning.
       Bristol £2M on machine, £2M+ on building machine room
               and including chilled water.
      Imperial (Central London) £3M on CO2 -cooled machine
               room.
  Bath had an old machine room from the 1970s.
Old Machine Rooms — a mixed blessing

          + I doubt very much Bath would have spent those sort
            of sums on a new machine room
          + Comparative speed: I took under a year from initial
            decision to Phase 1 installed
          − It will, just about, cope with the current smallish
            machine: I think in a few years we’ll need a new
            machine room
          − The University don’t realise what a bargain they’re
            getting
          − Despite the Estates Department’s promises, the
            power supply did need upgrading
          + Contracts signed this week on a new machine room
            with chilled water!
Actual Timescale



       1/2007 I am tasked with looking into this
       5/2007 Top management buys the case
   So what was the case?
       Researchers think they can support £450K of equipment
       (i.e. earn that much depreciation over 3 years)
       6 year commitment with 2-year reviews/refreshes
   So 4 years’ warning of decommitment
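
The arithmetic behind the case can be sketched as below. This assumes straight-line depreciation (the slides give only the £450K total and the 3-year period, not the method), and it reads the "4 years" as the commitment length minus the review interval. The function names are illustrative, not from the talk.

```python
# Sketch only. Straight-line depreciation is an assumption; the slides state
# the total (£450K) and the period (3 years) but not the method.
EQUIPMENT_SUPPORTED_GBP = 450_000
DEPRECIATION_YEARS = 3
COMMITMENT_YEARS = 6
REVIEW_INTERVAL_YEARS = 2


def annual_depreciation(total_gbp: float, years: int) -> float:
    """Depreciation that grants would need to earn each year."""
    return total_gbp / years


def warning_years(commitment: int, review_interval: int) -> int:
    """Notice remaining if a review declines to refresh the commitment.

    Assumed reading: each review renews the full commitment, so a refusal
    leaves the commitment length minus one review interval still to run.
    """
    return commitment - review_interval


print(annual_depreciation(EQUIPMENT_SUPPORTED_GBP, DEPRECIATION_YEARS))  # 150000.0
print(warning_years(COMMITMENT_YEARS, REVIEW_INTERVAL_YEARS))            # 4
```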
Actual Timescale

       1/2007 I am tasked with looking into this
       5/2007 Top management buys the case: RFP for £360K
            * There was already a national pre-qualified list
       9/2007 “So what’s your final offer?”
      10/2007 Purchase decision
       1/2008 Phase 1 delivery
       3/2008 Phase 1 acceptance
            • UK Treasury FY ends 5 April!
      10/2008 Phase 2 decision (not to delay)
       1/2009 Phase 2 delivery
       5/2009 Acceptance
Equipment Purchased


  Clustervision: a UK/Dutch firm of system integrators: the boards
  are Supermicro.
      100 nodes; 2 × 4-core 2.8GHz Intel Harpertown
      (3.0GHz gave less power/£; 2.66GHz pushed the power envelope)
      2 nodes/power supply
      2GB/core main memory
    * Specified this way as 2/4 core wasn’t obvious
    = 1.6TB main memory — it adds up!
      Double Data Rate InfiniBand
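
The memory total follows directly from the per-core specification. A quick check, using only the figures on this slide (variable names are illustrative, and TB is taken as 1000 GB to match the slide's rounding):

```python
# Figures from the slide; names are illustrative.
NODES = 100
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4
GB_PER_CORE = 2

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET   # 8
memory_per_node_gb = cores_per_node * GB_PER_CORE      # 16
total_cores = NODES * cores_per_node                   # 800
total_memory_gb = NODES * memory_per_node_gb           # 1600
total_memory_tb = total_memory_gb / 1000               # decimal TB, as on the slide

print(f"{total_cores} cores, {memory_per_node_gb}GB/node, {total_memory_tb}TB in total")
```

Specifying memory per core rather than per node means the requirement scales automatically with whichever core count the vendor offers.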
Acceptance Tests



    1   Phase 1: Linpack benchmark
            We had linear algebra compiled for the previous chip!
    2   Phase 2: a range of tests related to major users
            * Very grateful to Prof. Guest for organising
            MPI defaults were badly wrong
            DDR InfiniBand was running out of steam faster than expected
            Several partial failures
Partial Failures


   Very frustrating and hard to diagnose: typically one job would take
   “longer than expected”.
         Observe this is happening, and feel very confused
         Eventually spot that it happens when node 78 is used!
         Convince the manufacturer to run their tests on node 78
   Failure modes
     1   Node 78 (and another one since) — poor InfiniBand
     2   Twice so far: a node loses 4GB of memory on a reboot
     3   Others?
   “One footsore soldier can delay a regiment” — Duke of Wellington
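
The "it happens when node 78 is used" observation can be mechanised, at least roughly. The sketch below is not how the diagnosis was actually done (the slides do not say). It assumes a hypothetical job log giving, for each job, the nodes used plus an expected and an actual run time, and it flags nodes that appear only in slow jobs. The record layout, thresholds and sample data are all invented for illustration.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, List, NamedTuple


class JobRecord(NamedTuple):
    """Hypothetical job-log entry; field names are assumptions."""
    nodes: FrozenSet[int]
    expected_s: float   # e.g. from earlier runs of the same job
    actual_s: float


def suspect_nodes(jobs: Iterable[JobRecord],
                  slow_factor: float = 1.5,
                  min_slow_jobs: int = 2) -> List[int]:
    """Return nodes seen in at least min_slow_jobs slow jobs and in no normal job."""
    slow_count = defaultdict(int)
    ok_count = defaultdict(int)
    for job in jobs:
        is_slow = job.actual_s > slow_factor * job.expected_s
        for node in job.nodes:
            if is_slow:
                slow_count[node] += 1
            else:
                ok_count[node] += 1
    return sorted(node for node, count in slow_count.items()
                  if count >= min_slow_jobs and ok_count.get(node, 0) == 0)


# Invented sample data: node 78 is the only node common to the slow jobs.
jobs = [
    JobRecord(frozenset({1, 2, 3}), 100, 105),
    JobRecord(frozenset({3, 78}), 100, 240),
    JobRecord(frozenset({2, 5, 78}), 200, 520),
    JobRecord(frozenset({5, 6}), 100, 98),
    JobRecord(frozenset({2, 3}), 100, 101),
]

print(suspect_nodes(jobs))
```

A node that is merely unlucky gets cleared as soon as it takes part in one normal-speed job, which is why the rule requires zero normal jobs rather than a simple majority of slow ones. A memory-loss failure would need a separate check, such as comparing each node's reported memory against the 16GB it should have.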
Lessons I already knew




      Get it in writing from Estates.
      Know your (potential) users early
      (devise acceptance tests accordingly)
      It’s hard to explain to management
Lessons I know now




      It’s very hard to explain to management
      Acceptance tests are very important, especially
      Car-Parrinello Molecular Dynamics (CPMD) for interconnect
      Partial failure is far worse than total failure
      Even DDR InfiniBand has trouble with 8 cores/node
      (There’s a good paper (now!) by HP)
Lessons I know I still don’t know



       Good ways of detecting partial failure
       How to manage software licensing if you can’t afford to
       license every node
       How to persuade management to deliver on the promised
       refreshes
       Will the assumptions hold up:
           Assumptions on grant-getting
           Assumptions on actual usage ⇒ price/hour
Price per node hour: 52p ≈ CAN$0.90

      With the exception of a “short test” queue, allocation is
      based on whole nodes.
      Allocation is based on entitlements rather than retrospective
      billing
      The Maui scheduler has (too?) many knobs in this area
         48% Equipment depreciation
         15% Equipment maintenance
         10% Machine electricity
          8% Air conditioning (incl. depreciation)
         17% 1 Programmer (1/3 of team of 3)
          2% My time
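
The breakdown above converts to pence per node hour as follows. The figures are those on the slide; the helper names are illustrative. Since allocation is by entitlement rather than billing, the job cost computed here is notional.

```python
# Figures from the slide; helper names are illustrative only.
PRICE_PER_NODE_HOUR_PENCE = 52

COST_SHARES_PERCENT = {
    "Equipment depreciation": 48,
    "Equipment maintenance": 15,
    "Machine electricity": 10,
    "Air conditioning (incl. depreciation)": 8,
    "1 programmer (1/3 of team of 3)": 17,
    "My time": 2,
}


def pence_per_component() -> dict:
    """Split the node-hour price across the cost components."""
    return {name: PRICE_PER_NODE_HOUR_PENCE * pct / 100
            for name, pct in COST_SHARES_PERCENT.items()}


def notional_job_cost_gbp(nodes: int, hours: float) -> float:
    """Notional cost of a job allocated whole nodes for a number of hours."""
    return nodes * hours * PRICE_PER_NODE_HOUR_PENCE / 100


# The shares should account for the whole price.
assert sum(COST_SHARES_PERCENT.values()) == 100

for name, pence in pence_per_component().items():
    print(f"{pence:5.2f}p  {name}")

print(f"4 nodes for 10 hours: £{notional_job_cost_gbp(4, 10):.2f}")
```

Because allocation is by whole node, a job that uses only some of a node's 8 cores still costs the full node-hour rate.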
Lessons I don’t know I don’t know?




   Any questions?

More Related Content

Viewers also liked

Stephen Covey 9010
Stephen Covey   9010Stephen Covey   9010
Stephen Covey 9010vladgliga
 
Управление командой аналитиков
Управление командой аналитиковУправление командой аналитиков
Управление командой аналитиковGrigoriy Pechenkin
 
iaa 2009 + vicente perez, mikel larios, mikel sanz
iaa 2009 + vicente perez, mikel larios, mikel sanziaa 2009 + vicente perez, mikel larios, mikel sanz
iaa 2009 + vicente perez, mikel larios, mikel sanzvicente46
 
How do you set your price
How do you set your priceHow do you set your price
How do you set your priceFrances Kazan
 
Shape 2013 developing multi targeting windows store and windows phone apps
Shape 2013   developing multi targeting windows store and windows phone appsShape 2013   developing multi targeting windows store and windows phone apps
Shape 2013 developing multi targeting windows store and windows phone appsJose Luis Latorre Millas
 
Podcast Your Passion: 12 Steps to Mastery
Podcast Your Passion: 12 Steps to MasteryPodcast Your Passion: 12 Steps to Mastery
Podcast Your Passion: 12 Steps to MasteryLen Edgerly
 
How to succeed as VP Public Relations
How to succeed as VP Public RelationsHow to succeed as VP Public Relations
How to succeed as VP Public RelationsFrances Kazan
 
Starten met Infobright
Starten met InfobrightStarten met Infobright
Starten met InfobrightDaan Blinde
 
Lawyers
LawyersLawyers
Lawyersmtoto
 
Svíþjóð
SvíþjóðSvíþjóð
Svíþjóðjanusg
 
Surtsey
SurtseySurtsey
Surtseyjanusg
 
Daily Bike Commute Sf Bay Area
Daily Bike Commute Sf Bay AreaDaily Bike Commute Sf Bay Area
Daily Bike Commute Sf Bay Areaguest77208866
 
Hallgrimu Petursson
Hallgrimu PeturssonHallgrimu Petursson
Hallgrimu Peturssonjanusg
 
Fuglar
FuglarFuglar
Fuglarjanusg
 

Viewers also liked (19)

Bakirova
BakirovaBakirova
Bakirova
 
Stephen Covey 9010
Stephen Covey   9010Stephen Covey   9010
Stephen Covey 9010
 
Travades
TravadesTravades
Travades
 
Управление командой аналитиков
Управление командой аналитиковУправление командой аналитиков
Управление командой аналитиков
 
iaa 2009 + vicente perez, mikel larios, mikel sanz
iaa 2009 + vicente perez, mikel larios, mikel sanziaa 2009 + vicente perez, mikel larios, mikel sanz
iaa 2009 + vicente perez, mikel larios, mikel sanz
 
How do you set your price
How do you set your priceHow do you set your price
How do you set your price
 
Shape 2013 developing multi targeting windows store and windows phone apps
Shape 2013   developing multi targeting windows store and windows phone appsShape 2013   developing multi targeting windows store and windows phone apps
Shape 2013 developing multi targeting windows store and windows phone apps
 
Podcast Your Passion: 12 Steps to Mastery
Podcast Your Passion: 12 Steps to MasteryPodcast Your Passion: 12 Steps to Mastery
Podcast Your Passion: 12 Steps to Mastery
 
How to succeed as VP Public Relations
How to succeed as VP Public RelationsHow to succeed as VP Public Relations
How to succeed as VP Public Relations
 
Starten met Infobright
Starten met InfobrightStarten met Infobright
Starten met Infobright
 
Lawyers
LawyersLawyers
Lawyers
 
Back to the future
Back to the futureBack to the future
Back to the future
 
Svíþjóð
SvíþjóðSvíþjóð
Svíþjóð
 
Surtsey
SurtseySurtsey
Surtsey
 
2009.05 CRM Quidgest - Jose Torres
2009.05 CRM Quidgest - Jose Torres2009.05 CRM Quidgest - Jose Torres
2009.05 CRM Quidgest - Jose Torres
 
Daily Bike Commute Sf Bay Area
Daily Bike Commute Sf Bay AreaDaily Bike Commute Sf Bay Area
Daily Bike Commute Sf Bay Area
 
Hallgrimu Petursson
Hallgrimu PeturssonHallgrimu Petursson
Hallgrimu Petursson
 
Fuglar
FuglarFuglar
Fuglar
 
NúMeros En IngléS
NúMeros En IngléSNúMeros En IngléS

So you want to buy a supercomputer?

  • 1. So you want to buy a supercomputer?
      James Davenport
      Hebron & Medlock Professor of Information Technology
      University of Bath (U.K.) (visiting Waterloo)
      15 May 2009
      Many thanks to Prof. Guest (Cardiff)
  • 4. University of Bath
      Good (9th out of 117 in the U.K.: Guardian, 12 May 2009)
      Heavily co-op
      Strengths in Science, Engineering, Mathematics
    But small: 538 Faculty
  • 11. U.K. scene: generalities
      Nationally run (EPSRC etc. ≈ NSERC) major supercomputers
      HECToR (current one) 29th in TOP 500
      Time bid for on competitive grants (virtual money)
      Hence you need a ‘track record’
    Basically, Mark 4 v 25: “to him that hath shall be given”.
  • 17. U.K. scene: recent developments
      EPSRC etc. (≈ NSERC) now allow depreciation on computing resources to be charged to grants
        (Previously, you had to buy your own machine and run it)
      Government announce Science Research Infrastructure Fund (£500M/year)
        (largely buildings, but equipment not excluded)
      Bath share about £5M/year
        N.B. “year” = H.M. Treasury Year
    Brainwave: if I purchase a supercomputer, then I can depreciate it, and have money to buy a new one.
  • 18. Recent UK spend, excluding machine rooms etc.
      [Figure: compareukhpccost2009.JPG, http://wrgrid.group.shef.ac.uk/temp/hpc/compareukhpccost2009.JPG]
  • 23. Machine Rooms: a major problem
      Cardiff: £1.6M on machine, £1.4M on converting machine room and (high-quality) air conditioning.
      Bristol: £2M on machine, £2M+ on building machine room, including chilled water.
      Imperial (Central London): £3M on CO2-cooled machine room.
    Bath had an old machine room from the 1970s.
  • 29. Old Machine Rooms: a mixed blessing
      + I doubt very much Bath would have spent that sort of sum on a new machine room
      + Comparative speed: I took under a year from initial decision to Phase 1 installed
      − It will, just about, cope with the current smallish machine: I think in a few years we’ll need a new machine room
      − The University don’t realise what a bargain they’re getting
      − Despite the Estates Department’s promises, the power supply did need upgrading
      + Contracts signed this week on a new machine room with chilled water!
  • 36. Actual Timescale
      1/2007: I am tasked with looking into this
      5/2007: Top management buys the case
    So what was the case?
      Researchers think they can support £450K of equipment (i.e. earn that much depreciation over 3 years)
      6-year commitment with 2-year reviews/refreshes
      So 4 years’ warning of decommitment
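The arithmetic behind the case can be sketched in a few lines. This is only an illustration: it assumes straight-line depreciation (the slides do not say which method was used), and it reads "4 years' warning" as the commitment length minus the review interval. The function and variable names are hypothetical.

```python
def straight_line_charges(capital_cost: float, years: int) -> list[float]:
    """Equal annual charges that write off capital_cost over the given years.

    Assumption: straight-line depreciation; the talk does not name a method.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return [capital_cost / years] * years


# Figures from the slide: £450K of equipment, depreciated over 3 years.
capital_cost = 450_000
charges = straight_line_charges(capital_cost, 3)
annual_charge = charges[0]

# Assumed reading of the rolling commitment: a 6-year commitment renewed at
# each 2-year review leaves 4 years of notice if a review does not renew it.
commitment_years = 6
review_interval_years = 2
warning_years = commitment_years - review_interval_years

print(f"Annual depreciation to recover from grants: £{annual_charge:,.0f}")
print(f"Warning of decommitment: {warning_years} years")
```

Under these assumptions, researchers would need to charge about £150K a year to grants to cover the equipment.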
  • 47. Actual Timescale
      1/2007: I am tasked with looking into this
      5/2007: Top management buys the case: RFP for £360K
        * There was already a national pre-qualified list
      9/2007: “So what’s your final offer?”
      10/2007: Purchase decision
      1/2008: Phase 1 delivery
      3/2008: Phase 1 acceptance
        • UK Treasury FY ends 5 April!
      10/2008: Phase 2 decision (not to delay)
      1/2009: Phase 2 delivery
      5/2009: Acceptance
  • 56. Equipment Purchased
      Clustervision: a UK/Dutch firm of system integrators; the boards are Supermicro.
      100 nodes; 2 × 4-core 2.8GHz Intel Harpertown
        (3.0 gave less power/£; 2.66 pushed the power envelope)
      2 nodes/power supply
      2GB/core main memory
        * Specified this way as 2/4 core wasn’t obvious
        = 1.6TB main memory: it adds up!
      Double Data Rate Infiniband
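The "it adds up" figure can be checked directly from the numbers on the slide. The sketch below takes 1TB as 1000GB, which is an assumption about the slide's units; the variable names are illustrative.

```python
# Figures from the slide.
nodes = 100
sockets_per_node = 2
cores_per_socket = 4
gb_per_core = 2

cores_per_node = sockets_per_node * cores_per_socket
total_cores = nodes * cores_per_node
gb_per_node = cores_per_node * gb_per_core
total_memory_gb = total_cores * gb_per_core

# Assumption: decimal terabytes (1TB = 1000GB), matching the slide's 1.6TB.
total_memory_tb = total_memory_gb / 1000

print(f"{total_cores} cores, {gb_per_node}GB per node, {total_memory_tb}TB in total")
```

Specifying memory per core rather than per node keeps the requirement meaningful whether the vendor offers dual-core or quad-core parts.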
  • 63. Acceptance Tests
      1. Phase 1: Linpack benchmark
           We had linear algebra compiled for the previous chip!
      2. Phase 2: a range of tests related to major users
           * Very grateful to Prof. Guest for organising
           MPI defaults were badly wrong
           DDR Infiniband was running out of steam faster than expected
      Several partial failures.
  • 75. Partial Failures
      Very frustrating and hard to diagnose: typically one job would take “longer than expected”.
      Observe this is happening, and feel very confused
      Eventually spot that it happens when node 78 is used!
      Convince the manufacturer to run their tests on node 78
    Failure modes
      1. Node 78 (and another one since): poor Infiniband
      2. Twice so far: a node loses 4GB of memory on a reboot
      3. Others?
    “One footsore soldier can delay a regiment” (Duke of Wellington)
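The "eventually spot that it happens when node 78 is used" step can in principle be automated from scheduler accounting data. The following is a minimal sketch, not what was done at Bath: it assumes each job record gives the nodes used plus an expected and an actual runtime, and it ranks nodes by the fraction of their jobs that ran slow. `JobRecord`, `suspect_nodes`, the thresholds, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    nodes: frozenset      # node numbers the job ran on
    expected_s: float     # expected runtime in seconds (assumed to be known)
    actual_s: float       # measured runtime in seconds


def suspect_nodes(jobs, slow_factor=1.5, min_jobs=3):
    """Rank nodes by the fraction of their jobs that ran slower than expected.

    A job counts as slow when actual_s > slow_factor * expected_s. Nodes with
    fewer than min_jobs jobs, or with no slow jobs, are left out.
    Returns (node, slow_fraction) pairs, worst first.
    """
    used = defaultdict(int)
    slow = defaultdict(int)
    for job in jobs:
        is_slow = job.actual_s > slow_factor * job.expected_s
        for node in job.nodes:
            used[node] += 1
            if is_slow:
                slow[node] += 1
    ranked = [
        (node, slow[node] / used[node])
        for node in used
        if used[node] >= min_jobs and slow[node] > 0
    ]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Made-up sample data: every job that touches node 78 runs slow.
jobs = [
    JobRecord(frozenset({1, 2, 78}), 100, 240),
    JobRecord(frozenset({3, 4}), 100, 101),
    JobRecord(frozenset({1, 3}), 100, 99),
    JobRecord(frozenset({2, 4, 78}), 100, 210),
    JobRecord(frozenset({5, 78}), 100, 260),
    JobRecord(frozenset({1, 2, 5}), 100, 102),
    JobRecord(frozenset({3, 4, 5}), 100, 98),
]

ranking = suspect_nodes(jobs)
print(ranking[0])
```

Healthy nodes that happened to share a job with the bad node also pick up some slow jobs, so the ranking is only a hint about where to point the manufacturer's diagnostics. It also depends on having a trustworthy expected runtime for each job.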
  • 79. Lessons I already knew: Get it in writing from Estates. Know your (potential) users early (devise acceptance tests accordingly). It’s hard to explain to management.
  • 85. Lessons I know now: It’s very hard to explain to management. Acceptance tests are very important, especially Car-Parrinello Molecular Dynamics (CPMD) for interconnect. Partial failure is far worse than total failure. Even DDR Infiniband has trouble with 8 cores/node. (There’s a good paper (now!) by HP.)
  • 91. Lessons I know I still don’t know: Good ways of detecting partial failure. How to manage software licensing if you can’t afford to license every node. How to persuade management to deliver on the promised refreshes. Will the assumptions hold up: assumptions on grant-getting; assumptions on actual usage ⇒ price/hour.
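For the specific failure mode of a node losing 4GB of memory on a reboot, one cheap check is to compare what the operating system reports against what the node should have. The sketch below is only an illustration under stated assumptions: it assumes Linux nodes (so the `MemTotal` line of `/proc/meminfo` is available), and the 16 GiB expected size, the 1 GiB tolerance and both sample strings are hypothetical values, not figures from the talk. It does nothing for the harder interconnect case.

```python
import re


def mem_total_kib(meminfo_text):
    """Extract the MemTotal figure (reported in kB) from /proc/meminfo text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def memory_shortfall_gib(meminfo_text, expected_gib, tolerance_gib=1.0):
    """Return the shortfall in GiB if it exceeds the tolerance, else 0.0.

    MemTotal is always somewhat below the installed RAM because the kernel
    reserves some memory, hence the tolerance (an assumed value).
    """
    seen_gib = mem_total_kib(meminfo_text) / (1024 * 1024)
    shortfall = expected_gib - seen_gib
    return shortfall if shortfall > tolerance_gib else 0.0


# Invented samples for a node that should have 16 GiB.
healthy = "MemTotal:       16331264 kB\nMemFree:         1000000 kB\n"
degraded = "MemTotal:       12136960 kB\nMemFree:         1000000 kB\n"

healthy_shortfall = memory_shortfall_gib(healthy, expected_gib=16)
degraded_shortfall = memory_shortfall_gib(degraded, expected_gib=16)
```

Run on every node after each reboot (for instance with the real contents of `/proc/meminfo` read from disk), a non-zero result would flag the node before users' jobs land on it.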
  • 100. Price per node hour: 52p ≈ CAN$0.90. With the exception of a “short test” queue, allocation is based on whole nodes. Allocation is based on entitlements rather than retrospective billing. The Maui scheduler has (too?) many knobs in this area. Cost breakdown: 48% equipment depreciation; 15% equipment maintenance; 10% machine electricity; 8% air conditioning (incl. depreciation); 17% 1 programmer (1/3 of team of 3); 2% my time.
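The arithmetic behind these figures can be worked through in a few lines. This is a sketch under assumptions: the 52p price and the percentage shares are taken from the slide, while the 8 cores per node (taken from the earlier remark about DDR Infiniband), the rounding up to whole nodes and the function names are illustrative choices. Since allocation is by entitlement rather than billing, the "charge" here is best read as a debit against an entitlement.

```python
import math

PRICE_PER_NODE_HOUR_PENCE = 52  # from the slide
CORES_PER_NODE = 8              # assumption: the "8 cores/node" mentioned earlier

# Percentage shares of the node-hour price, as listed on the slide.
COST_SHARES_PERCENT = {
    "Equipment depreciation": 48,
    "Equipment maintenance": 15,
    "Machine electricity": 10,
    "Air conditioning (incl. depreciation)": 8,
    "1 programmer (1/3 of team of 3)": 17,
    "My time": 2,
}


def breakdown_pence(price_pence=PRICE_PER_NODE_HOUR_PENCE):
    """Split the node-hour price into pence per cost component."""
    return {item: price_pence * pct / 100
            for item, pct in COST_SHARES_PERCENT.items()}


def job_charge_pounds(cores_requested, hours):
    """Debit for a job when allocation is rounded up to whole nodes."""
    nodes = math.ceil(cores_requested / CORES_PER_NODE)
    return nodes * hours * PRICE_PER_NODE_HOUR_PENCE / 100


shares = breakdown_pence()
twelve_core_job = job_charge_pounds(cores_requested=12, hours=10)
```

Under whole-node allocation a 12-core job occupies two nodes, so it is debited for 16 cores' worth of machine; this is the effect the "short test" queue is exempt from.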
  • 102. Lessons I don’t know I don’t know? Any questions?