1. Attie Juyn & Wilhelm van Belkum: High Performance Computing & GRID Computing
4. To establish an Institutional HPC
- Level 1: Personal workstation (entry level)
- Level 2: Departmental compute cluster
- Level 3: Institutional HPC
- Level 4: National/International HPC
12. The New World Order (source: 2006 UC Regents): Mainframe, Vector Supercomputer, Mini Computer, PC, Clusters & Grids
13. Technical goals: build an Institutional High Performance Computing facility, based on Beowulf cluster principles, coexisting with and linking the existing departmental clusters and the National and International computational Grids.
16. The Evolved Cluster (source: Cluster Resources, Inc.): compute nodes on a Myrinet interconnect; a job queue fed by users and admins; a resource manager and scheduler with license manager, identity manager and allocation manager; and a linked departmental cluster with its own resource manager and scheduler.
18. Grid/Cluster Stack or Framework (layered view):
- Users and Admin access via Portal, CLI or GUI
- Application layer: parallel (MPI, PVM, LAM, MPICH) and serial applications (see the minimal MPI example below)
- Grids: EGEE, EU, USA, Chinese
- Grid middleware and Security: GLOBUS, gLite, UNICORE, CROWNGrid
- Grid and Cluster Workload Managers (scheduler, policy manager, integration platform): Condor(G), LoadLeveler, PBS/PBSpro, SGE, LSF, SLURM, MOAB, MAUI, Nimrod
- Resource Manager: Rocks, Oscar, Torque
- Operating System: Scientific Linux, CentOS, RedHat, Solaris, AIX, UNICOS, HP-UX, Windows, Mac OS X, other
- Hardware (Cluster or SMP)
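The application layer at the top of this stack is where parallel codes built on MPI (MPICH, LAM, etc.) run. As orientation only, here is a minimal MPI program in C; the program, the mpicc/mpirun invocation in the comment and the process count are illustrative examples, not something taken from the slides:

```c
/* Minimal MPI example (illustrative only): each process reports its rank,
 * the total number of processes, and the compute node it landed on.
 * Typical build/run with an MPICH- or OpenMPI-style toolchain:
 *   mpicc -o hello_mpi hello_mpi.c
 *   mpirun -np 8 ./hello_mpi
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start the MPI runtime         */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total processes in the job    */
    MPI_Get_processor_name(name, &name_len); /* hostname of the compute node  */

    printf("rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();                          /* shut the runtime down cleanly */
    return 0;
}
```

In the stack above it is the cluster workload manager and resource manager (e.g. Torque with MOAB, or SLURM) that decide where those eight processes land; the application itself only sees MPI.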
21. The #1 and #13 in the world (2007), which had dropped to #4 and #40 by 2008:
- BlueGene/L, eServer Blue Gene Solution (IBM, 212,992 Power cores), DOE/NNSA/LLNL, USA: 478.2 trillion floating-point operations per second (teraFLOPS) on LINPACK
- MareNostrum, BladeCenter JS21 Cluster, PPC 970 2.3 GHz, Myrinet (IBM, 10,240 Power cores), Barcelona Supercomputing Centre, Spain: 63.83 teraFLOPS
22. As of November 2008, #1: Roadrunner, BladeCenter QS22/LS21 Cluster, 12,240 PowerXCell 8i 3.2 GHz and 6,562 dual-core Opteron 1.8 GHz, DOE/NNSA/LANL, United States: 1.105 petaFLOPS
25. Introducing Utility Computing:
- First phase: swapping and migration of hardware between the HPC and the Data Center
- Second phase: dynamic load shifting at Resource Manager (RM) level between the HPC RM and the Data Center RM, driven by a Grid Workload Manager (Condor, MOAB)
26. Grid/Cluster Stack or Framework (the same layered stack as shown on slide 18)
27. Hardware building blocks (the GFLOPS figures are checked in the arithmetic below):
- HP BL460c: 8 × 3 GHz Xeon cores, 12 MB L2, 1333 MHz FSB, 10 GB memory (96 GFLOPS)
- HP BL2x220c: 16 × 3 GHz Xeon cores (192 GFLOPS)
- HP C7000 enclosure: up to 16 BL460c (1.536 TFLOPS) or up to 16 BL2x220c (3.072 TFLOPS)
- HP Modular Cooling System G2: up to 4 HP C7000, giving 512 CPU cores at 5.12 TFLOPS (BL460c) or 1024 CPU cores at 12.288 TFLOPS (BL2x220c)
- HP BLc Virtual Connect Ethernet
- D-Link xStack DSN-3200: 10.5 TB RAID 5, 80,000 I/Os per second
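A quick check of the GFLOPS figures on this slide, using the standard peak-performance formula and assuming 4 double-precision floating-point operations per core per clock cycle (typical for Xeon cores of this generation; the 4 FLOPs/cycle figure is my assumption, not stated on the slide):

```latex
\begin{align*}
R_\text{peak} &= \text{cores} \times \text{clock} \times \text{FLOPs per cycle}\\
\text{BL460c:}\quad            & 8  \times 3\,\text{GHz} \times 4 = 96\ \text{GFLOPS}\\
\text{BL2x220c:}\quad          & 16 \times 3\,\text{GHz} \times 4 = 192\ \text{GFLOPS}\\
\text{C7000, 16 BL460c:}\quad  & 16 \times 96\ \text{GFLOPS}  = 1.536\ \text{TFLOPS}\\
\text{C7000, 16 BL2x220c:}\quad& 16 \times 192\ \text{GFLOPS} = 3.072\ \text{TFLOPS}\\
\text{4 C7000, BL2x220c:}\quad & 4  \times 3.072\ \text{TFLOPS} = 12.288\ \text{TFLOPS}
\end{align*}
```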
31. HP ProLiant BL2x220c G5 internal view:
- Top and bottom PCA, side by side
- 2 × 2 CPUs
- 2 × 4 DIMM slots, DDR2 533/667 MHz
- Two mezzanine slots, two x8 (both reside on the bottom board)
- 2 × optional SATA HDDs
- 2 × embedded 1 Gb Ethernet dual-port NICs
- Server board connectors
38. SEACOM: TE-North is a new cable currently being laid across the Mediterranean Sea. Cable laying to start Oct. 2008; final splicing April 2009; service launch June 2009.
41. High Performance Computing & GRID Computing @ North-West University: sustainable, efficient, reliable, high availability & performance at >3 TFLOPS, running Scientific Linux
Editor's notes
In summary, we determined that the following would need to be addressed for any HPC to be successful.
In the beginning there was only one big shark (the Mainframe). The next era of supercomputing came with the introduction of the vector supercomputer, the likes of Cray etc. The next step was compacting into the Mini Computer. All the previous approaches were based on SMP, closely coupled in one box. And then came the modest Personal Computer: not very strong on its own, but connecting a lot of them together makes one big fish. So we suited up, got our best fishing rods and decided to go fishing for one of these new big fish, becoming part of the New World Order.
At the previous HPC conference we came, we saw, and we determined that, as Institutional IT, the time was right. The University wanted to become a major player in the New World Order. This would not be the first try at this: in 1991 we implemented the SP, but the time was not right (see the previous part of the presentation). In the meantime the University also ventured into clustering with three departmental clusters (FSK, Chemistry, BWI). So what do we want to do technically that would be different? We want to implement the H in HPC: a >1 TFLOPS configuration, using the Beowulf approach of open source software and commodity off-the-shelf hardware.
So what is a Beowulf cluster?
What did the first Beowulf cluster look like? Note the amount of time it took to assemble the cluster: 8 months. Taking Moore's law into account, this would have markedly influenced the effective production life of the cluster.
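To put a rough number on that (illustration only; the 18-month doubling period is the commonly quoted Moore's-law figure and is my assumption, not something stated in these notes): during an 8-month assembly period, comparable new hardware improves by roughly

```latex
2^{8/18} \approx 1.36
```

so the hardware landscape moves by almost half a doubling before the cluster runs its first production job, which is what eats into its effective production life.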
The light dotted lines show the originator of the software. The issue for us is the choice of cluster software so as to allow integration into grids. The major issue is at the scheduler level: making the HPC appear as a Compute Element (CE) in the grid.
Concept framework source: Cluster Resources, Inc. Show what we decided, representing the previous slides in a layered approach similar to the ISO layers. We started with hardware, then the OS, the resource manager, the cluster schedulers, and finally the grid workload manager.
Based on the Barcelona picture we did put in a requisition for a new building to house the new NWU HPC... but we are still waiting. OK, the real reason for showing #13: when the slides were set up, Barcelona was #5, and it dropped down to #13 in less than 6 months. We need a strategy that is sustainable, with a fast upgrade path.
We started looking around to determine what the major issues with HPC are, and found that reliability and availability are major factors.
In summary, we determined that the following would need to be addressed for any HPC to be successful.
We needed a strategy to make the HPC cost effective. The first strategy that we will use to extend the capacity and lifecycle of the HPC technology will be to:
- exploit the differing load characteristics of the data center versus those of the HPC;
- implement new high-performance CPUs in the HPC and migrate the older technology to the data center;
- as a first phase, do manual hardware load management by swapping blades between the HPC and the data center to match peak demands;
- in the long run, extend the concept to do this dynamically at Resource Manager level (also referred to as utility computing), as sketched conceptually after this list.
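Purely as a conceptual sketch of that second phase (nothing here comes from the slides; the pool names, node counts and thresholds are hypothetical, and a real deployment would query the actual resource manager, e.g. Torque or MOAB, rather than use hard-coded values), the RM-level policy could amount to a periodic rebalancing loop that shifts blades between the data-center pool and the HPC pool according to queue pressure:

```c
/* Hypothetical sketch of "dynamic load shifting on RM level" between an HPC
 * pool and a data-center pool. All names, node counts and thresholds are
 * illustrative; a real deployment would query the actual resource manager
 * (e.g. Torque/MOAB) instead of using these hard-coded values.
 */
#include <stdio.h>

struct pool {
    const char *name;
    int nodes;        /* blades currently assigned to this pool */
    int queued_jobs;  /* jobs waiting in this pool's queue      */
};

/* One policy "tick": move a blade toward whichever pool is under pressure,
 * as long as the donor pool keeps a minimum reserve. */
static void rebalance(struct pool *hpc, struct pool *dc)
{
    const int pressure = 10;  /* queued jobs that count as a backlog */
    const int reserve  = 4;   /* blades each pool must always keep   */

    if (hpc->queued_jobs > pressure && dc->nodes > reserve) {
        dc->nodes--;
        hpc->nodes++;
        printf("shift one blade %s -> %s (HPC backlog: %d jobs)\n",
               dc->name, hpc->name, hpc->queued_jobs);
    } else if (dc->queued_jobs > pressure && hpc->nodes > reserve) {
        hpc->nodes--;
        dc->nodes++;
        printf("shift one blade %s -> %s (DC backlog: %d jobs)\n",
               hpc->name, dc->name, dc->queued_jobs);
    }
}

int main(void)
{
    struct pool hpc = { "HPC pool", 32, 25 };  /* busy HPC queue    */
    struct pool dc  = { "DC pool",  16,  2 };  /* lightly loaded DC */

    for (int i = 0; i < 3; i++)                /* a few policy ticks */
        rebalance(&hpc, &dc);

    printf("final allocation: HPC=%d blades, DC=%d blades\n",
           hpc.nodes, dc.nodes);
    return 0;
}
```

The point is only that the first phase's manual "swap a blade" decision becomes an automated policy tick once both pools are visible to one workload manager (Condor or MOAB, in the slide's terms).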
So we looked at the technologies that we were already using in the data center. Why start here? Cost effectiveness: training people on new technology that is only used in the HPC would reduce cost effectiveness. Take note: the modular approach gives fast extension with less work.
Show what the NWU HPC configuration looks like.
What are the specs? 256 cores.
Addressing the Reliability and Availability
An institutional facility: how do we like this? The limitation is still speed; bring on SANReN.
Monday, 31 March 2008: "The four sites are the main campuses of Wits and UJ, and two of UJ's satellite campuses, Bunting and Doornfontein," says Christiaan Kuun, SANReN Project Manager at the Meraka Institute.
How will SANReN be used for the National Grid? And what about the International Grid? -> SEACOM
SEACOM PROJECT UPDATE - 14 Aug 2008: Construction on schedule, with major ground and sea-based activities proceeding over the next eight months.

14 August 2008 – The construction of SEACOM's 15,000 km fibre optic undersea cable, linking southern and east Africa, Europe and south Asia, is on schedule and set to go live as planned in June 2009. Some 10,000 km of cable has been manufactured to date at locations in the USA and Japan, and Tyco Telecommunications (US) Inc., the project contractors, will begin shipping terrestrial equipment this month, with the cable expected to be loaded on the first ship in September 2008.

Laying of shore-end cables for each landing station will also proceed from September. This process will comprise the cable portions at shallow depths ranging from 15 to 50 m where large vessels are not able to operate. From October 2008, the first of three Reliance Class vessels will start laying the actual cable. The final splicing, which involves connecting all cable sections together, will happen in April 2009, allowing enough time for testing of the system before the commercial launch in June 2009.

The final steps of the Environmental Social Impact Assessment (ESIA) process are well advanced and all small archeological, marine and ecological studies, which required scuba diving analysis, have been completed, as well as social consultations with the affected parties.

The cable, including the repeaters necessary to amplify the signal, will be stored in large tanks onboard the ships. The branching units necessary to divert the cable to the planned landing stations will be connected into the cable path on the ship just prior to deployment into the sea. The cable will then be buried under the ocean bed with the help of a plow along the best possible route demarcated through the marine survey. The connectivity from Egypt to Marseille, France, will be provided through Telecom Egypt's TE-North fibre pairs that SEACOM has purchased on the system. TE-North is a new cable currently being laid across the Mediterranean Sea.

Brian Herlihy, SEACOM President, said: "We are very happy with the progress made over the past five months. Our manufacturing and deployment schedule is on target and we are confident that we will meet our delivery promises in what is today an incredibly tight market underpinned by sky-rocketing demand for new cables resulting in worldwide delivery delays. The recently announced executive appointments, combined with the project management capabilities already existent within SEACOM, position us as a fully fledged telecoms player. We are able to meet the African market's urgent requirements for cheap and readily available bandwidth within less than a year."

The cable will go into service long before the 2010 FIFA World Cup kicks off in South Africa, and SEACOM has already been working with key broadcasters to meet their broadband requirements. The team is also trying to expedite the construction in an attempt to assist with the broadcasting requirements of the FIFA Confederations Cup scheduled for June 2009.

SEACOM, which is privately funded and over three-quarters African owned, will assist communication carriers in south and east Africa through the sale of wholesale international capacity to global networks via India and Europe. The undersea fibre optic cable system will provide African retail carriers with equal and open access to inexpensive bandwidth, removing the international infrastructure bottleneck and supporting east and southern African economic growth.
SEACOM will be the first cable to provide broadband to countries in east Africa which, at the moment, rely entirely on expensive satellite connections.
The result of SEACOM and SANREN…
The timeline vision, in terms of a production-quality National & International Grid.