Recursive Computing
AMD on AMD…
Quentin Fennessy
Oct 2009
Who am I?
•Quentin Fennessy
– Worked at AMD (Advanced Micro Devices) for 10 years
– Compute Clusters: 10 years with clustered computing
– Unix: 20+ years in various industries (telecom, automation, semiconductors)
– BA in Computer Science from University of Massachusetts
– Manager for Core Services of Global Engineering IT
Recursive Computing--What?
•Definition: See RECURSIVE
Goal of AMD Compute Clusters
•Develop, test, revise and complete microprocessor designs
•Do it efficiently
–time-wise
–$$-wise
–people-wise
•Support concurrent design projects
–5 or 6 at any given time
High Level Attainable Goals
•Plan to meet your business needs
•Understand the technical possibilities now
– work with your vendors
– hire and grow a great staff
•Understand the technical possibilities for the future
•Be flexible to accommodate changing business needs and technical possibilities
Compute Clusters at AMD
• Installed at each AMD design center (Austin x 2, Fort Collins, Sunnyvale, Boxborough, Dresden, Bangalore)
• Cluster size ranges from 200 to 10K+ CPUs
• 98+% of compute servers are AMD Opteron™ and AMD Athlon™ MP processor-based
• AMD Opteron and AMD Athlon MP processor-based desktops are also used as compute resources
• AMD processor-based systems run 64bit and 32bit Linux (Red Hat Enterprise Linux 3 and 4)
History of AMD Clusters
•c 1998: AMD K6 processors, Linux, ~400 systems
•c 2000: AMD Athlon™ processors, Linux, ~1K systems
•c 2001: More AMD Athlon processors, Linux, ~2K systems
•c 2002: More AMD Athlon processors, Linux, ~3K systems
•c 2003: AMD Opteron™ processors, Linux, ~4.5K systems
•c 2004: More AMD Opteron processors, Linux, ~6K systems
•c 2005: Dual Core AMD Opteron processors, Linux, ~7K systems, ~15K+ CPUs
•c 2006: ~8K systems, ~23K+ CPUs
OS Transitions for AMD Clusters
•HP-UX → Solaris: painful, as it was our first transition
•Solaris → HP-UX: painful, because we forgot the lessons of our first
•Solaris + HP-UX → 32bit Linux: easier
•32bit Linux → 64bit Linux: easy, thanks to backward compatibility
•What makes an OS transition hard?
– the implicit assumption that we will always use OS Foo-X
– the imagination and creativity of OS vendors
•What makes an OS transition easy?
– never assume anything will be the same next year
– avoid buying into OS-specific infrastructure tools
HW Transitions for AMD Clusters
• HP → Sun: easy (Sun does a great job maintaining systems)
• Sun → HP: easy (HP does a great job maintaining systems)
• Sun, HP → AMD Athlon™ processor-based systems (32bit): HARD (Linux device issues, no system integration)
• AMD Athlon™ MP (32bit) → AMD Opteron™ processors: easy, it just worked
• Transition → Sun and HP AMD Opteron™ processor-based systems: easy, fast, very nice systems
Historic Bottlenecks
•Every system, every cluster has a bottleneck: the slowest part of the system
•Goal: provide a balanced cluster
•Bottleneck Candidates
–Fileservers
–Network
–Application licenses
–Cluster manager systems
Data Storage
•2PB+ of network-attached storage in 46 NetApp filers
•>50% are Opteron-based NetApp filers
•Typically Quad-GbE attached, with 10GbE testing in 1H07
•Fibre Channel and ATA disks, RAID-DP and RAID4 volumes
•Challenge 1: a few hundred jobs can overwhelm a filer, either with raw I/O or with relentless metadata requests
•Challenge 2: moving data between filers is a change visible to the whole division, which makes fileserver upgrades difficult
•Goal: a fileserver that can add CPU and network capacity as easily as we add disk capacity
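To see why a few hundred jobs can overwhelm a filer, divide its network bandwidth among them. The back-of-the-envelope Python sketch below assumes 300 concurrent jobs (a hypothetical stand-in for "a few hundred"), perfectly aggregated Quad-GbE links, and no protocol overhead, caching or disk limits; the function name is illustrative only.

```python
def per_job_bandwidth_mb_s(links, gbit_per_link, concurrent_jobs):
    """Fair share of a filer's raw network bandwidth per job, in MB/s.

    Assumes links aggregate perfectly and ignores NFS/protocol overhead,
    caching and disk limits, so real per-job throughput will be lower.
    """
    total_mb_s = links * gbit_per_link * 1000 / 8  # 1 Gb/s = 125 MB/s (decimal units)
    return total_mb_s / concurrent_jobs

# Hypothetical load: 300 jobs hitting one filer at the same time.
quad_gbe = per_job_bandwidth_mb_s(links=4, gbit_per_link=1, concurrent_jobs=300)
ten_gbe = per_job_bandwidth_mb_s(links=1, gbit_per_link=10, concurrent_jobs=300)

print(f"Quad-GbE: {quad_gbe:.2f} MB/s per job")
print(f"10GbE:    {ten_gbe:.2f} MB/s per job")
```

Under these assumptions each job gets under 2 MB/s over Quad-GbE and a little over 4 MB/s over 10GbE, before any metadata load is counted.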
Networking
•We use commodity networking from Nortel and Cisco (100baseT, GbE)
•Post-2003 compute servers are connected via GbE switches
•Older systems are connected via 100baseT
•We use VLANs for partitioning, and routing to connect to the rest of AMD
•Our network provides redundant paths and management components, except for the last mile to each compute server
Cluster Management via LSF
•Currently: excellent performance for job submission, dispatch and status updates
•Our LSF job scheduler systems (for clusters with 10K CPUs) are available for under $25,000 from tier 1 vendors
•We have a good upgrade path
•Challenge: match resource allocation to business needs
Best Practices
•Use revision control tools (RCS, Subversion, CVS, etc.)
•Use OS-independent and vendor-independent tools
•Strive for uniformity in hardware and system software
•Reserve sample systems for testing and integration
•Plan for the failure of systems
•Use collaborative tools for communication, planning and documentation (we use TWiki, IRC, audio and video conferencing)
Our Fastest Systems are…
•AMD Opteron™ processor-based systems, of course…
•Some optimizations:
–Fully populate memory DIMM slots for maximum bandwidth (typically 4 DIMMs per socket)
–Use ECC/Chipkill (x4) memory to correct errors of up to 4 bits
–Enable memory interleaving in the BIOS
–Use a 64bit NUMA-aware OS (Red Hat has done well for us)
–Recompile your applications in 64bit mode
System Types in the Cluster
•AMD Opteron™ processor
–64bit, Linux, 2p→8p, 2GB→128GB
–Most with a single ATA disk, some with SCSI
–Most with single power supply
–Gigabit Ethernet, single connection
•AMD Athlon™ MP processor
–32bit, Linux, 1p→2p, 1GB→4GB
–ATA disk
–Single power supply
–100Mb Ethernet, single connection
• Other Unix Systems
– 64bit, 2p→8p, 2GB→28GB
System Types in the Cluster
Cluster Capacity by Throughput
– AMD Opteron™ 64bit: 64%
– AMD Athlon™ 32bit: 35%
– Other: 1%
Show Me Some Numbers
[Chart: CPU and System Totals for ASDC, ANDC, SVDC, BDC, IDC and Total; series: Total CPUs, Opteron System Total, Athlon System Total; axis 0 to 10,000]
More Numbers
[Chart: Total Capacity (Megabytes) per cluster for ASDC, ANDC, SVDC, BDC, IDC and Total; series: Total RAM (MB), Total Swap (MB); axis 0 to 70 million MB]
Internal Benchmark Comparison
K9mark for System Types (K9mark score by processor type and quantity)
– K7 Classic 1P: 42
– K7 MP 2P: 114
– K7 Barton 1P: 115
– Opteron 2P: 259
– Opteron 4P: 356
– Opteron 8P: 532
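The benchmark scores can be turned into relative figures. This is a minimal Python sketch that assumes the bar values pair with the system labels in the order they appear (K7 Classic 1P = 42 through Opteron 8P = 532); the slides do not say what K9mark measures, so this is plain arithmetic on the reported scores, and `summarize` is an illustrative helper, not an AMD tool.

```python
# K9mark scores keyed by (processor family, socket count); the pairing of
# labels to values is an assumption based on the order in the chart.
scores = {
    ("K7 Classic", 1): 42,
    ("K7 MP", 2): 114,
    ("K7 Barton", 1): 115,
    ("Opteron", 2): 259,
    ("Opteron", 4): 356,
    ("Opteron", 8): 532,
}

baseline = scores[("K7 Classic", 1)]

def summarize(scores, baseline):
    """Return (family, sockets, score, speedup vs. baseline, score per socket)."""
    rows = []
    for (family, sockets), score in scores.items():
        rows.append((family, sockets, score, score / baseline, score / sockets))
    return rows

for family, sockets, score, speedup, per_socket in summarize(scores, baseline):
    print(f"{family:<10} {sockets}P  score={score:>3}  "
          f"speedup={speedup:5.2f}x  per-socket={per_socket:6.1f}")
```

On these numbers the Opteron 8P system scores about 12.7x the K7 Classic 1P, while the score per socket falls from 129.5 (2P) to 66.5 (8P), so the aggregate score grows sub-linearly with socket count on this benchmark.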
Large Cluster Throughput, for Texas2 (Year to Date)
•Utilization: 95%
•LSF jobs/day: 40K to 100K
•Average job turnaround: 8-9 hours
•Average CPU seconds/job: 10,728
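As a rough sanity check on these figures, the Python sketch below estimates how many average-sized jobs the cluster could complete per day. It assumes the ~11K CPUs quoted for Texas2 in the LSF details, single-CPU jobs, and that 95% utilization means 95% of all CPU-seconds go to jobs; none of these assumptions is stated on this slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def estimated_jobs_per_day(cpus, utilization, cpu_seconds_per_job):
    """Jobs/day the cluster could finish if every busy CPU-second went to
    jobs of the average size (assumes each job uses one CPU)."""
    busy_cpu_seconds = cpus * SECONDS_PER_DAY * utilization
    return busy_cpu_seconds / cpu_seconds_per_job

# Inputs taken from the Texas2 slides; 11_000 CPUs is the approximate count.
jobs = estimated_jobs_per_day(cpus=11_000, utilization=0.95, cpu_seconds_per_job=10_728)
cpu_hours_per_job = 10_728 / 3600

print(f"~{jobs:,.0f} jobs/day")
print(f"average job: {cpu_hours_per_job:.1f} CPU hours")
```

The estimate lands at roughly 84,000 jobs/day, inside the reported 40K to 100K range. The average job is about 3 CPU hours, so if turnaround includes queue time and jobs are single-threaded, perhaps 5 to 6 hours of the 8-9 hour turnaround may be spent waiting.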
Large Cluster Throughput, for Texas2
•Max job throughput/hour: 4250 (was 2500 last year)
•Jobs/day (peak): 120K+
•Jobs/day (average): 50K
Crunchy LSF Details
•Job Scheduler for Texas2 Cluster (3900 systems, 11K CPUs)
– Hewlett-Packard DL585
• 4 single-core Opteron 854 (2.8GHz)
• 16GB RAM
• 64bit Red Hat Enterprise Linux 4, Update 1
•System Load for Job Scheduler
– Typically 40% busy
– 10.5MB/sec network traffic
– Manages 3900 compute nodes
• Queues jobs
• Monitors system load
• Monitors running jobs
Job Types
•Architecture – what should it do?
•Functional Verification – will it work?
•Circuit Analysis – transistors, library characterization
•Implementation – put the pieces together
•Physical Verification – timing, capacitance
•Tapeout – send it to the fab
Resource Usage by Job Types
[Chart: Approximate Resource Usage (0 to 100%) by job type: Functional Verification, Circuit Analysis, Architecture, Physical Verification, Other, Tapeout]
Architecture
•Highest level description of the cpu
–functional units (FP, Int, cache)
–bus connections (number, type)
–cache design (size, policy, coherence)
•Architectural Verification – up to multi-GB
processes
•Job pattern – 100s or 1000s of jobs run overnight
for experiments
•Fundamental early phase of each project
•Re-done during design to validate
Functional Verification
•CPU-intensive, relatively low memory
•Huge quantities of similar jobs
•RTL 1-2GB processes
•Gates 2-8GB processes
Circuit Analysis
•Many small jobs, some large jobs
•Peaky pattern of compute requirements
•Compute needs can multiply quickly when
manufacturing processes change
•Challenge: too-short jobs can be scheduled
inefficiently
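Why too-short jobs schedule inefficiently can be sketched by charging each dispatch a fixed overhead (scheduling, startup, cleanup). The 30-second overhead below is a hypothetical figure, not an LSF measurement, and bundling several short tasks into one job is shown only as one common mitigation that the slides do not describe.

```python
def scheduling_efficiency(run_seconds, overhead_seconds, jobs_per_bundle=1):
    """Fraction of a slot's occupied time spent on useful work when each
    dispatch pays a fixed overhead; bundling runs several tasks per dispatch."""
    useful = run_seconds * jobs_per_bundle
    return useful / (useful + overhead_seconds)

OVERHEAD = 30.0  # hypothetical per-dispatch overhead, in seconds

for run in (10, 60, 600, 3600):
    print(f"{run:>5}s job: {scheduling_efficiency(run, OVERHEAD):6.1%}")

# Bundling 20 ten-second tasks into a single dispatch amortizes the overhead.
print(f"bundled: {scheduling_efficiency(10, OVERHEAD, jobs_per_bundle=20):6.1%}")
```

With these numbers a 10-second job keeps its slot only 25% productive, while bundling 20 of them raises that to about 87%.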
Physical Verification
•Physical Design & Routing
•Extraction of Electrical Characteristics including
Timing and Capacitance
•Memory intensive + compute intensive
Tapeout – next stop, the FAB
•Compute intensive, one task may use >400 systems
•Memory intensive, approaching 128GB
•Longest-running jobs
– Fortunately clustered AMD Opteron™ processor-based systems
have reduced our longest job run-time to less than one week
•Last engineering step before manufacturing
– Time-to-market critical
Challenges
•Growth
– Cluster size = X today, 2X in 18 months?
•Manageability
– Sysadmin/system ratio – can we stay the same or improve?
– Since 1999 the ratio has improved 3X
•Linux
– Improve quality
– Manage the rapid rate of change
•Scalability
– What decisions today will help us grow?
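The growth question can be made concrete: doubling every 18 months means a cluster of size X grows to X·2^(t/18) after t months. This minimal Python sketch uses the 11K-CPU Texas2 figure from the LSF details purely as an example starting point; `projected_size` is an illustrative name.

```python
def projected_size(current, months, doubling_months=18.0):
    """Cluster size after `months` if it doubles every `doubling_months`."""
    return current * 2 ** (months / doubling_months)

current_cpus = 11_000  # example starting point: approximate Texas2 CPU count

for months in (0, 18, 36, 54):
    print(f"{months:>2} months: ~{projected_size(current_cpus, months):,.0f} CPUs")
```

At that rate the example cluster would reach roughly 88,000 CPUs in four and a half years, so staffing would have to grow at the same rate unless the sysadmin/system ratio keeps improving.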
Linux Challenges
•Linux Progression
– Red Hat Linux 6.x
– Red Hat Linux 7.x
– SUSE Linux Enterprise Server 8.x
– Red Hat Fedora Core 1
– Red Hat Enterprise Linux 3.x
– Red Hat Enterprise Linux 4.x
•Additional efforts include:
– Revision Control with CVS
– System installation with Kickstart
– Configuration Management with cfengine, yum
Actual Train Wrecks
• Power loss for one or multiple buildings
– Breakers, city cable cuts, human error
• Cooling loss
• Cooling loss + floods!
• NFS I/O overloads
• Network failures
– hardware
– human error
– software
• Job Scheduler Overload
– 100K pending jobs
– Relentless job status queries
System Installation Progression
•Level 1: Manual installation, no updates
•Level 2: Automated installation, no updates
•Level 3: Automated installation, manual updates
•Level 4: Automated installation, automated updates
•We are currently at level 3, approaching level 4
– Kickstart for installation
– cfengine for all localization
– yum for package management
Tools for Clusters
•We use LSF from Platform Computing on our clusters
•Locally written tools are easier sed than done
•Freely available software keeps everything working
•Perl, CVS, kickstart, cfengine, yum
•ethereal, tcpdump, ping, mtr, …
•ssh, rsync, clsh, syslog-ng, …
•Apache, TWiki
•MySQL
•RT
Out of the Box?
• Can a Compute Cluster be an
“Out of the Box” experience?
– (will it just work?)
• Not for large clusters
• Why?
• These factors
– Applications
– Operating Systems
– System Hardware
– Network Hardware
– Network Configuration
– Physical Infrastructure (space,
power, cooling)
Recursive Computing? What?
Our clusters are used to design faster processors and better systems for our customers: processors for your clusters and our own.
•1999: AMD(AMD K6, HP-PA, SPARC) → AMD K7
•2000: AMD(AMD K6, SPARC) → AMD K7
•2001: AMD(AMD K7, AMD K6, SPARC) → AMD K7, K8
•2002: AMD(AMD K7, SPARC) → AMD K8
•2003: AMD(AMD K7, AMD K8) → AMD K8+++
•2004: AMD(AMD K7, AMD K8) → AMD K8+++
•2005: AMD(AMD K7, AMD K8(dual-core)) → AMD K8+++
Trademark Attribution
AMD, the AMD Arrow Logo, AMD Athlon, AMD Opteron and
combinations thereof are trademarks of Advanced Micro
Devices, Inc. Other product names used in this presentation
are for identification purposes only and may be trademarks of
their respective companies.
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 

Dernier (20)

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 

Recursive Grid Computing AMD on AMD

  • 1. Recursive Computing AMD on AMD… Quentin Fennessy
  • 2. Oct 2009 2 Who am I? •Quentin Fennessy – Worked at AMD (Advanced Micro Devices) for 10 years – Compute Clusters: 10 years with clustered computing – Unix: 20+ years in various industries (telecomm, automation, semiconductors) – BA in Computer Science from University of Massachusetts – Manager for Core Services of Global Engineering IT
  • 4. Goal of AMD Compute Clusters • Develop, test, revise and complete microprocessor designs • Do it efficiently – time-wise – $$-wise – people-wise • Support concurrent design projects – 5 or 6 at any given time
  • 5. High Level Attainable Goals • Plan to meet your business needs • Understand the technical possibilities now – work with your vendors – hire and grow a great staff • Understand the technical possibilities for the future • Be flexible to accommodate changing business needs and technical possibilities
  • 6. Compute Clusters at AMD • Installed at each AMD design center (Austin ×2, Fort Collins, Sunnyvale, Boxborough, Dresden, Bangalore) • Cluster sizes range from 200 to 10K+ CPUs • 98+% of compute servers are AMD Opteron™ and AMD Athlon™ MP processor-based • AMD Opteron and AMD Athlon MP processor-based desktops are also used as compute resources • AMD processor-based systems run 64-bit and 32-bit Linux (Red Hat Enterprise Linux 3 and 4)
  • 7. History of AMD Clusters • c. 1998: AMD K6 processors, Linux, ~400 systems • c. 2000: AMD Athlon™ processors, Linux, ~1K systems • c. 2001: More AMD Athlon processors, Linux, ~2K systems • c. 2002: More AMD Athlon processors, Linux, ~3K systems • c. 2003: AMD Opteron™ processors, Linux, ~4.5K systems • c. 2004: More AMD Opteron processors, Linux, ~6K systems • c. 2005: Dual-Core AMD Opteron processors, Linux, ~7K systems, ~15K+ CPUs • c. 2006: ~8K systems, ~23K+ CPUs
  • 8. OS Transitions for AMD Clusters • HP-UX → Solaris: painful, as it was our first transition • Solaris → HP-UX: painful, because we had forgotten the lessons of our first • Solaris + HP-UX → 32-bit Linux: easier • 32-bit Linux → 64-bit Linux: easy, thanks to compatibility • What makes an OS transition hard? – the implicit assumption that we will always use OS Foo-X – the imagination and creativity of OS vendors • What makes an OS transition easy? – never assume anything will be the same next year – avoid buying into OS-specific infrastructure tools
  • 9. HW Transitions for AMD Clusters • HP → Sun: easy (Sun does a great job maintaining systems) • Sun → HP: easy (HP does a great job maintaining systems) • Sun, HP → AMD Athlon™ processor-based systems (32-bit): HARD (Linux device issues, no system integration) • AMD Athlon™ MP (32-bit) → AMD Opteron™ processors: easy, it just worked • Transition → Sun and HP AMD Opteron™ processor-based systems: easy, fast, very nice systems
  • 10. Historic Bottlenecks • Every system and every cluster has a bottleneck: the slowest part of the system • Goal: provide a balanced cluster • Bottleneck candidates – Fileservers – Network – Application licenses – Cluster manager systems
  • 11. Data Storage • 2+ PB of network-attached storage in 46 NetApp filers • >50% are Opteron-based NetApp filers • Typically quad-GbE attached, with 10GbE testing in 1H07 • Fibre Channel and ATA disks, RAID-DP and RAID 4 volumes • Challenge 1: a few hundred jobs can overwhelm a filer, either with raw I/O or with relentless metadata requests • Challenge 2: moving data between filers is a change visible to the whole division, which makes fileserver upgrades difficult • Goal: a fileserver that can add CPU and network capacity as easily as we add disk capacity
  • 12. Networking • We use commodity networking from Nortel and Cisco (100BASE-T, GbE) • Post-2003 compute servers are connected via GbE switches • Older systems are connected via 100BASE-T • We use VLANs for partitioning, and routing to connect to the rest of AMD • Our network provides redundant paths and management components, except for the last mile to each compute server
  • 13. Cluster Management via LSF • Currently: excellent performance for job submission, dispatch and status updates • The servers for our LSF job scheduler (for clusters with 10K CPUs) are available for under $25,000 from tier-1 vendors • We have a good upgrade path • Challenge: match resource allocation to business needs
  • 14. Best Practices • Use revision control tools (RCS, Subversion, CVS, etc.) • Use OS-independent and vendor-independent tools • Strive for uniformity in h/w and system s/w • Reserve sample systems for testing and integration • Plan for the failure of systems • Use collaborative tools for communication, planning and documentation (we use TWiki, IRC, audio and video conferencing)
  • 15. Our Fastest Systems are… • AMD Opteron™ processor-based systems, of course… • Some optimizations: • Fully populate memory DIMM slots for maximum bandwidth (typically 4 DIMMs/socket) • Use ECC/Chipkill (x4) memory to correct up to 4-bit errors • Enable memory interleaving in the BIOS • Use a 64-bit NUMA-aware OS (Red Hat has done well for us) • Recompile your applications in 64-bit mode
  • 16. System Types in the Cluster • AMD Opteron™ processor – 64-bit, Linux, 2p→8p, 2GB→128GB – Most with a single ATA disk, some with SCSI – Most with a single power supply – Gigabit Ethernet, single connection • AMD Athlon™ MP processor – 32-bit, Linux, 1p→2p, 1GB→4GB – ATA disk – Single power supply – 100Mb Ethernet, single connection • Other Unix systems – 64-bit, 2p→8p, 2GB→28GB
  • 17. System Types in the Cluster: Cluster Capacity by Throughput [chart] – AMD Opteron™ 64-bit: 64% – AMD Athlon™ 32-bit: 35% – Other: 1%
  • 18. Show Me Some Numbers [Chart: CPU and system totals per cluster (ASDC, ANDC, SVDC, BDC, IDC, Total); series: Total CPUs, Opteron system total, Athlon system total]
  • 19. More Numbers [Chart: total memory capacity per cluster, in millions of MB (ASDC, ANDC, SVDC, BDC, IDC, Total); series: Total RAM (MB), Total Swap (MB)]
  • 20. Internal Benchmark Comparison: K9mark score by processor type and quantity – K7 Classic 1P: 42 – K7 MP 2P: 114 – K7 Barton 1P: 115 – Opteron 2P: 259 – Opteron 4P: 356 – Opteron 8P: 532
  • 21. Large Cluster Throughput for Texas2 (Year to Date) • Utilization: 95% • LSF jobs/day: 40K–100K • Average job turnaround: 8–9 hours • Average CPU seconds/job: 10,728
  • 22. Large Cluster Throughput for Texas2 • Max job throughput/hour: 4,250 (was 2,500 last year) • Jobs/day (peak): 120K+ • Jobs/day (average): 50K
  • 23. Crunchy LSF Details • Job scheduler for the Texas2 cluster (3,900 systems, 11K CPUs) – Hewlett-Packard DL585 • 4 single-core Opteron 854 (2.8 GHz) • 16GB RAM • 64-bit Red Hat Enterprise Linux 4, Update 1 • System load for the job scheduler – Typically 40% busy – 10.5MB/sec network traffic – Manages 3,900 compute nodes • Queues jobs • Monitors system load • Monitors running jobs
  • 24. Job Types • Architecture – what should it do? • Functional Verification – will it work? • Circuit Analysis – transistors, library characterization • Implementation – put the pieces together • Physical Verification – timing, capacitance • Tapeout – send it to the fab
  • 25. Resource Usage by Job Types [Chart: approximate resource usage (0–100%) by job type: Functional Verification, Circuit Analysis, Architecture, Physical Verification, Other, Tapeout]
  • 26. Architecture • Highest level description of the CPU – functional units (FP, Int, cache) – bus connections (number, type) – cache design (size, policy, coherence) • Architectural Verification – up to multi-GB processes • Job pattern – 100s or 1000s of jobs run overnight for experiments • Fundamental early phase of each project • Re-done during design to validate
  • 27. Functional Verification • CPU-intensive, relatively low memory • Huge quantities of similar jobs • RTL: 1–2GB processes • Gates: 2–8GB processes
  • 28. Circuit Analysis • Many small jobs, some large jobs • Peaky pattern of compute requirements • Compute needs can multiply quickly when manufacturing processes change • Challenge: too-short jobs can be scheduled inefficiently
  • 29. Physical Verification • Physical Design & Routing • Extraction of Electrical Characteristics including Timing and Capacitance • Memory intensive + compute intensive
  • 30. Tapeout – next stop, the FAB • Compute intensive, one task may use >400 systems • Memory intensive, approaching 128GB • Longest-running jobs – Fortunately, clustered AMD Opteron™ processor-based systems have reduced our longest job run-time to less than one week • Last engineering step before manufacturing – Time-to-market critical
  • 31. Challenges • Growth – Cluster size = X today, 2X in 18 months? • Manageability – Sysadmin/system ratio – can we stay the same or improve? – Since 1999 the ratio has improved 3X • Linux – Improve quality – Manage the rapid rate of change • Scalability – What decisions today will help us grow?
  • 32. Linux Challenges • Linux progression – Red Hat Linux 6.x – Red Hat Linux 7.x – SUSE Linux Enterprise Server 8.x – Red Hat Fedora Core 1 – Red Hat Enterprise Linux 3.x – Red Hat Enterprise Linux 4.x • Additional efforts include: – Revision control with CVS – System installation with Kickstart – Configuration management with cfengine, yum
  • 33. Actual Train Wrecks • Power loss for one or multiple buildings – Breakers, city cable cuts, human error • Cooling loss • Cooling loss + floods! • NFS I/O overloads • Network failures – hardware – human error – software • Job scheduler overload – 100K pending jobs – Relentless job status queries
  • 34. System Installation Progression • 1 Manual installation, no updates • 2 Automated installation, no updates • 3 Automated installation, manual updates • 4 Automated installation, automated updates • We are currently at level 3, approaching level 4 – Kickstart for installation – cfengine for all localization – yum for package management
  • 35. Tools for Clusters • We use LSF from Platform Computing on our clusters • Locally written tools are easier said than done • Freely available software keeps everything working • Perl, CVS, Kickstart, cfengine, yum • Ethereal, tcpdump, ping, mtr, … • ssh, rsync, clsh, syslog-ng, … • Apache, TWiki • MySQL • RT
  • 36. Out of the Box? • Can a compute cluster be an “out of the box” experience? (Will it just work?) • Not for large clusters • Why? • These factors – Applications – Operating Systems – System Hardware – Network Hardware – Network Configuration – Physical Infrastructure (space, power, cooling)
  • 37. Recursive Computing? What? Our clusters are used to design faster processors and better systems for our customers: processors for your clusters and our own. • 1999: AMD(AMD K6, HP-PA, SPARC) → AMD K7 • 2000: AMD(AMD K6, SPARC) → AMD K7 • 2001: AMD(AMD K7, AMD K6, SPARC) → AMD K7, K8 • 2002: AMD(AMD K7, SPARC) → AMD K8 • 2003: AMD(AMD K7, AMD K8) → AMD K8+++ • 2004: AMD(AMD K7, AMD K8) → AMD K8+++ • 2005: AMD(AMD K7, AMD K8 (dual-core)) → AMD K8+++
  • 38. Trademark Attribution AMD, the AMD Arrow Logo, AMD Athlon, AMD Opteron and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this presentation are for identification purposes only and may be trademarks of their respective companies.
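
The "Historic Bottlenecks" slide says every cluster has a slowest part, and that the goal is a balanced cluster. A minimal sketch of that idea follows. It assumes each candidate (fileservers, network, licenses, cluster manager) can be described by the job throughput it could sustain alone. The capacity figures are hypothetical, not AMD measurements.

```python
from typing import Dict, Tuple


def find_bottleneck(capacity: Dict[str, float]) -> Tuple[str, float]:
    """Return (component, capacity) for the slowest component.

    `capacity` maps each component to the job throughput (jobs/hour) it could
    sustain if nothing else limited it; the cluster can go no faster than the
    smallest entry.
    """
    if not capacity:
        raise ValueError("capacity must name at least one component")
    name = min(capacity, key=lambda k: capacity[k])
    return name, capacity[name]


def headroom(capacity: Dict[str, float]) -> Dict[str, float]:
    """Capacity of each component relative to the bottleneck (1.0 = bottleneck).

    In a balanced cluster all ratios stay close to 1.0; a large ratio marks
    capacity that was paid for but cannot be fully used.
    """
    _, limit = find_bottleneck(capacity)
    return {name: cap / limit for name, cap in capacity.items()}


# Hypothetical figures for illustration only.
example = {
    "fileservers": 3000.0,
    "network": 9000.0,
    "application_licenses": 4000.0,
    "cluster_manager": 6000.0,
}

if __name__ == "__main__":
    print(find_bottleneck(example))  # ('fileservers', 3000.0)
    print(headroom(example))
```

Here the network has three times the capacity the fileservers can use. Under these assumed numbers, spending on fileservers before the network would raise throughput.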
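
The "Data Storage" slide notes that a few hundred jobs can overwhelm a quad-GbE filer with raw I/O. The arithmetic below is a rough sketch of why. It uses raw link speed only (1 Gbit/s = 125 MB/s), ignores NFS and protocol overhead, and says nothing about the metadata load the slide also mentions. The 2 MB/s per-job rate is a made-up assumption.

```python
def link_capacity_mb_s(links: int, gbit_per_link: float = 1.0) -> float:
    """Raw aggregate bandwidth in MB/s (decimal units: 1 Gbit/s = 125 MB/s).

    Ignores protocol and NFS overhead, so usable bandwidth is lower.
    """
    if links < 1 or gbit_per_link <= 0:
        raise ValueError("need at least one link with positive speed")
    return links * gbit_per_link * 1000 / 8


def jobs_before_saturation(per_job_mb_s: float, links: int,
                           gbit_per_link: float = 1.0) -> int:
    """How many concurrent jobs, each streaming per_job_mb_s, fit on the links."""
    if per_job_mb_s <= 0:
        raise ValueError("per-job I/O rate must be positive")
    return int(link_capacity_mb_s(links, gbit_per_link) // per_job_mb_s)


if __name__ == "__main__":
    # Quad-GbE filer vs. a single 10GbE link, assuming 2 MB/s of I/O per job.
    print(link_capacity_mb_s(4))                 # 500.0
    print(jobs_before_saturation(2.0, 4))        # 250
    print(jobs_before_saturation(2.0, 1, 10.0))  # 625
```

Under these assumptions, roughly 250 streaming jobs fill a quad-GbE filer, which is consistent with "a few hundred".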
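
The "Cluster Management via LSF" slide lists matching resource allocation to business needs as the challenge. The 5 or 6 concurrent projects on the "Goal" slide compete for the same CPUs. The sketch below shows one simple way to turn business priorities into integer shares of a cluster: a static proportional split using the largest-remainder method. This is not how LSF's fairshare scheduling computes priorities. The project names and weights are hypothetical.

```python
from typing import Dict


def allocate_slots(total_slots: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split total_slots among projects in proportion to integer weights.

    Uses the largest-remainder method so the shares always add up to exactly
    total_slots; ties go to the project listed first (sorted() is stable).
    """
    if total_slots < 0:
        raise ValueError("total_slots must be non-negative")
    if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    weight_sum = sum(weights.values())
    shares: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for project, w in weights.items():
        # Integer division keeps the split exact (no floating-point rounding).
        shares[project], remainders[project] = divmod(total_slots * w, weight_sum)
    leftover = total_slots - sum(shares.values())
    # Hand the remaining slots to the projects with the largest remainders.
    for project in sorted(remainders, key=remainders.__getitem__, reverse=True)[:leftover]:
        shares[project] += 1
    return shares


if __name__ == "__main__":
    # 11K CPUs (the Texas2 figure) split 3:2:1 across hypothetical projects.
    print(allocate_slots(11_000, {"proj_a": 3, "proj_b": 2, "proj_c": 1}))
    # {'proj_a': 5500, 'proj_b': 3667, 'proj_c': 1833}
```

A static split like this is only a starting point. A real scheduler would also let idle shares be borrowed when a project has no pending work.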
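
The Texas2 throughput slides give an average of 10,728 CPU seconds per job and an average of 50K jobs per day. Multiplying them gives the CPU time consumed per day, and dividing by 86,400 seconds gives the average number of CPUs kept busy. The sketch below does that arithmetic. It also converts the 4,250 jobs/hour peak into a per-second dispatch rate for the scheduler. CPU seconds measure processor consumption, not slot occupancy. The slides do not say how the 95% utilization figure is measured, so the two numbers need not agree.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def average_busy_cpus(jobs_per_day: float, cpu_seconds_per_job: float) -> float:
    """CPUs kept busy on average if the day's CPU time were spread evenly.

    Measures CPU consumption only; time a job holds a slot while waiting on
    I/O or licenses is not counted.
    """
    return jobs_per_day * cpu_seconds_per_job / SECONDS_PER_DAY


def dispatches_per_second(jobs_per_hour: float) -> float:
    """Scheduler dispatch rate implied by an hourly job throughput."""
    return jobs_per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    print(round(average_busy_cpus(50_000, 10_728)))  # 6208
    print(round(dispatches_per_second(4_250), 2))    # 1.18
    print(round(10_728 / 3600, 2))                   # 2.98 CPU-hours per job
```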
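
The "Circuit Analysis" slide warns that too-short jobs can be scheduled inefficiently. One way to see why is to model a fixed per-job cost for dispatch, start-up and clean-up: efficiency is then runtime / (runtime + overhead). The sketch below computes that. It also shows bundling several short tasks into one job so the overhead is paid once per bundle. Bundling is a common mitigation, not a technique the slides describe. The 30-second overhead is an assumed value.

```python
def slot_efficiency(runtime_s: float, overhead_s: float) -> float:
    """Fraction of a slot's occupied time spent on useful work.

    overhead_s is a fixed per-job cost (dispatch, start-up, clean-up).
    """
    if runtime_s < 0 or overhead_s < 0:
        raise ValueError("times must be non-negative")
    total = runtime_s + overhead_s
    return runtime_s / total if total > 0 else 0.0


def tasks_per_job(target: float, task_runtime_s: float, overhead_s: float) -> int:
    """Smallest number of short tasks to bundle into one job so efficiency
    reaches `target`; the overhead is paid once per bundle."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if task_runtime_s <= 0:
        raise ValueError("task runtime must be positive")
    n = 1
    while slot_efficiency(n * task_runtime_s, overhead_s) < target:
        n += 1
    return n


if __name__ == "__main__":
    OVERHEAD = 30.0  # hypothetical per-job overhead in seconds
    print(slot_efficiency(60, OVERHEAD))      # 1-minute job: about 0.67
    print(slot_efficiency(10_728, OVERHEAD))  # average-length job: about 0.997
    print(tasks_per_job(0.9, 20, OVERHEAD))   # 14
```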
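
The "Challenges" slide asks whether a cluster of size X today will be 2X in 18 months, and whether the sysadmin/system ratio can hold. The sketch below projects that growth and the staffing it implies if the ratio stays fixed. It assumes a constant 18-month doubling time. The starting size and systems-per-admin ratio are hypothetical.

```python
import math


def projected_size(current: float, months: float, doubling_months: float = 18.0) -> float:
    """Cluster size after `months` if it doubles every `doubling_months`."""
    return current * 2 ** (months / doubling_months)


def admins_needed(systems: float, systems_per_admin: float) -> int:
    """Sysadmins required to hold a given systems-per-admin ratio."""
    return math.ceil(systems / systems_per_admin)


if __name__ == "__main__":
    today = 4_000  # hypothetical cluster size
    ratio = 400    # hypothetical systems per sysadmin
    for months in (0, 18, 36):
        size = projected_size(today, months)
        print(months, round(size), admins_needed(size, ratio))
    # 0 4000 10
    # 18 8000 20
    # 36 16000 40
```

With a fixed ratio, staff must double on the same schedule as hardware. This is why the 3X improvement in the ratio since 1999 matters as much as raw growth.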