3. Outline
• The Tutorials
• Plenary Talks
• Papers & Panels
• The Top500 list
• The Exhibition
http://sc10.supercomputing.org/
4. Day 0 - Arrival
US Airways entertainment system is
running Linux!
7. A lecture by Prof. Rubin Landau on
Computational Physics in the
Education track
3:30PM - 5:00PM, Communities/Education track
Physics: Examples in Computational Physics, Part 2 (Rubin Landau, Room 297)
Although physics faculty are incorporating computers to enhance physics education,
computation is often viewed as a black box whose inner workings need not be
understood. We propose to open up the computational black box by providing
Computational Physics (CP) curricula materials based on a problem-solving paradigm that
can be incorporated into existing physics classes, or used in stand-alone CP classes. The
curricula materials assume a computational science point of view, where understanding of
the applied math and the CS is also important, and usually involve a compiled language in
order for the students to get closer to the algorithms. The materials derive from a new CP
eTextbook available from Compadre that includes video-based lectures, programs, applets,
visualizations and animations.
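The problem-solving paradigm is easiest to see in code. Below is a hypothetical example of the kind of exercise such curricula use (the specific problem and method are our illustration, not taken from the eTextbook): integrating a simple harmonic oscillator with the Euler-Cromer method, so the algorithm itself is opened up rather than hidden in a black box.

```python
# Hypothetical CP-style exercise (illustrative, not from the eTextbook):
# integrate a simple harmonic oscillator with the Euler-Cromer method.
def euler_cromer(x0, v0, omega, dt, steps):
    x, v = x0, v0
    for _ in range(steps):
        v -= omega**2 * x * dt  # update velocity from the restoring force first
        x += v * dt             # then update position with the *new* velocity
    return x, v

# Integrate for roughly one period (T = 2*pi for omega = 1).
x, v = euler_cromer(1.0, 0.0, omega=1.0, dt=0.001, steps=6283)
print(round(x, 2))  # the oscillator returns near its starting point, x ~ 1.0
```

Euler-Cromer is symplectic, so the energy stays bounded over long runs; a natural follow-up exercise is comparing it against plain Euler, whose energy grows without bound.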
8. Eclipse PTP
At last FORTRAN has an advanced, free IDE!
PTP - Parallel Tools Platform
http://www.eclipse.org/ptp/
15. Amazon Cluster GPU Instances
Amazon Cluster GPU Instances provide 22 GB of memory, 33.5 EC2 Compute
Units, and utilize the Amazon EC2 Cluster network, which provides high throughput
and low latency for High Performance Computing (HPC) and data intensive
applications. Each GPU instance features two NVIDIA Tesla® M2050 GPUs,
delivering peak performance of more than one trillion double-precision FLOPS.
Many workloads can be greatly accelerated by taking advantage of the parallel
processing power of hundreds of cores in the new GPU instances. Many industries
including oil and gas exploration, graphics rendering and engineering design are
using GPU processors to improve the performance of their critical applications.
Amazon Cluster GPU Instances extend the options for running HPC workloads in
the AWS cloud. Cluster Compute Instances, launched earlier this year, provide the
ability to create clusters of instances connected by a low latency, high throughput
network. Cluster GPU Instances give customers with HPC workloads an additional
option to further customize their high performance clusters in the cloud. For those
customers who have applications that can benefit from the parallel computing
power of GPUs, Amazon Cluster GPU Instances can often lead to even further
efficiency gains over what can be achieved with traditional processors. By
leveraging both instance types, HPC customers can tailor their compute cluster to
best meet the performance needs of their workloads. For more information on HPC
capabilities provided by Amazon EC2, visit aws.amazon.com/ec2/hpc-applications.
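The claim of "more than one trillion double-precision FLOPS" per instance can be sanity-checked against the Tesla M2050's published specifications. The core count, clock, and DP ratio below are assumed from NVIDIA's Fermi specs, not stated in the announcement:

```python
# Sanity check (assumed Fermi M2050 specs: 448 CUDA cores at 1.15 GHz,
# 2 flops/cycle via FMA, double precision at half the single-precision rate).
cores, clock_ghz = 448, 1.15
sp_gflops = cores * clock_ghz * 2           # single-precision peak per GPU
dp_gflops = sp_gflops / 2                   # DP runs at half rate on Fermi
per_instance_tflops = 2 * dp_gflops / 1000  # two M2050s per instance
print(round(per_instance_tflops, 2))        # ~1.03 TFLOPS double precision
```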
22. World’s #1
China's National University of Defense Technology's Tianhe-1A
supercomputer has taken the top ranking from Oak Ridge National
Laboratory's Jaguar supercomputer on the latest Top500 ranking of
the world's fastest supercomputers. The Tianhe-1A achieved a
performance of 2.57 petaflop/s, while Jaguar achieved
1.75 petaflop/s. The Nebulae, another Chinese-built
supercomputer, came in third at 1.27 petaflop/s.
"What the Chinese have done is they're
exploiting the power of [graphics processing units], which
are...awfully close to being uniquely suited to this particular
benchmark," says University of Illinois at Urbana-Champaign
professor Bill Gropp. Tianhe-1A is a Linux computer built from
components from Intel and NVIDIA. "What we should be focusing on
is not losing our leadership and being able to apply computing to a
broad range of science and engineering problems," Gropp says.
Overall, China had five supercomputers ranked in the top 100, while
42 of the top 100 computers were U.S. systems.
29. Disruption is the mechanism by which great companies continue to succeed and new
entrants displace the market leaders. Disruptive innovations either create new markets or
reshape existing markets by delivering relatively simple, convenient, low-cost innovations
to a set of customers who are ignored by industry leaders. One of the bedrock principles
of Christensen's disruptive innovation theory is that companies innovate faster than
customers' lives change. Because of this, most organizations end up producing products
that are too good, too expensive, and too inconvenient for many customers. By pursuing
only these "sustaining" innovations, companies unwittingly open the door to "disruptive"
innovations, be it "low-end disruption" targeting overshot, less-demanding customers or
"new-market disruption" targeting non-consumers.
1. Many of today's markets that appear to have little growth remaining actually have
great growth potential through disruptive innovations that transform complicated,
expensive products into simple, affordable ones.
2. Successful innovation seems unpredictable because innovators rely excessively on
data, which is only available about the past, and have not been equipped with sound
theories that allow them to see the future perceptively. This problem has been solved.
3. Understanding the customer is the wrong unit of analysis for successful innovation;
understanding the job that the customer is trying to do is the key.
4. Many innovations that have extraordinary growth potential fail, not because of the
product or service itself, but because the company forced it into an inappropriate
business model instead of creating a new, optimal one.
5. Companies with disruptive products and business models are the ones whose share
prices increase faster than the market over sustained periods.
How to Create New Growth in a Risk-Minimizing Environment
31. High-End Computing and Climate
Modeling: Future Trends and Prospects
SESSION: Big Science, Big Data II
Presenter(s):Phillip Colella
ABSTRACT:
Over the past few years, there has been considerable discussion of the change in
high-end computing, due to the change in the way increased processor performance
will be obtained: heterogeneous processors with more cores per chip, deeper and
more complex memory and communications hierarchies, and fewer bytes per flop.
At the same time, the aggregate floating-point performance at the high end will
continue to increase, to the point that we can expect exascale machines by the
end of the decade. In this talk, we will discuss some of the consequences of these
trends for scientific applications from a mathematical algorithm and software
standpoint. We will use the specific example of climate modeling as a focus, based
on discussions that have been going on in that community for the past two years.
Chair/Presenter Details:
Patricia Kovatch (Chair) - University of Tennessee, Knoxville
Phillip Colella - Lawrence Berkeley National Laboratory
32. Prediction of Earthquake Ground Motions
Using Large-Scale Numerical Simulations
SESSION: Big Science, Big Data II
Presenter(s):Tom Jordan
ABSTRACT:
Realistic earthquake simulations can now predict strong ground motions from
the largest anticipated fault ruptures. Olsen et al. (this meeting) have simulated
an M8 “wall-to-wall” earthquake on the southern San Andreas Fault up to 2 Hz,
sustaining 220 teraflops for 24 hours on 223K cores of NCCS Jaguar. Large
simulation ensembles (~10^6) have been combined with probabilistic rupture
forecasts to create CyberShake, a physics-based hazard model for Southern
California. In the highly-populated sedimentary basins, CyberShake predicts
long-period shaking intensities substantially higher than empirical models,
primarily due to the strong coupling between rupture directivity and basin
excitation. Simulations are improving operational earthquake forecasting, which
provides short-term earthquake probabilities using seismic triggering models,
and earthquake early warning, which attempts to predict imminent shaking
during an event. These applications offer new and urgent computational
challenges, including requirements for robust, on-demand supercomputing
and rapid access to very large data sets.
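A back-of-the-envelope check (ours, not from the abstract) of the scale those figures imply:

```python
# Back-of-envelope scale check using the figures quoted above.
sustained_flops = 220e12  # 220 teraflop/s sustained
cores = 223_000           # cores of NCCS Jaguar used
seconds = 24 * 3600       # 24-hour run
total_flops = sustained_flops * seconds
per_core_gflops = sustained_flops / cores / 1e9
print(f"{total_flops:.2e}")       # ~1.90e+19 total flops for the M8 run
print(round(per_core_gflops, 2))  # ~0.99 GFLOPS sustained per core
```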
34. Exascale Computing Will (Won't) Be Used
by Scientists by the End of This Decade
EVENT TYPE: Panel
Panelists:Marc Snir, William Gropp, Peter Kogge, Burton Smith, Horst Simon, Bob
Lucas, Allan Snavely, Steve Wallach
ABSTRACT:
DOE has set a goal of Exascale performance by 2018. While not impossible, this
will require radical innovations. A contrarian view may hold that technical obstacles,
cost, limited need, and inadequate policies will delay exascale well beyond 2018.
The magnitude of the required investments will lead to a public discussion for which
we need to be well prepared. We propose to have a public debate on the
proposition "Exascale computing will be used by the end of the decade", with one
team arguing in favor and another team arguing against. The arguments should
consider technical and non-technical obstacles and use cases. The proposed
format is: (a) introductory statements by each team; (b) Q&A where each team can
put questions to the other team; (c) Q&A from the public to either team. We shall
push to have a lively debate that is not only informative, but also entertaining.
35. GPU Computing: To ExaScale and
Beyond
Bill Dally - NVIDIA/Stanford University
36. Dedicated High-End Computing to Revolutionize
Climate Modeling: An International Collaboration
ABSTRACT:
A collaboration of six institutions on three continents is investigating the use of
dedicated HPC resources for global climate modeling. Two types of
experiments were run using the entire 18,048-core Cray XT-4 at NICS from
October 2009 to March 2010: (1) an experimental version of the ECMWF
Integrated Forecast System, run at several resolutions down to 10 km grid
spacing to evaluate high-impact and extreme events; and (2) the NICAM global
atmospheric model from JAMSTEC, run at 7 km grid resolution to simulate the
boreal summer climate over many years. The numerical experiments sought to
determine whether increasing weather and climate model resolution to
accurately resolve mesoscale phenomena in the atmosphere can improve the
model fidelity in simulating the mean climate and the distribution of variances
and covariances.
Chair/Presenter Details:
Robert Jacob (Chair) - Argonne National Laboratory
James Kinter - Institute of Global Environment and Society
37. Using GPUs for Weather and Climate Models
Presenter(s):Mark Govett
ABSTRACT:
With the power, cooling, space, and performance restrictions facing large CPU-
based systems, graphics processing units (GPUs) appear poised to become
the next generation of supercomputers. Two of the top ten fastest systems
on the Top500 list are already GPU-based, and such systems have the potential
to dominate the list in the future. While the hardware is highly scalable,
achieving good parallel performance can be challenging: language translation,
code conversion and adaptation, and performance optimization will be required.
This presentation will survey existing efforts to use GPUs for weather and
climate applications. Two general parallelization approaches will be discussed.
The most common approach is to run select routines on the GPU, which requires
data transfers between CPU and GPU. The other approach is to run everything
on the GPU and avoid those transfers, but this can require significant effort
to parallelize and optimize the code.
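The trade-off between the two approaches can be sketched with a toy cost model; the timings below are hypothetical placeholders, not measurements from the talk:

```python
# Toy cost model (our illustration, not from the talk) contrasting the two
# parallelization approaches described above.
def offload_per_step(steps, gpu_compute_s, transfer_s):
    # (a) run select routines on the GPU each time step, paying a
    # host-to-device and a device-to-host copy around every call
    return steps * (gpu_compute_s + 2 * transfer_s)

def fully_resident(steps, gpu_compute_s):
    # (b) keep all model state resident on the GPU; no per-step transfers
    return steps * gpu_compute_s

# Hypothetical numbers: 10 ms GPU compute and 8 ms PCIe transfer per step.
print(round(offload_per_step(1000, 0.010, 0.008), 2))  # 26.0 seconds
print(round(fully_resident(1000, 0.010), 2))           # 10.0 seconds
```

With these placeholder numbers the transfers dominate, which is why the fully resident approach can pay off despite the much larger porting effort.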
38. Global Arrays
Global Arrays Roadmap and Future Developments
SESSION: Global Arrays: Past, Present & Future
EVENT TYPE: Special and Invited Events
SESSION CHAIR: Moe Khaleel
Speaker(s):Daniel Chavarria
ABSTRACT:
This talk will describe the current state of the Global Arrays toolkit and its
underlying ARMCI communication layer and how we believe they should
evolve over the next few years. The research and development agenda is
targeting expected architectural features and configurations on emerging
extreme-scale and exascale systems.
Speaker Details:
Moe Khaleel (Chair) - Pacific Northwest National Laboratory
Daniel Chavarria - Pacific Northwest National Laboratory
40. Enabling High Performance Cloud
Computing Environments
SESSION LEADER(S):Jurrie Van Den Breekel
ABSTRACT:
The cloud is the new “killer” service to bring service providers and enterprises into the
age of network services capable of infinite scale. As an example, 5,000 servers with
many cloud services could feasibly serve one billion users or end devices. The idea of
services at this scale is now possible with multi-core processing, virtualization and high
speed Ethernet, but even today the mix of implementing these technologies requires
careful considerations in public and private infrastructure design. While cloud computing
offers tremendous possibilities, it is critical to understand the limitations of this
framework across key network attributes such as performance, security, availability and
scalability. Real-world testing of a cloud computing environment is a key step toward
putting any concerns to rest around performance, security and availability. Spirent will
share key findings that are the result of some recent work with the European Advanced
Networking Test Center (EANTC) including a close examination of how implementing a
cloud approach within a public or private data center affects the firewall, data center
bridging, virtualization, and WAN optimization.
Session Leader Details:
Jurrie Van Den Breekel (Primary Session Leader) - Spirent Communications
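A quick division makes the abstract's scale claim concrete (the figures are the abstract's own):

```python
# Dividing out the abstract's own figures: one billion users or end
# devices spread across 5,000 servers.
users = 1_000_000_000
servers = 5_000
per_server = users // servers
print(per_server)  # 200000 users or devices per server
```

Sustaining that per-server density is exactly why the abstract leans on multi-core processing, virtualization, and high-speed Ethernet.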
41. Cont’
Speakers:
NEOVISE – Paul Burns
SPIRENT – Jurrie van den Breekel
BROCADE – Steve Smith
Paul:
Single application – single server
Single application – multiple servers (cluster computing)
Multiple applications – single server – virtualization
Multiple applications – multiple servers – cloud computing
3rd dimension: tenants T1 and T2 on the same physical server (security)
43. Future Supercomputing Centers
Thom Dunning, William Gropp, Thomas Lippert, Satoshi Matsuoka, Thomas
Zacharia
This panel will discuss the nature of federal- and state-supported
supercomputing centers, what is required to sustain them in the future, and
how they will cope with the evolution of computing technology. Since the
federally supported centers were created in the mid-1980s, they have fueled
innovation and discovery, increasing the number of computational
researchers, stimulating the use of HPC in industry, and pioneering new
technologies. The future of supercomputing is exciting: sustained petascale
systems are here, with planning for exascale systems now underway. But it is
also challenging: disruptive technology changes will be needed to
reach the exascale. How can supercomputing centers help ensure that today’s
petascale supercomputers are effectively used to advance science and
engineering, and how can they help the research and industrial communities
prepare for an exciting, if uncertain, future?
44. Advanced HPC Execution Models:
Innovation or Disruption
Panelists: Thomas Sterling, William Carlson, Guang Gao, William Gropp, Vivek
Sarkar, Kathy Yelick
ABSTRACT:
An execution model is the underlying conceptual foundation that integrates the
HPC system architecture, programming methods, and intervening Operating
System and runtime system software. It is a set of principles that govern
the co-design, operation, and interoperability of the system layers to achieve
the most efficient scalable computing in terms of time and energy. Historically, HPC has been
driven by five previous epochs of execution models including the most recent CSP
that has been exemplified by "Pax MPI" for almost two decades. HPC is now
confronted by a severe barrier of parallelism, power, clock rate, and complexity
exemplified by multicore and GPU heterogeneity impeding progress between
today's Petascale and the end of the decade's Exascale performance. The panel
will address the key questions of requirements, form, impact, and programming of
such future execution models should they emerge from research in academia,
industry, and government centers.
China’s #1 – world leader; a talk by the director of the project.
France – #1 in Europe: the Bull Tera 100, with Mellanox technology and Voltaire switches
Student cluster competition – a supercomputer within a 26-amp power budget (about 3 coffee makers) – 1+ TFlop/s
Differences between Ethernet and IB
Tsubame 2.0 – Windows, Voltaire
TOP500 Highlights - November 2010
· The Chinese Tianhe-1A system is the new No. 1 on the TOP500 and clearly in the lead with 2.57 petaflop/s performance.
· No. 3 is also a Chinese system called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVIDIA Tesla C2050 GPUs
· There are seven petaflop/s systems in the TOP10
· The U.S. is tops in petaflop/s with three systems performing at the petaflop/s level
· The two Chinese systems and the new Japanese Tsubame 2.0 system at No. 4 are all using NVIDIA GPUs to accelerate computation and a total of 28 systems on the list are using GPU technology.
· China keeps increasing its number of systems to 41 and is now clearly the No. 2 country, as a user of HPC, ahead of Japan, France, Germany, and UK.
· The Jaguar system at Oak Ridge National Laboratory slipped to the No. 2 spot with 1.75 Pflop/s Linpack performance.
· The most powerful system in Europe is a Bull system at the French CEA at No. 6.
· Intel dominates the high-end processor market, with 79.6 percent of all systems and over 90 percent of quad-core based systems.
· Intel’s Westmere processors increased their presence in the list with 56 systems, compared with seven in the last list.
· Quad-core processors are used in 73 percent of the systems, while 19 percent of the systems use processors with six or more cores.
· Other notable systems are:
- The Grape custom accelerator-based systems in Japan at No. 280 and No. 384
- The No. 4 system, Tsubame 2.0, can run a Windows OS and achieves almost identical performance doing so.
· Cray regained the No. 2 spot in market share by total performance from Hewlett-Packard, but IBM stays well ahead.
· Cray’s XT system series remains very popular with big research customers, with four systems in the TOP10 (two new and two previously listed).
Power consumption of supercomputers
· TOP500 now tracks actual power consumption of supercomputers in a consistent fashion.
· Only 25 systems on the list are confirmed to use more than 1 megawatt (MW) of power.
· The No. 2 system Jaguar reports the highest total power consumption of 6.95 MW.
· Average power consumption of a TOP500 system is 447 kW (up from 397 kW six months ago) and average power efficiency is 219 Mflops/watt (up from 195 Mflops/watt six months ago).
· Average power consumption of a TOP10 system is 3.2 MW (up from 2.89 MW six months ago) and average power efficiency is 268 Mflops/watt (down from 300 Mflops/watt six months ago).
· The most energy-efficient supercomputers are based on:
- The BlueGene/Q prototype, at 1680 Mflops/watt
- The Fujitsu K computer at RIKEN, at 829 Mflops/watt
- QPACE clusters based on IBM PowerXCell 8i processor blades in Germany (up to 774 Mflops/watt)
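The power figures in these highlights are easy to cross-check; for example, dividing Jaguar's Linpack number by its reported draw reproduces an efficiency close to the TOP10 average:

```python
# Cross-check using figures from this list: Jaguar's 1.75 Pflop/s Linpack
# divided by its reported 6.95 MW draw.
linpack_mflops = 1.75e9  # 1.75 Pflop/s expressed in Mflop/s
power_watts = 6.95e6     # 6.95 MW
efficiency = linpack_mflops / power_watts
print(round(efficiency))  # ~252 Mflops/watt, near the TOP10 average of 268
```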
Highlights from the Top 10:
· The Chinese Tianhe-1A system is the new No. 1 on the TOP500 and clearly in the lead with 2.57 petaflop/s performance.
· The TOP10 features five new systems, four of which deliver more than one petaflop/s Linpack performance, bringing the total number of petaflop/s systems up to seven.
· The Chinese Nebulae system, which had its debut in the TOP500 only six months ago, is at No. 3 and is the second Chinese system in the TOP10.
· Tsubame 2.0 is new, coming in at No. 4.
· At No. 5 is a new Cray XE6 system installed at the National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory (LBNL). This is the third U.S. system ever to break the petaflop/s barrier after RoadRunner (No.7) and Jaguar (No. 2).
· New to the list are Hopper (No. 5) and Cielo (No. 10), both Cray machines.
· The other new system is at CEA in France (No. 6).
· The U.S. only has five systems in the TOP10: Nos. 2, 5, 7, 8, and 10. The others are in China, Japan, France, and Germany.
General highlights from the TOP500 since the last edition:
· Already 95 systems are using processors with 6 or more cores. Quad-core processor-based systems still dominate the TOP500, as 365 systems are using them and 37 systems are still using dual-core processors.
· The entry level to the list moved up to the 31.1 Tflop/s mark on the Linpack benchmark, compared to 24.7 Tflop/s six months ago.
· The last system on the newest list was listed at position 305 in the previous TOP500 just six months ago. This turnover rate is about average after the rather low replacement rate six months ago.
· Total combined performance of all 500 systems has grown to 44.2 Pflop/s, compared to 32.4 Pflop/s six months ago and 27.6 Pflop/s one year ago.
· The entry point for the TOP100 increased in six months from 52.84 Tflop/s to 75.76 Tflop/s.
· The average concurrency level in the TOP500 is 13,071 cores per system, up from 10,267 six months ago and 9,174 one year ago.
· A total of 398 systems (79.6 percent) are now using Intel processors. This is slightly down from six months ago (406 systems, 81.2 percent). Intel continues to provide the processors for the largest share of TOP500 systems.
· Intel is now followed by the AMD Opteron family with 57 systems (11.4 percent), up from 47.
· The share of IBM Power processors is slowly declining, now accounting for 40 systems (8.0 percent), down from 42.
· 17 systems use accelerators: six use Cell processors, ten use NVIDIA chips, and one uses an ATI Radeon GPU.
· Gigabit Ethernet is still the most-used internal system interconnect technology (227 systems, down from 244 systems), due to its widespread use at industrial customers, followed by InfiniBand technology with 214 systems, up from 205 systems.
· However, InfiniBand-based systems account for well over twice as much total performance (20.4 Pflop/s) as Gigabit Ethernet ones (8.7 Pflop/s).
· IBM and Hewlett-Packard continue to sell the bulk of the systems at all performance levels of the TOP500.
· IBM kept its lead in systems and has now 200 systems (40 percent) compared to HP with 158 systems (31.6 percent). HP had 185 systems (37 percent) six months ago, compared to IBM with 198 systems (39.8 percent).
· IBM remains the clear leader in the TOP500 list in performance with 27.4 percent of installed total performance (down from 33.6 percent). HP lost the second place in this category to Cray. HP went down to 15.6 percent from 20.4 percent, while Cray increased to 19.1 percent from 14.8 percent.
· In the system category, Cray, SGI, and Dell follow with 5.8 percent, 4.4 percent and 4.0 percent respectively.
· In the performance category, the manufacturers with more than 5 percent are NUDT, which engineered the No. 1 and No. 12 systems (7.1 percent of performance), and SGI (5.7 percent).
· HP (137) and IBM (136) together sold 273 out of 281 systems at commercial and industrial customers and have had this important market segment clearly cornered for some time now.
· The U.S. is clearly the leading consumer of HPC systems with 274 of the 500 systems (down from 282). The European share (125 systems – down from 144) is still substantially larger than the Asian share (84 systems – up from 57).
· Dominant countries in Asia are China with 41 systems (up from 24), Japan with 26 systems (up from 18), and India with 4 systems (down from five).
· In Europe, Germany and France caught up with the UK, which dropped from the No. 1 position and now has 24 systems (down from 38 six months ago). France and Germany now have 26 systems each (France down from 29, Germany up from 24 six months ago).
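The aggregate figures in these highlights imply steep annual growth, which a quick calculation (ours) makes explicit:

```python
# One-year growth implied by the aggregate figures above.
perf_growth = 44.2 / 27.6 - 1    # total performance: 27.6 -> 44.2 Pflop/s
cores_growth = 13071 / 9174 - 1  # average cores/system: 9,174 -> 13,071
print(round(perf_growth * 100))   # ~60 percent more aggregate Pflop/s
print(round(cores_growth * 100))  # ~42 percent more cores per system
```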
Highlights from the TOP50:
· The entry level into the TOP50 is at 126.5 Tflop/s
· The U.S. has a similar share of systems in the TOP50 (50 percent) to its share of the TOP500 (54.8 percent).
· China is already following with five systems (10 percent).
· Cray has passed IBM and now leads the TOP50 with 34 percent of systems and 33 percent of performance.
· No. 2 is now IBM with a share of 18 percent of systems and 17 percent of performance.
· 66 percent of systems are installed at research labs and 22 percent at universities.
· There is only a single system using Gigabit Ethernet in the TOP50.
· The average concurrency level is 64,618 cores per system – up from 49,080 cores per system six months ago and 44,338 one year ago.
All changes are from June 2010 to November 2010.
About the TOP500 List
The TOP500 list is compiled by Hans Meuer of the University of Mannheim, Germany; Erich Strohmaier and Horst Simon of NERSC/Lawrence Berkeley National Laboratory; and Jack Dongarra of the University of Tennessee, Knoxville. For more information, visit www.top500.org
SESSION: Plenary and Kennedy Award Speakers
EVENT TYPE: Invited Speaker
TIME: 8:30AM - 9:15AM
Speaker(s):Bill Dally
ROOM:Auditorium
ABSTRACT: Performance per Watt is the new performance. In today’s power-limited regime, GPU computing offers significant advantages in performance and energy efficiency. In this regime, performance derives from parallelism and efficiency derives from locality. Current GPUs provide both, with up to 512 cores per chip and an explicitly managed memory hierarchy. This talk will review the current state of GPU computing and discuss how we plan to address the challenges of ExaScale computing. Achieving an ExaFLOPS of sustained performance in a 20 MW power envelope requires significant power reduction beyond what will be provided by technology scaling. Efficient processor design along with aggressive exploitation of locality is expected to address this power challenge. A focus on vertical rather than horizontal locality simplifies many issues including load balance, placement, and dynamic workloads. Efficient mechanisms for communication, synchronization, and thread management will be required to achieve the strong scaling required to reach the 10^10-thread parallelism needed to sustain an ExaFLOPS on reasonable-sized problems. Resilience will be achieved through a combination of hardware mechanisms and an API that allows programs to specify when and where protection is required. Programming systems will evolve to improve programmer productivity with a global address space and global data abstractions while improving efficiency via machine-independent abstractions for locality.
Speaker Details:
Bill Dally - NVIDIA/Stanford University
-------------------------------------------------------
My notes:
CUDA is the most popular language these days.
Fermi -> Kepler -> Maxwell
Core i3 + c2060 best GFLOPS/W
DARPA Exascale study (download PDF)
Today: ~5 GFLOPS/W
Exascale target: 50 GFLOPS/W
So we need to improve by a factor of 10. This is achievable: x4 by architecture change and an additional x4 by technology change gives x16, comfortably above the needed x10.
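The arithmetic behind these notes checks out (a quick verification, using the keynote's 20 MW exascale envelope):

```python
# Verifying the keynote's power target: 1 ExaFLOPS sustained in 20 MW.
target = 1e18 / 20e6 / 1e9  # flops per watt, expressed in GFLOPS/W
print(target)               # 50.0 GFLOPS/W
print(target / 5)           # 10.0: the factor-of-10 jump from ~5 GFLOPS/W today
print(4 * 4 >= target / 5)  # True: x4 (architecture) * x4 (process) clears it
```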