2. SC13
The 25th anniversary is shaping up well: SCinet is
provisioning over 1 terabit per second of bandwidth; the
technical program fills 26 conference rooms and 2 ballrooms
with papers, tutorials, panels, workshops, and posters; and
this year's exhibit features over 350 of the HPC community's
leading government, academic, and industry organizations.
3. Sunday 17-Nov-2013
• Workshops
– 4th Intl Workshop on Data-Intensive Computing in the
Clouds
– 4th Workshop on Petascale (Big) Data Analytics
• Education
– LittleFe Buildout, http://littlefe.net/
– Curriculum Workshop: Mapping CS2013 & NSF/TCPP
5. Education
Perspectives on Broadening Engagement and Education
in the Context of Advanced Computing:
Irene Qualters, NSF
Program Responsibilities:
- Cyber-Enabled Sustainability Science and Engineering (CyberSEES)
- High Performance Computing System Acquisition
- Interdisciplinary Research in Hazards and Disasters
- Petascale Computing Resource Allocations
7. Teaching parallel programming to undergrads with hands-on experience
Rainer Keller, Hochschule für Technik Stuttgart -- University of Applied
Sciences, Germany
9. Mapping CS2013 and NSF/TCPP parallel and
distributed computing recommendations
and resources to courses
http://serc.carleton.edu/csinparallel/workshops/sc13/index.html
11. Python for High Performance and
Scientific Computing (PyHPC 2013)
Talks I attended:
– NumFOCUS: A Foundation Supporting Open Scientific
Software
– Synergia: Driving Massively Parallel Particle Accelerator
Simulations with Python
– Compiling Python Modules to Native Parallel Modules
Using Pythran and OpenMP Annotations
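The Pythran approach described in that talk can be sketched with a hypothetical module (my own toy example, not the speakers'): the file stays valid pure Python, while comment annotations direct the Pythran compiler to emit a native extension and, via an OpenMP annotation, parallelize the loop. The annotation spellings follow Pythran's documented conventions, but treat them as an assumption here.

```python
# A hypothetical Pythran-annotated module. The annotations below are plain
# comments to CPython, so the module runs unmodified as ordinary Python;
# Pythran reads them to compile a native, OpenMP-parallel extension.

#pythran export dot(float list, float list)
def dot(xs, ys):
    """Dot product; the OpenMP annotation asks Pythran to parallelize the loop."""
    total = 0.0
    #omp parallel for reduction(+:total)
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Under CPython the loop runs serially; the point of the talk is that the same source, compiled with Pythran, becomes a parallel native module.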
12. Python for High Performance and
Scientific Computing (PyHPC 2013)
Links:
• PyHPC 2013 on Facebook:
https://www.facebook.com/events/179383998878612/
• PyHPC 2013 Slides:
www.dlr.de/sc/en/desktopdefault.aspx/tabid9001/15542_read-38237/
• PyHPC: http://pyhpc.org
• NumFocus: http://numfocus.org
13. WOLFHPC: Workshop on Domain-Specific
Languages and High-Level Frameworks for HPC
• http://hpc.pnl.gov/conf/wolfhpc/2013/
• Keynote Speaker: Laxmikant Kale,
University of Illinois, Urbana-Champaign
http://charm.cs.illinois.edu
What Parallel HLLs Need
Charm++
19. Invited Keynote
Genevieve Bell - The Secret Life of Data
Today Big Data is one of the hottest buzzwords in technology, but
from an anthropological perspective Big Data has been with us for
millennia, in forms such as census information collected by
ancient civilizations. The next 10 years will be shaped primarily by
new algorithms that make sense of massive and diverse datasets
and discover hidden value. Could we ignite our creativity by
looking at data from a fresh perspective? What if we designed for
data like we design for people? This talk will explore the secret life
of data from an anthropological point of view to allow us to better
understand its needs -- and its astonishing potential -- as we
embark to create a new data society.
27. CSinParallel: Using Map-Reduce to
Teach Data-Intensive Scalable
Computing Across the CS Curriculum
http://csinparallel.org
http://serc.carleton.edu/csinparallel/workshops/sc13wmr/index.html
Dick Brown, St. Olaf College
28. Thursday 21-Nov-2013
• Snow storm
• SLURM BoF
Next SLURM users meeting: 23-24/9/2014 @ Swiss National
Supercomputing Center, Switzerland
29. ACM Athena Lecturer Award
The ACM Athena Lecturer Award celebrates women researchers who have
made fundamental contributions to Computer Science. Sponsored by the
ACM, the award includes a $10,000 honorarium. This year’s ACM Athena
Lecturer Award winner is
Katherine Yelick, Professor of Electrical Engineering and Computer
Sciences, University of California, Berkeley and Associate Lab Director for
Computing Sciences, Lawrence Berkeley National Laboratory.
30. The SC Test of Time Award
The SC Test of Time Award recognizes a seminal
technical paper from past SC conferences that
has transformed high performance
computing, storage, or networking. The
inaugural winner of the SC Test of Time Award is
William Pugh, emeritus Professor of Computer
Science at the University of Maryland at College
Park.
43. Awards
• The Best Paper Award went to “Enabling Highly-Scalable Remote
Memory Access Programming with MPI-3 One Sided,” written by
Robert Gerstenberger, University of Illinois at Urbana-Champaign,
and Maciej Besta and Torsten Hoefler, both of ETH Zurich.
• The Best Student Paper Award was given to “Supercomputing
with Commodity CPUs: Are Mobile SoCs Ready for HPC?” written
by Nikola Rajovic of the Barcelona Supercomputing Center.
• The ACM Gordon Bell Prize for best performance of a high
performance application went to “11 PFLOP/s Simulations of Cloud
Cavitation Collapse,” by Diego Rossinelli, Babak
Hejazialhosseini, Panagiotis Hadjidoukas and Petros
Koumoutsakos, all of ETH Zurich, Costas Bekas and Alessandro
Curioni of IBM Zurich Research Laboratory, and Steffen Schmidt
and Nikolaus Adams of Technical University Munich.
44. • The Best Poster Award was presented to “Optimizations of a
Spectral/Finite Difference Gyrokinetic Code for Improved Strong
Scaling Toward Million Cores,” by Shinya Maeyama, Yasuhiro
Idomura and Motoki Nakata of the Japan Atomic Energy Agency
and Tomohiko Watanabe, Masanori Nunami and Akihiro Ishizawa
of the National Institute for Fusion Science.
• The inaugural SC Test of Time Award was presented to William
Pugh from the University of Maryland for his seminal paper, “The
Omega Test: a fast and practical integer programming algorithm
for dependence analysis,” published in the proceedings of
Supercomputing ’91.
45. The 2013-2014 ACM Athena Lecturer, Katherine Yelick of Lawrence Berkeley National
Laboratory and the University of California, was recognized during the conference
keynote session and presented her lecture during the conference.
FLOPS/Dollar
In the Student Cluster Competition's Commodity Track, teams were allowed to
spend no more than $2,500 and clusters were limited to 15 amps of power. The
overall winning team of the Commodity Track was the joint team from Bentley
University, Waltham, Massachusetts, and Northeastern University, Boston.
46. The November 2013 Top500
The total combined performance of all 500
systems on the list is 250 Pflop/s. Half of the
total performance is achieved by the top 17
systems on the list, with the other half of total
performance spread among the remaining 483
systems.
47. • In all, there are 31 systems with performance greater than a petaflop/s
on the list, an increase of five compared to the June 2013 list.
• The No. 1 system, Tianhe-2, and the No. 7 system, Stampede, are using
Intel Xeon Phi processors to speed up their computational rate. The No.
2 system Titan and the No. 6 system Piz Daint are using NVIDIA GPUs to
accelerate computation.
• A total of 53 systems on the list are using accelerator/co-processor
technology, unchanged from June 2013. Thirty-eight (38) of these use
NVIDIA chips, two use ATI Radeon, and there are now 13 systems with
Intel MIC technology (Xeon Phi).
• Intel continues to provide the processors for the largest share (82.4
percent) of TOP500 systems.
• Ninety-four percent of the systems use processors with six or more cores
and 75 percent have processors with eight or more cores.
• The number of systems installed in China has now stabilized at
63, compared with 65 on the last list. China occupies the No. 2 position
as a user of HPC, behind the U.S. but ahead of Japan, UK, France, and
Germany. Due to Tianhe-2, China this year also took the No. 2 position in
the performance share, topping Japan.
• The last system on the newest list was listed at position 363 in the
previous TOP500.
61. A New Benchmark: Improved ranking test
for supercomputers to be released by Sandia
Sandia National Laboratories researcher Mike Heroux
leads development of a new supercomputer benchmark:
High Performance Conjugate Gradient (HPCG), roughly 4,000 lines of code.
http://mantevo.org/
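HPCG ranks machines by a preconditioned conjugate gradient solve over a sparse stencil problem, stressing memory-bound kernels rather than the dense matrix math of LINPACK. As a rough sketch (an unpreconditioned toy in pure Python, nothing like the actual 4,000-line benchmark), the CG loop exercises the same three kernels: sparse matrix-vector products, dot products, and vector updates.

```python
# Minimal unpreconditioned conjugate gradient, for illustration only.
# HPCG proper runs a preconditioned CG on a 3-D 27-point stencil.

def cg(matvec, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = b[:]                       # residual: r = b - A*0 = b
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)             # sparse matrix-vector product
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# 1-D Laplacian (tridiagonal SPD matrix), the simplest stencil analogue.
def laplacian(v):
    n = len(v)
    return [2 * v[i] - (v[i - 1] if i > 0 else 0.0)
                     - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

x = cg(laplacian, [1.0, 1.0, 1.0])   # converges to [1.5, 2.0, 1.5]
```

Unlike LINPACK's dense factorization, nearly every flop here waits on an irregular memory access, which is exactly the behavior HPCG is designed to reward.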
63. The Green 500
http://green500.org/news/green500-list-november-2013
#1 TSUBAME-KFC – GSIC Center, Tokyo Institute of Technology
#2 Wilkes – Cambridge University
#3 HA-PACS TCA – Center for Computational Sciences, University of Tsukuba
#4 Piz Daint – Swiss National Supercomputing Centre (CSCS)
#5 romeo – ROMEO HPC Center, Champagne-Ardenne
#6 TSUBAME 2.5 – GSIC Center, Tokyo Institute of Technology
#7 University of Arizona
#8 Max-Planck-Gesellschaft MPI/IPP
#9 Financial Institution
#10 CSIRO GPU Cluster – CSIRO
64. Continuing the trend from previous years, heterogeneous supercomputing systems totally
dominates the top 10 spots of the Green500. A heterogeneous system uses computational
building blocks that consist of two or more types of “computing brains.” These types of
computing brains include traditional processors (CPUs), graphics processing units
(GPUs), and co-processors. In this edition of the Green500, one system smashes through
the 4-billion floating-point operation per second (gigaflops) per watt barrier.
TSUBAME-KFC, a heterogeneous supercomputing system developed at the Tokyo Institute
of Technology (TITech) in Japan, tops the list with an efficiency of 4.5 gigaflops/watt. Each
computational node within TSUBAME-KFC consists of two Intel Ivy Bridge processors and
four NVIDIA Kepler GPUs. In fact, all systems in the top ten of the Green500 use a similar
architecture, i.e., Intel CPUs combined with NVIDIA GPUs. Wilkes, a supercomputer
housed at Cambridge University, takes the second spot. The third position is filled by the
HA-PACS TCA system at the University of Tsukuba. Of particular note, this list also sees
two petaflop systems, each capable of computing over one quadrillion operations per
second, achieve an efficiency of over 3 gigaflops/watt, namely Piz Daint at Swiss National
Supercomputing Center and TSUBAME 2.5 at Tokyo Institute of Technology. Thus, Piz
Daint is the greenest petaflop supercomputer on the Green500. As a point of
reference, Tianhe-2, the fastest supercomputer in the world according to the Top500
list, achieves an efficiency of 1.9 gigaflops/watt.
75. OpenMP 4
• http://openmp.org/wp/openmp-40-api-at-sc13/
• “OpenMP 4.0 is a big step towards increasing user productivity for
multi- and many-core programming,” says Dieter an Mey, Leader of
the HPC Team at RWTH Aachen University. “Standardizing
accelerator programming, adding task dependencies, SIMD
support, cancellation, and NUMA awareness will make OpenMP an
even more attractive parallel programming paradigm for a growing
user community.”
• “The latest OpenMP 4.0 release will provide our HPC users with a
single language for offloading computational work to Xeon Phi
coprocessors, NVIDIA GPUs, and ARM processors,” says Kent
Milfeld, Manager, HPC Performance & Architecture Group of
the Texas Advanced Computing Center. “Extending the base
of OpenMP will encourage more researchers to take advantage of
attached devices, and to develop applications that support multiple
architectures.”
76. Mentor Graphics has developed OpenACC extensions that
will be supported in mainstream GCC compilers.
78. NVIDIA Announces CUDA 6
• Unified Memory – Simplifies programming by enabling
applications to access CPU and GPU memory without the need to
manually copy data from one to the other, and makes it easier to
add support for GPU acceleration in a wide range of programming
languages.
• Drop-in Libraries – Automatically accelerates applications’ BLAS
and FFTW calculations by up to 8X by simply replacing the existing
CPU libraries with the GPU-accelerated equivalents.
• Multi-GPU Scaling – Re-designed BLAS and FFT GPU libraries
automatically scale performance across up to eight GPUs in a
single node, delivering over nine teraflops of double precision
performance per node, and supporting larger workloads than ever
before (up to 512GB). Multi-GPU scaling can also be used with the
new BLAS drop-in library.
79. Nvidia Unleashes Tesla K40
The Tesla K40 GPU accelerator has double the
memory of the Tesla K20X, until now Nvidia's
top GPU accelerator, and delivers a 40 percent
performance boost over its predecessor.
The Tesla K40 is based on Nvidia's Kepler
graphics processing architecture and sports
2,880 GPU cores supporting the graphics chip
maker's CUDA parallel programming
language. The most powerful graphics
platform Nvidia has built to date has a
whopping 12GB of GDDR5 memory and supports
the PCIe 3.0 interconnect.
Abstract: Higher levels of abstraction that increase productivity can be designed by specializing them in specific ways. Domain-specific languages, interaction-pattern-specific languages, APGAS languages, and high-level frameworks leverage their own specializations to raise abstraction levels and increase productivity. In this talk, I will present some common support that all such higher-level abstractions need, and the need to encapsulate that support in a single common substrate. In particular, the support includes automatic resource management and other runtime adaptation support, including that for tolerating component failures or handling power/energy issues. Further, I will explore the need to interoperate and coordinate across multiple such paradigms, so that one can construct multi-paradigm applications with ease. I will illustrate the talk with my group's experience in designing multiple interaction-pattern-specific HLLs, and on interoperability among them as well as with the traditional message-passing paradigm of MPI.
HLL = High-Level Language
HLPS = High-Level Programming Systems
“Is there life beyond MPI?”
Chris Johnson
SCI Institute, University of Utah, Salt Lake City, UT, United States 84112
Website: http://www.sci.utah.edu
ABSTRACT: The partitioned global address space (PGAS) programming model strikes a balance between the ease of programming due to its global address memory model and performance due to locality awareness. While developed for scalable systems, PGAS is gaining popularity due to the NUMA memory architectures on many-core chips. Some PGAS implementations include Co-Array Fortran, Chapel, UPC, X10, Phalanx, OpenShmem, Titanium, and Habanero. PGAS concepts are influencing new architectural designs and are being incorporated into traditional HPC environments. This BOF will bring together developers, researchers, and users for the exchange of ideas and information and to address common issues of concern.
SESSION LEADER DETAILS:
Tarek El-Ghazawi (Primary Session Leader) - George Washington University
Lauren Smith (Secondary Session Leader) - US Government
ABSTRACT: Map-reduce, the cornerstone computational framework for cloud computing applications, has star appeal to draw students to the study of parallelism. Participants will carry out hands-on exercises designed for students at CS1/intermediate/advanced levels that introduce data-intensive scalable computing concepts, using WebMapReduce (WMR), a simplified open-source interface to the widely used Hadoop map-reduce programming environment, and using Hadoop itself. These hands-on exercises enable students to perform data-intensive scalable computations carried out on the most widely deployed map-reduce framework, used by Facebook, Microsoft, Yahoo, and other companies. WMR supports programming in a choice of languages (including Java, Python, C++, C#, Scheme); participants will be able to try exercises with languages of their choice. The workshop includes a brief introduction to direct Hadoop programming and information about access to cluster resources supporting WMR. Workshop materials will reside on csinparallel.org, along with the WMR software. Intended audience: CS instructors. Laptop required (Windows, Mac, or Linux).
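The map-reduce pattern the workshop teaches can be sketched in a few lines of plain Python (a toy stand-in for the WMR/Hadoop programming model, not their actual APIs): a mapper emits (key, value) pairs, the framework groups pairs by key (the "shuffle"), and a reducer folds each group.

```python
# Toy in-memory map-reduce: word count, the classic first WMR exercise.
from collections import defaultdict

def mapper(line):
    for word in line.split():
        yield (word.lower(), 1)        # emit (key, value) pairs

def reducer(word, counts):
    return (word, sum(counts))         # fold one key's values

def map_reduce(lines, mapper, reducer):
    groups = defaultdict(list)
    for line in lines:                 # map phase
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in groups.items())

counts = map_reduce(["the cat", "the hat"], mapper, reducer)
# counts == {"the": 2, "cat": 1, "hat": 1}
```

In Hadoop or WMR the same mapper/reducer pair runs distributed over many nodes, with the shuffle done by the framework; the student-facing code is essentially just the two small functions.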
SLURM BoF notes:
• Energy aware, both Cray and Bull (IPMI and sensors)
• FlexLM aware
• Hadoop aware
• HDF5 aware
The Omega Project and Constraint-Based Analysis Techniques in High Performance Computing
William Pugh, Professor Emeritus of Computer Science at the University of Maryland at College Park
The Omega test paper was one of the first to suggest general use of an exact algorithm for array data dependence analysis, which is the problem of determining if two array references are aliased. Knowing this is essential to knowing which loops can be run in parallel. Array data dependence is essentially the problem of determining if a set of affine constraints has an integer solution. This problem is NP-complete, but the paper described an algorithm that was both fast in practice and always exact. More important than the fact that the Omega test was exact was that it could also use arbitrary affine constraints (as opposed to many existing algorithms, which could only use constraints occurring in certain pre-defined patterns), and could produce symbolic answers rather than just yes/no answers. This work was the foundation of the Omega project and library, which significantly expanded the capabilities of the Omega test and added to the range of problems and domains it could be applied to. The Omega library could calculate things such as actual data flow (rather than just aliasing), analyze and represent loop transformations, calculate array sections that needed to be communicated, and generate loop nests. This talk will describe the Omega test, the context in which the paper was originally written, the Omega project, and the field of constraint-based program analysis and transformation that it helped open up.
http://sc13.supercomputing.org/content/omega-project-and-constraint-based-analysis-techniques-high-performance-computing
http://www.cs.umd.edu/projects/omega/
* FindBugs – static analysis of Java code
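The dependence question behind the Omega test can be made concrete with a brute-force illustration (my own toy, emphatically not Pugh's algorithm, which answers the same question exactly and symbolically without any enumeration): do two array references ever touch the same element for in-bounds loop indices? For a[2*i] versus a[2*j+1], that asks whether 2*i == 2*j + 1 has an integer solution with 0 <= i, j < n.

```python
# Brute-force array-aliasing check over bounded loop indices. The Omega
# test solves the underlying integer affine-constraint problem exactly,
# for arbitrary bounds, without enumerating anything.

def references_alias(f, g, n):
    """Do index expressions f(i) and g(j) ever collide for 0 <= i, j < n?"""
    return any(f(i) == g(j) for i in range(n) for j in range(n))

even = lambda i: 2 * i         # writes to a[2*i]
odd = lambda j: 2 * j + 1      # reads from a[2*j + 1]

# Even and odd indices never collide: no dependence, the loops can run
# in parallel. Against a[j], the references do overlap.
print(references_alias(even, odd, 100))          # False
print(references_alias(even, lambda j: j, 100))  # True
```

The brute force only works for small, fixed bounds; the point of the Omega test was handling symbolic bounds and arbitrary affine constraints while staying exact.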
The Green500 List - November 2013
The November 2013 release of the Green500 list was announced at the SC13 conference in Denver, Colorado, USA. This list marks a number of “firsts” for the Green500. It is the first time that a supercomputer has broken through the 4 gigaflops/watt barrier. Second, it is the first time that all of the top 10 systems on the Green500 are heterogeneous systems.
Third, it is the first time that the average of the measured power consumed by the systems on the Green500 dropped with respect to the previous edition of the list. “A decrease in the average measured power coupled with an overall increase in performance is an encouraging step along the trail to exascale,” noted Wu Feng of the Green500. Fourth, assuming that TSUBAME-KFC’s energy efficiency can be maintained for an exaflop system, it is the first time that an extrapolation to an exaflop supercomputer has dropped below 300 megawatts (MW), specifically 222 MW. “This 222-MW power envelope is still a long way away from DARPA’s target of an exaflop system in the 20-MW power envelope,” says Feng.
Starting with this release, the Little Green500 list only includes machines with power values submitted directly to the Green500. In fact, more than 400 systems have submitted directly to the Green500 over the past few years. As in previous years, the Little Green500 list has better overall efficiency than the Green500 list on average.
Earlier this year, the Green500 adopted new methodologies for measuring the power of supercomputing systems and providing a more accurate representation of the energy efficiency of large-scale systems. In June 2013, the Green500 formally adopted measurement rules (a.k.a. “Level 1” measurements), developed in cooperation with the Energy-Efficient High-Performance Computing Working Group (EE HPC WG). Moreover, power-measurement methodologies with higher precision and accuracy were developed as a part of this effort (a.k.a. “Level 2” and “Level 3” measurements). With growing support and interest in the energy efficiency of large-scale computing systems, the Green500 welcomed two more submissions at Level 2 and Level 3 than in the previous edition of the list. Of particular note, Piz Daint, the greenest petaflop supercomputer in the world, submitted the highest-quality Level 3 measurement.
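The exaflop extrapolation quoted above is straightforward arithmetic, which a few lines can reproduce: an exaflop is 10^18 FLOP/s, so dividing by the energy efficiency gives the power draw.

```python
# Power required for an exaflop machine at a given energy efficiency.
def exaflop_power_mw(gflops_per_watt):
    exaflop = 1e18                          # FLOP/s
    watts = exaflop / (gflops_per_watt * 1e9)
    return watts / 1e6                      # convert to megawatts

print(round(exaflop_power_mw(4.5)))   # 222 MW at TSUBAME-KFC's efficiency
print(round(exaflop_power_mw(50)))    # DARPA's 20-MW target needs 50 GFLOPS/W
```

The same formula shows how far there is to go: hitting DARPA's 20-MW envelope requires roughly an 11x improvement over the most efficient machine on this list.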
The students assemble an Atom-processor-based system and afterwards install cloud software.
http://cseweb.ucsd.edu/~mbtaylor/papers/taylor_landscape_ds_ieee_micro_2013.pdf
Multicore scaling leads to large amounts of dark silicon. Across two process generations, there is a spectrum of trade-offs between frequency and core count; these include increasing core count by 2x but leaving frequency constant (top), and increasing frequency by 2x but leaving core count constant (bottom). Any of these trade-off points will have large amounts of dark silicon.
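The dark-silicon argument can be made concrete with a back-of-the-envelope model (my own simplification with assumed numbers, not the paper's exact figures): the chip's power budget stays fixed, the transistor budget doubles each process generation, but with Dennard scaling over, the power per transistor at full speed drops by only about 1.4x per generation. The fraction of the chip that can be powered at once therefore shrinks every generation.

```python
# Toy dark-silicon model: fraction of the chip that can run at full speed
# under a fixed power budget, as transistor count outgrows the per-
# transistor power reduction. Growth/drop factors are assumptions.

def lit_fraction(generations, transistor_growth=2.0, power_drop=1.4):
    frac = 1.0
    for _ in range(generations):
        frac *= power_drop / transistor_growth
    return frac

for g in range(4):
    print(g, round(lit_fraction(g), 3))
```

With these assumed factors, after two process generations only about half the chip can be lit simultaneously, which matches the paper's framing of large amounts of dark silicon across two generations regardless of how the frequency/core-count trade-off is chosen.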