2. SC13
The 25th anniversary is shaping up well: SCinet is
provisioning over 1 terabit per second of bandwidth; the
technical program fills 26 conference rooms and 2 ballrooms
with papers, tutorials, panels, workshops, and posters; and
this year's exhibit features over 350 of the HPC community's
leading government, academic, and industry organizations.
3. Sunday 17-Nov-2013
• Workshops
– 4th Intl Workshop on Data-Intensive Computing in the
Clouds
– 4th Workshop on Petascale (Big) Data Analytics
• Education
– LittleFe Buildout, http://littlefe.net/
– Curriculum Workshop: Mapping CS2013 & NSF/TCPP
5. Education
Perspectives on Broadening Engagement and Education
in the Context of Advanced Computing:
Irene Qualters, NSF
Program Responsibilities:
- Cyber-Enabled Sustainability Science and Engineering (CyberSEES)
- High Performance Computing System Acquisition
- Interdisciplinary Research in Hazards and Disasters
- Petascale Computing Resource Allocations
7. Teaching parallel programming to undergrads with hands-on experience
Rainer Keller, Hochschule für Technik Stuttgart -- University of Applied
Sciences, Germany
9. Mapping CS2013 and NSF/TCPP parallel and
distributed computing recommendations
and resources to courses
http://serc.carleton.edu/csinparallel/workshops/sc13/index.html
11. Python for High Performance and
Scientific Computing (PyHPC 2013)
Talks I attended:
– NumFOCUS: A Foundation Supporting Open Scientific
Software
– Synergia: Driving Massively Parallel Particle Accelerator
Simulations with Python
– Compiling Python Modules to Native Parallel Modules
Using Pythran and OpenMP Annotations
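The Pythran approach described in that talk can be sketched with a hypothetical module (my own toy example, not the speakers'): the file stays valid pure Python, while comment annotations direct the Pythran compiler to emit a native extension and, via an OpenMP annotation, parallelize the loop. The annotation spellings follow Pythran's documented conventions, but treat them as an assumption here.

```python
# A hypothetical Pythran-annotated module. The annotations below are plain
# comments to CPython, so the module runs unmodified as ordinary Python;
# Pythran reads them to compile a native, OpenMP-parallel extension.

#pythran export dot(float list, float list)
def dot(xs, ys):
    """Dot product; the OpenMP annotation asks Pythran to parallelize the loop."""
    total = 0.0
    #omp parallel for reduction(+:total)
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Under CPython the loop runs serially; the point of the talk is that the same source, compiled with Pythran, becomes a parallel native module.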
12. Python for High Performance and
Scientific Computing (PyHPC 2013)
Links:
• PyHPC 2013 on Facebook:
https://www.facebook.com/events/179383998878612/
• PyHPC 2013 Slides:
www.dlr.de/sc/en/desktopdefault.aspx/tabid9001/15542_read-38237/
• PyHPC: http://pyhpc.org
• NumFocus: http://numfocus.org
13. WOLFHPC: Workshop on Domain-Specific
Languages and High-Level Frameworks for HPC
• http://hpc.pnl.gov/conf/wolfhpc/2013/
• Keynote Speaker: Laxmikant Kale,
University of Illinois, Urbana-Champaign
http://charm.cs.illinois.edu
What Parallel HLLs Need
Charm++
19. Invited Keynote
Genevieve Bell - The Secret Life of Data
Today Big Data is one of the hottest buzzwords in technology, but
from an anthropological perspective Big Data has been with us for
millennia, in forms such as census information collected by
ancient civilizations. The next 10 years will be shaped primarily by
new algorithms that make sense of massive and diverse datasets
and discover hidden value. Could we ignite our creativity by
looking at data from a fresh perspective? What if we designed for
data like we design for people? This talk will explore the secret life
of data from an anthropological point of view to allow us to better
understand its needs -- and its astonishing potential -- as we
embark to create a new data society.
27. CSinParallel: Using Map-Reduce to
Teach Data-Intensive Scalable
Computing Across the CS Curriculum
http://csinparallel.org
http://serc.carleton.edu/csinparallel/workshops/sc13wmr/index.html
Dick Brown, St. Olaf College
28. Thursday 21-Nov-2013
• Snow storm
• SLURM BoF
Next SLURM users meeting: 23-24/9/2014 @ Swiss National
Supercomputing Center, Switzerland
29. ACM Athena Lecturer Award
The ACM Athena Lecturer Award celebrates women researchers who have
made fundamental contributions to Computer Science. Sponsored by the
ACM, the award includes a $10,000 honorarium. This year’s ACM Athena
Lecturer Award winner is
Katherine Yelick, Professor of Electrical Engineering and Computer
Sciences, University of California, Berkeley and Associate Lab Director for
Computing Sciences, Lawrence Berkeley National Laboratory.
30. The SC Test of Time Award
The SC Test of Time Award recognizes a seminal
technical paper from past SC conferences that
has transformed high performance
computing, storage, or networking. The
inaugural winner of the SC Test of Time Award is
William Pugh, emeritus Professor of Computer
Science at the University of Maryland at College
Park.
43. Awards
• The Best Paper Award went to “Enabling Highly-Scalable Remote
Memory Access Programming with MPI-3 One Sided,” written by
Robert Gerstenberger, University of Illinois at Urbana-Champaign,
and Maciej Besta and Torsten Hoefler, both of ETH Zurich.
• The Best Student Paper Award was given to “Supercomputing
with Commodity CPUs: Are Mobile SoCs Ready for HPC?” written
by Nikola Rajovic of the Barcelona Supercomputing Center.
• The ACM Gordon Bell Prize for best performance of a high
performance application went to “11 PFLOP/s Simulations of Cloud
Cavitation Collapse,” by Diego Rossinelli, Babak
Hejazialhosseini, Panagiotis Hadjidoukas and Petros
Koumoutsakos, all of ETH Zurich, Costas Bekas and Alessandro
Curioni of IBM Zurich Research Laboratory, and Steffen Schmidt
and Nikolaus Adams of Technical University Munich.
44. • The Best Poster Award was presented to “Optimizations of a
Spectral/Finite Difference Gyrokinetic Code for Improved Strong
Scaling Toward Million Cores,” by Shinya Maeyama, Yasuhiro
Idomura and Motoki Nakata of the Japan Atomic Energy Agency
and Tomohiko Watanabe, Masanori Nunami and Akihiro Ishizawa
of the National Institute for Fusion Science.
• The inaugural SC Test of Time Award was presented to William
Pugh from the University of Maryland for his seminal paper, “The
Omega Test: a fast and practical integer programming algorithm
for dependence analysis,” published in the proceedings of
Supercomputing ’91.
45. The 2013-2014 ACM Athena Lecturer, Katherine Yelick of Lawrence Berkeley National
Laboratory and the University of California, was recognized during the conference
keynote session and presented her lecture during the conference.
FLOPS/Dollar
In the Student Cluster Competition's Commodity Track, teams were allowed to
spend no more than $2,500 and clusters were limited to 15 amps of power. The
overall winning team of the Commodity Track was the joint team from Bentley
University, Waltham, Massachusetts, and Northeastern University, Boston.
46. The November 2013 Top500
The total combined performance of all 500
systems on the list is 250 Pflop/s. Half of the
total performance is achieved by the top 17
systems on the list, with the other half of total
performance spread among the remaining 483
systems.
47. • In all, there are 31 systems with performance greater than a petaflop/s
on the list, an increase of five compared to the June 2013 list.
• The No. 1 system, Tianhe-2, and the No. 7 system, Stampede, are using
Intel Xeon Phi processors to speed up their computational rate. The No.
2 system Titan and the No. 6 system Piz Daint are using NVIDIA GPUs to
accelerate computation.
• A total of 53 systems on the list are using accelerator/co-processor
technology, unchanged from June 2013. Thirty-eight (38) of these use
NVIDIA chips, two use ATI Radeon, and there are now 13 systems with
Intel MIC technology (Xeon Phi).
• Intel continues to provide the processors for the largest share (82.4
percent) of TOP500 systems.
• Ninety-four percent of the systems use processors with six or more cores
and 75 percent have processors with eight or more cores.
• The number of systems installed in China has now stabilized at
63, compared with 65 on the last list. China occupies the No. 2 position
as a user of HPC, behind the U.S. but ahead of Japan, UK, France, and
Germany. Due to Tianhe-2, China this year also took the No. 2 position in
the performance share, topping Japan.
• The last system on the newest list was listed at position 363 in the
previous TOP500.
61. A New Benchmark: Improved ranking test
for supercomputers to be released by Sandia
Sandia National Laboratories researcher Mike Heroux
leads development of a new supercomputer benchmark:
High Performance Conjugate Gradient (HPCG), roughly 4,000 lines of code.
http://mantevo.org/
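HPCG ranks machines by a preconditioned conjugate gradient solve over a sparse stencil problem, stressing memory-bound kernels rather than the dense matrix math of LINPACK. As a rough sketch (an unpreconditioned toy in pure Python, nothing like the actual 4,000-line benchmark), the CG loop exercises the same three kernels: sparse matrix-vector products, dot products, and vector updates.

```python
# Minimal unpreconditioned conjugate gradient, for illustration only.
# HPCG proper runs a preconditioned CG on a 3-D 27-point stencil.

def cg(matvec, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = b[:]                       # residual: r = b - A*0 = b
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)             # sparse matrix-vector product
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# 1-D Laplacian (tridiagonal SPD matrix), the simplest stencil analogue.
def laplacian(v):
    n = len(v)
    return [2 * v[i] - (v[i - 1] if i > 0 else 0.0)
                     - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

x = cg(laplacian, [1.0, 1.0, 1.0])   # converges to [1.5, 2.0, 1.5]
```

Unlike LINPACK's dense factorization, nearly every flop here waits on an irregular memory access, which is exactly the behavior HPCG is designed to reward.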
63. The Green 500
http://green500.org/news/green500-list-november-2013
#1 TSUBAME-KFC – GSIC Center, Tokyo Institute of Technology
#2 Wilkes – Cambridge University
#3 HA-PACS TCA – Center for Computational Sciences, University of Tsukuba
#4 Piz Daint – Swiss National Supercomputing Centre (CSCS)
#5 romeo – ROMEO HPC Center, Champagne-Ardenne
#6 TSUBAME 2.5 – GSIC Center, Tokyo Institute of Technology
#7 University of Arizona
#8 Max-Planck-Gesellschaft MPI/IPP
#9 Financial Institution
#10 CSIRO GPU Cluster – CSIRO
64. Continuing the trend from previous years, heterogeneous supercomputing systems totally
dominates the top 10 spots of the Green500. A heterogeneous system uses computational
building blocks that consist of two or more types of “computing brains.” These types of
computing brains include traditional processors (CPUs), graphics processing units
(GPUs), and co-processors. In this edition of the Green500, one system smashes through
the 4-billion floating-point operation per second (gigaflops) per watt barrier.
TSUBAME-KFC, a heterogeneous supercomputing system developed at the Tokyo Institute
of Technology (TITech) in Japan, tops the list with an efficiency of 4.5 gigaflops/watt. Each
computational node within TSUBAME-KFC consists of two Intel Ivy Bridge processors and
four NVIDIA Kepler GPUs. In fact, all systems in the top ten of the Green500 use a similar
architecture, i.e., Intel CPUs combined with NVIDIA GPUs. Wilkes, a supercomputer
housed at Cambridge University, takes the second spot. The third position is filled by the
HA-PACS TCA system at the University of Tsukuba. Of particular note, this list also sees
two petaflop systems, each capable of computing over one quadrillion operations per
second, achieve an efficiency of over 3 gigaflops/watt, namely Piz Daint at Swiss National
Supercomputing Center and TSUBAME 2.5 at Tokyo Institute of Technology. Thus, Piz
Daint is the greenest petaflop supercomputer on the Green500. As a point of
reference, Tianhe-2, the fastest supercomputer in the world according to the Top500
list, achieves an efficiency of 1.9 gigaflops/watt.
75. OpenMP 4
• http://openmp.org/wp/openmp-40-api-at-sc13/
• “OpenMP 4.0 is a big step towards increasing user productivity for
multi- and many-core programming,” says Dieter an Mey, Leader of
the HPC Team at RWTH Aachen University. “Standardizing
accelerator programming, adding task dependencies, SIMD
support, cancellation, and NUMA awareness will make OpenMP an
even more attractive parallel programming paradigm for a growing
user community.”
• “The latest OpenMP 4.0 release will provide our HPC users with a
single language for offloading computational work to Xeon Phi
coprocessors, NVIDIA GPUs, and ARM processors,” says Kent
Milfeld, Manager, HPC Performance & Architecture Group of
the Texas Advanced Computing Center. “Extending the base
of OpenMP will encourage more researchers to take advantage of
attached devices, and to develop applications that support multiple
architectures.”
76. Mentor Graphics has developed OpenACC extensions that
will be supported in mainstream GCC compilers.
78. NVIDIA Announces CUDA 6
• Unified Memory – Simplifies programming by enabling
applications to access CPU and GPU memory without the need to
manually copy data from one to the other, and makes it easier to
add support for GPU acceleration in a wide range of programming
languages.
• Drop-in Libraries – Automatically accelerates applications’ BLAS
and FFTW calculations by up to 8X by simply replacing the existing
CPU libraries with the GPU-accelerated equivalents.
• Multi-GPU Scaling – Re-designed BLAS and FFT GPU libraries
automatically scale performance across up to eight GPUs in a
single node, delivering over nine teraflops of double precision
performance per node, and supporting larger workloads than ever
before (up to 512GB). Multi-GPU scaling can also be used with the
new BLAS drop-in library.
79. Nvidia Unleashes Tesla K40
The Tesla K40 GPU accelerator has double the
memory of the Tesla K20X, until now Nvidia's
top GPU accelerator, and delivers a 40 percent
performance boost over its predecessor.
The Tesla K40 is based on Nvidia's Kepler
graphics processing architecture and sports
2,880 GPU cores supporting the graphics chip
maker's CUDA parallel programming
language. The most powerful graphics
platform Nvidia has built to date has a
whopping 12GB of GDDR5 memory and supports
the PCIe 3.0 interconnect.
Abstract: Higher levels of abstraction that increase productivity can be designed by specializing them in specific ways. Domain-specific languages, interaction-pattern-specific languages, APGAS languages, and high-level frameworks leverage their own specializations to raise abstraction levels and increase productivity. In this talk, I will present some common support that all such higher-level abstractions need, and the need to encapsulate that support in a single common substrate. In particular, the support includes automatic resource management and other runtime adaptation support, including that for tolerating component failures or handling power/energy issues. Further, I will explore the need to interoperate and coordinate across multiple such paradigms, so that one can construct multi-paradigm applications with ease. I will illustrate the talk with my group's experience in designing multiple interaction-pattern-specific HLLs, and on interoperability among them as well as with the traditional message-passing paradigm of MPI.
HLL = High-Level Language
HLPS = High-Level Programming Systems
“Is there life beyond MPI?”
Chris Johnson
SCI Institute, University of Utah, Salt Lake City, UT, United States 84112
Website: http://www.sci.utah.edu
ABSTRACT: The partitioned global address space (PGAS) programming model strikes a balance between the ease of programming due to its global address memory model and performance due to locality awareness. While developed for scalable systems, PGAS is gaining popularity due to the NUMA memory architectures on many-core chips. Some PGAS implementations include Co-Array Fortran, Chapel, UPC, X10, Phalanx, OpenShmem, Titanium, and Habanero. PGAS concepts are influencing new architectural designs and are being incorporated into traditional HPC environments. This BOF will bring together developers, researchers, and users for the exchange of ideas and information and to address common issues of concern.
SESSION LEADER DETAILS:
Tarek El-Ghazawi (Primary Session Leader) - George Washington University
Lauren Smith (Secondary Session Leader) - US Government
ABSTRACT: Map-reduce, the cornerstone computational framework for cloud computing applications, has star appeal to draw students to the study of parallelism. Participants will carry out hands-on exercises designed for students at CS1/intermediate/advanced levels that introduce data-intensive scalable computing concepts, using WebMapReduce (WMR), a simplified open-source interface to the widely used Hadoop map-reduce programming environment, and using Hadoop itself. These hands-on exercises enable students to perform data-intensive scalable computations carried out on the most widely deployed map-reduce framework, used by Facebook, Microsoft, Yahoo, and other companies. WMR supports programming in a choice of languages (including Java, Python, C++, C#, Scheme); participants will be able to try exercises with languages of their choice. The workshop includes a brief introduction to direct Hadoop programming and information about access to cluster resources supporting WMR. Workshop materials will reside on csinparallel.org, along with the WMR software. Intended audience: CS instructors. Laptop required (Windows, Mac, or Linux).
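The map-reduce pattern the workshop teaches can be sketched in a few lines of plain Python (a toy stand-in for the WMR/Hadoop programming model, not their actual APIs): a mapper emits (key, value) pairs, the framework groups pairs by key (the "shuffle"), and a reducer folds each group.

```python
# Toy in-memory map-reduce: word count, the classic first WMR exercise.
from collections import defaultdict

def mapper(line):
    for word in line.split():
        yield (word.lower(), 1)        # emit (key, value) pairs

def reducer(word, counts):
    return (word, sum(counts))         # fold one key's values

def map_reduce(lines, mapper, reducer):
    groups = defaultdict(list)
    for line in lines:                 # map phase
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in groups.items())

counts = map_reduce(["the cat", "the hat"], mapper, reducer)
# counts == {"the": 2, "cat": 1, "hat": 1}
```

In Hadoop or WMR the same mapper/reducer pair runs distributed over many nodes, with the shuffle done by the framework; the student-facing code is essentially just the two small functions.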
SLURM BoF notes:
• Energy aware, both Cray and Bull (IPMI and sensors)
• FlexLM aware
• Hadoop aware
• HDF5 aware
The Omega Project and Constraint-Based Analysis Techniques in High Performance Computing
William Pugh, Professor Emeritus of Computer Science at the University of Maryland at College Park
The Omega test paper was one of the first to suggest general use of an exact algorithm for array data dependence analysis, which is the problem of determining if two array references are aliased. Knowing this is essential to knowing which loops can be run in parallel. Array data dependence is essentially the problem of determining if a set of affine constraints has an integer solution. This problem is NP-complete, but the paper described an algorithm that was both fast in practice and always exact. More important than the fact that the Omega test was exact was that it could also use arbitrary affine constraints (as opposed to many existing algorithms, which could only use constraints occurring in certain pre-defined patterns), and could produce symbolic answers rather than just yes/no answers. This work was the foundation of the Omega project and library, which significantly expanded the capabilities of the Omega test and added to the range of problems and domains it could be applied to. The Omega library could calculate things such as actual data flow (rather than just aliasing), analyze and represent loop transformations, calculate array sections that needed to be communicated, and generate loop nests. This talk will describe the Omega test, the context in which the paper was originally written, the Omega project, and the field of constraint-based program analysis and transformation that it helped open up.
http://sc13.supercomputing.org/content/omega-project-and-constraint-based-analysis-techniques-high-performance-computing
http://www.cs.umd.edu/projects/omega/
* FindBugs – static analysis of Java code
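The dependence question behind the Omega test can be made concrete with a brute-force illustration (my own toy, emphatically not Pugh's algorithm, which answers the same question exactly and symbolically without any enumeration): do two array references ever touch the same element for in-bounds loop indices? For a[2*i] versus a[2*j+1], that asks whether 2*i == 2*j + 1 has an integer solution with 0 <= i, j < n.

```python
# Brute-force array-aliasing check over bounded loop indices. The Omega
# test solves the underlying integer affine-constraint problem exactly,
# for arbitrary bounds, without enumerating anything.

def references_alias(f, g, n):
    """Do index expressions f(i) and g(j) ever collide for 0 <= i, j < n?"""
    return any(f(i) == g(j) for i in range(n) for j in range(n))

even = lambda i: 2 * i         # writes to a[2*i]
odd = lambda j: 2 * j + 1      # reads from a[2*j + 1]

# Even and odd indices never collide: no dependence, the loops can run
# in parallel. Against a[j], the references do overlap.
print(references_alias(even, odd, 100))          # False
print(references_alias(even, lambda j: j, 100))  # True
```

The brute force only works for small, fixed bounds; the point of the Omega test was handling symbolic bounds and arbitrary affine constraints while staying exact.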
The Green500 List - November 2013
The November 2013 release of the Green500 list was announced at the SC13 conference in Denver, Colorado, USA. This list marks a number of “firsts” for the Green500. It is the first time that a supercomputer has broken through the 4 gigaflops/watt barrier. Second, it is the first time that all of the top 10 systems on the Green500 are heterogeneous systems.
Third, it is the first time that the average of the measured power consumed by the systems on the Green500 dropped with respect to the previous edition of the list. “A decrease in the average measured power coupled with an overall increase in performance is an encouraging step along the trail to exascale,” noted Wu Feng of the Green500. Fourth, assuming that TSUBAME-KFC’s energy efficiency can be maintained for an exaflop system, it is the first time that an extrapolation to an exaflop supercomputer has dropped below 300 megawatts (MW), specifically 222 MW. “This 222-MW power envelope is still a long way away from DARPA’s target of an exaflop system in the 20-MW power envelope,” says Feng.
Starting with this release, the Little Green500 list only includes machines with power values submitted directly to the Green500. In fact, more than 400 systems have submitted directly to the Green500 over the past few years. As in previous years, the Little Green500 list has better overall efficiency than the Green500 list on average.
Earlier this year, the Green500 adopted new methodologies for measuring the power of supercomputing systems and providing a more accurate representation of the energy efficiency of large-scale systems. In June 2013, the Green500 formally adopted measurement rules (a.k.a. “Level 1” measurements), developed in cooperation with the Energy-Efficient High-Performance Computing Working Group (EE HPC WG). Moreover, power-measurement methodologies with higher precision and accuracy were developed as a part of this effort (a.k.a. “Level 2” and “Level 3” measurements). With growing support and interest in the energy efficiency of large-scale computing systems, the Green500 welcomed two more submissions at Level 2 and Level 3 than in the previous edition of the list. Of particular note, Piz Daint, the greenest petaflop supercomputer in the world, submitted the highest-quality Level 3 measurement.
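The exaflop extrapolation quoted above is straightforward arithmetic, which a few lines can reproduce: an exaflop is 10^18 FLOP/s, so dividing by the energy efficiency gives the power draw.

```python
# Power required for an exaflop machine at a given energy efficiency.
def exaflop_power_mw(gflops_per_watt):
    exaflop = 1e18                          # FLOP/s
    watts = exaflop / (gflops_per_watt * 1e9)
    return watts / 1e6                      # convert to megawatts

print(round(exaflop_power_mw(4.5)))   # 222 MW at TSUBAME-KFC's efficiency
print(round(exaflop_power_mw(50)))    # DARPA's 20-MW target needs 50 GFLOPS/W
```

The same formula shows how far there is to go: hitting DARPA's 20-MW envelope requires roughly an 11x improvement over the most efficient machine on this list.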
The students assemble an Atom-processor-based system and afterwards install cloud software.
http://cseweb.ucsd.edu/~mbtaylor/papers/taylor_landscape_ds_ieee_micro_2013.pdf
Multicore scaling leads to large amounts of dark silicon. Across two process generations, there is a spectrum of trade-offs between frequency and core count; these include increasing core count by 2x but leaving frequency constant (top), and increasing frequency by 2x but leaving core count constant (bottom). Any of these trade-off points will have large amounts of dark silicon.
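The dark-silicon argument can be made concrete with a back-of-the-envelope model (my own simplification with assumed numbers, not the paper's exact figures): the chip's power budget stays fixed, the transistor budget doubles each process generation, but with Dennard scaling over, the power per transistor at full speed drops by only about 1.4x per generation. The fraction of the chip that can be powered at once therefore shrinks every generation.

```python
# Toy dark-silicon model: fraction of the chip that can run at full speed
# under a fixed power budget, as transistor count outgrows the per-
# transistor power reduction. Growth/drop factors are assumptions.

def lit_fraction(generations, transistor_growth=2.0, power_drop=1.4):
    frac = 1.0
    for _ in range(generations):
        frac *= power_drop / transistor_growth
    return frac

for g in range(4):
    print(g, round(lit_fraction(g), 3))
```

With these assumed factors, after two process generations only about half the chip can be lit simultaneously, which matches the paper's framing of large amounts of dark silicon across two generations regardless of how the frequency/core-count trade-off is chosen.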