In this deck from the HPC User Forum in Detroit, Irene Qualters from NSF presents: Leadership Computing and NSF’s Computational Ecosystem.
"For over three decades, NSF has been a leader in providing the computing resources our nation's researchers need to accelerate innovation," said NSF Director France Córdova. "Keeping the U.S. at the forefront of advanced computing capabilities and providing researchers across the country access to those resources are key elements in maintaining our status as a global leader in research and education. This award is an investment in the entire U.S. research ecosystem that will enable leap-ahead discoveries."
Watch the video: https://wp.me/p3RLHQ-j2X
Learn more: http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com
3. Leadership Computing Investment Represents a National Asset
NSF Leadership Computing investments, along with complementary investments by other agencies (DOE, NIH, DOD, NASA, etc.) and governments, represent strategic assets for advancing science and engineering.
Blue Waters, UIUC
4. NSF Leadership Computing Support (Blue Waters/UIUC) Is Enabling Advances in National Priorities Not Otherwise Possible
• Plasma simulations critical to the design of smaller, cheaper particle accelerators. Mori (UCLA)
• Modeling the relativistic accretion dynamics of supermassive black hole mergers. Campanelli (RIT)
• Full-scale model of the brain hippocampus to understand neurological disorders. Soltesz (Stanford)
• Space weather models incorporating X-ray bursts and high-energy physics associated with intense solar activity. Stein (Michigan State)
All supported by the Blue Waters team of application consultants and CI experts.
5. UIUC/Blue Waters Experience → Stimulating New Research Collaborations to Enable Dramatic and Unanticipated Advances
Mulock Glacier, between Byrd Glacier and the McMurdo Dry Valleys. Flow is from the polar plateau on the right to the Ross Ice Shelf on the left.
The Polar Geospatial Center (PGC) is a research facility at the University of Minnesota funded by NSF.
• On September 4, 2018, PGC released the Reference Elevation Model of Antarctica (REMA), providing the first high-resolution (8-meter), high-accuracy terrain dataset covering approximately 98% of Antarctica.
• REMA is constructed from DigitalGlobe satellite imagery licensed by the National Geospatial-Intelligence Agency.
• REMA is generated using open-source software developed by M.J. Noh and Ian Howat at the Ohio State University. The images are processed on the Blue Waters supercomputer at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.
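To give a sense of the scale behind those numbers, here is a quick back-of-envelope sketch in Python. The constants are rough assumptions for illustration (Antarctica at roughly 14 million square kilometers, one 4-byte elevation value per grid post), not REMA project figures:

```python
# Back-of-envelope scale of an 8-meter elevation mosaic of Antarctica.
# Assumptions: ~14 million km^2 total area, 98% coverage, float32 posts.
AREA_KM2 = 14_000_000      # approximate area of Antarctica
COVERAGE = 0.98            # fraction covered, per the slide
POST_M = 8                 # grid posting in meters
BYTES_PER_POST = 4         # one float32 elevation value per post

area_m2 = AREA_KM2 * 1e6 * COVERAGE
posts = area_m2 / (POST_M ** 2)
size_tb = posts * BYTES_PER_POST / 1e12

print(f"{posts:.2e} elevation posts")      # ~2.1e11 (about 214 billion)
print(f"~{size_tb:.1f} TB uncompressed")   # ~0.9 TB, elevation layer alone
```

Even then, the final raster is the cheap part: the stereo matching of satellite image pairs that produces it is far more compute-intensive, which is why a petascale system like Blue Waters was involved.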
7. Transforming the Frontiers of Science & Society
A cyberinfrastructure ecosystem to foster secure access, adaptive capacity, and dynamic workflows by research communities, in order to transform computational- and data-intensive research across all of science and engineering.
9. NSF Research Infrastructure Investments Enable Bold Science
Gravitational wave detection was enabled by NSF investments in technology and people, including the Open Science Grid:
✓ Sustained access to multiscale Advanced Computing resources
• New intensive simulations of relativity and magnetohydrodynamics.
• Massive, parallel event searches and validation (100,000 models).
• Advanced computing resources and services sponsored by NSF, other agencies, and institutions.
✓ Interoperable Networking, Data Transfer, & Workflow Systems
• Pegasus, HTCondor, and Globus workflow and data transfer management (a minimal sketch follows this list).
• 100 Gbps upgrades funded by NSF, DOE, and international agencies enabled huge throughput gains.
✓ Software Infrastructure
• Computational science advances embodied in community software infrastructure for simulations, visualizations, workflows, and data flows.
NSF programs: Data Building Blocks (DIBBs), Software Infrastructure (SI2), Network Infrastructure and Engineering (CC*NIE, IRNC), and others. Complementary investments by other federal agencies and international entities.
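To make the "massive, parallel event searches" concrete: work like this is typically fanned out across the Open Science Grid as many independent HTCondor jobs. The sketch below generates a standard HTCondor submit description for a 100,000-job sweep; the executable name and paths are hypothetical, and the actual LIGO analyses run Pegasus-managed workflows on top of HTCondor rather than a bare submit file:

```python
# Minimal sketch of an Open Science Grid-style high-throughput fan-out:
# one HTCondor job per candidate model. 'search_model.sh' is hypothetical.
N_MODELS = 100_000  # one job per model, per the slide

submit_description = f"""\
universe       = vanilla
executable     = search_model.sh
arguments      = $(Process)
output         = logs/model_$(Process).out
error          = logs/model_$(Process).err
log            = logs/search.log
request_cpus   = 1
request_memory = 2GB
queue {N_MODELS}
"""

# Write the submit file; each job receives its model index (0..99999)
# via the $(Process) macro and runs wherever the pool has free slots.
with open("search.sub", "w") as f:
    f.write(submit_description)
# Submit with: condor_submit search.sub
```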
10. Evolving Science, CI Landscapes
Evolving Science/Engineering Landscape
§ Large-scale, high-resolution, multi-scale, multi-physics simulations
§ Emerging data-driven (ML-based) models
§ Streaming data from observatories and instruments
§ Complex, dynamic workflows
§ Heightened emphasis on robust results (transparency, credibility, correctness, security, ...)
Evolving Technology Landscape
§ Extreme scales / pervasive computing and data
§ Diverse/disruptive technologies increasing
§ Role of software platforms in taming complexity
§ High-throughput/low-latency networks
§ Increasing role of clouds, multiclouds, hybrid environments
New agility, reuse, and collaborations are needed in the cyberinfrastructure ecosystem and its components, from instruments, observatories, and experimental facilities to end-to-end workflows.
13. Planning for the Future: OAC-Funded Computing Ecosystem
Timeline of the OAC-funded computing ecosystem, 2013-2026, by category:
• Leadership Computing: Blue Waters (UIUC), followed by Leadership HPC planning, a Leadership-Class Phase 1 system (2 to 3x time-to-solution improvement), and an LC Phase 2 system (>20x).
• Innovative HPC: Stampede and Stampede 2 (UT Austin), Wrangler (UT Austin), Bridges (CMU & PSC), Jetstream (Indiana U), and Comet (UCSD), spanning large-scale computation, long-tail and high-throughput, data-intensive, and cloud services.
• Services: XSEDE and XSEDE 2 (coordinated user services, education, outreach), the XD Metrics Service, and the Open Science Grid.
• Planning for the Future CI Ecosystem: community-driven planning and actions.
14. Towards a Leadership-Class Computing Facility - Phase 1 (NSF 17-558)
The solicitation calls for a Phase 1 system that:
• Advances science and engineering at the frontiers
- A high-capability system with at least a two- to three-fold time-to-solution performance improvement over Blue Waters.
- Supports the known portfolio of scientific applications and workflows requiring leadership-class computing capabilities, as well as future frontier applications exploiting the confluence of simulation and data analysis.
- Supports the computational requirements of long-lived scientific facilities.
• Supports innovative collaborations and coordination.
• Leverages complementary investments in the national HPC ecosystem, such as the large academic institutional investments as well as additional investments by NSF and other federal agencies.
• Includes extensive broadening-participation activities through education and industry outreach; other partnerships, such as interagency and international collaborations, are encouraged where possible.
• Plans for a Phase 2 design that will lead to a leadership-class computing facility with a ten-fold performance improvement over the Phase 1 system.
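Those two targets are how the earlier timeline arrives at its ">20x" label; a one-line check in Python:

```python
# How the timeline's ">20x" follows from the solicitation's two targets.
phase1_speedup = (2, 3)   # 2 to 3x time-to-solution over Blue Waters
phase2_factor = 10        # ten-fold improvement over the Phase 1 system

overall = tuple(s * phase2_factor for s in phase1_speedup)
print(overall)  # (20, 30): Phase 2 lands 20-30x beyond Blue Waters
```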
15. NSF 17-558: Towards a Leadership-Class Computing Facility – Phase 1
Resolution
RESOLVED, that the National Science Board authorizes the Director at her discretion to make an award, OAC-1818253, to the Texas Advanced Computing Center (TACC) at the University of Texas at Austin for the acquisition of the system described in proposal "Computation for the Endless Frontier," in an amount not to exceed $60,000,000 for a period of 60 months. Pending appropriate approval associated with NSF MREFC policies, an additional amount not to exceed $8 million may be made available to TACC in the form of supplemental funding to this award to advance the design of the Phase 2 leadership-class system.
16. FRONTERA SYSTEM: PHASE 1 LEADERSHIP COMPUTING PROJECT
• Deploy a system in 2019 for the largest problems scientists and engineers currently face.
• Support and operate this system for 5 years.
• Plan a potential Phase 2 system, with 10x the capabilities, for the future challenges scientists will face.
17. THE TEAM
• Operations: TACC, Ohio State University, Cornell, Texas A&M
• Science and Technology Drivers and Phase 2 Planning: Caltech, University of Chicago, Cornell, UC Davis, Georgia Tech, Princeton, Stanford
• Vendors: DellEMC, Intel, Mellanox, DataDirect Networks, GRC, CoolIT, NVIDIA, Amazon, Microsoft, Google
18. FRONTERA SYSTEM: HARDWARE
• Primary compute system: DellEMC and Intel
- 35-40 PetaFlops peak performance
• Interconnect: Mellanox HDR and HDR-100 links
- Fat-tree topology, 200 Gb/s links between switches
• Storage: DataDirect Networks
- 50+ PB disk, 3 PB of flash, 1.5 TB/sec peak I/O rate
• Single-precision compute subsystem: NVIDIA
• Front end for data movers, workflow, API
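The deck gives only aggregate figures, but they imply a rough system shape. A hedged sketch: assuming a per-node peak of roughly 3 to 4.5 TF, typical for dual-socket AVX-512 server CPUs of that era (an assumption, not a number from the deck), the 35-40 PF target works out to something on the order of 8,000 to 13,000 compute nodes:

```python
# Rough system size implied by the 35-40 PF peak target, under an ASSUMED
# per-node peak of 3-4.5 TF (dual-socket AVX-512 servers of the era).
peak_pf = (35, 40)      # system peak performance, PetaFlops
node_tf = (3.0, 4.5)    # assumed per-node peak, TeraFlops

fewest = peak_pf[0] * 1000 / node_tf[1]  # small system built from fat nodes
most = peak_pf[1] * 1000 / node_tf[0]    # large system built from lean nodes
print(f"~{fewest:,.0f} to {most:,.0f} compute nodes")  # ~7,778 to 13,333

# The storage figures pin down another useful number: draining the 3 PB
# flash tier at the 1.5 TB/s peak I/O rate takes about half an hour.
print(f"~{3e15 / 1.5e12 / 60:.0f} minutes")  # ~33
```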
19. FRONTERA SYSTEM: ECOSYSTEM
• Interfaces to other cyberinfrastructure resources:
- Archive systems
- Public data repositories
- Data transfer software
- Public cloud providers (Microsoft, Amazon, Google): options to publish data in the cloud, use innovative cloud services in scientific workflows, and access new technologies each year during Phase 2 planning
• Coordination with the scientific ecosystem:
- Partnerships with large-scale instruments, software development teams, and XSEDE
- Education and workforce activities
20. FRONTERA SYSTEM: INFRASTRUCTURE
• Frontera will consume almost 6 megawatts of power at peak.
• Direct water cooling of primary compute racks (CoolIT/DellEMC).
• Oil-immersion cooling (GRC).
• Solar and wind power inputs.
TACC machine room and chilled water plant
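For scale, a quick upper bound on the annual energy behind that 6 MW figure (actual draw varies with load; this just sizes the number):

```python
# Upper bound on annual energy if Frontera drew its ~6 MW peak year-round.
peak_mw = 6
hours_per_year = 24 * 365                     # 8,760 hours
energy_gwh = peak_mw * hours_per_year / 1000  # MWh -> GWh
print(f"~{energy_gwh:.1f} GWh/year")          # ~52.6 GWh
```

Numbers on that scale help explain why the cooling plant and the solar and wind inputs feature on this slide alongside the compute itself.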
21. Conclusion
§ Science and society are being transformed by compute and data; an agile cyberinfrastructure ecosystem is essential.
§ Application requirements and the resource and technology landscapes are changing rapidly; our cyberinfrastructure ecosystem must evolve efficiently.
§ We need a forward-looking approach to the cyberinfrastructure ecosystem, aimed at transforming science.
22. Thanks
§ Manish Parashar, OD, NSF/CISE/OAC
§ Dan Stanzione, Assoc. VPR, ED/TACC, UT Austin
§ Bogdan Mihaila, PD, NSF/OAC/MPS