In this deck from the UK HPC Conference, Venkatesh Kannan from the Irish Centre for High-End Computing (ICHEC) presents: Accelerating Research and Enterprise Solutions by Bridging HPC and AI.
"The presentation will highlight the need to address the symbiotic relationship between HPC and AI at different levels - technology development, education & training, and policy making - in order to enable the adoption and accelerate the development of AI solutions by the research and enterprise communities. A number of efforts and projects undertaken at the Irish Centre for High-End Computing towards enabling and achieving this in the Irish and European context will be presented."
Learn more: https://www.ichec.ie/ and http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Accelerating Research and Enterprise Solutions by Bridging HPC and AI
1. Accelerating Research & Enterprise Solutions by Bridging HPC & AI
Venkatesh Kannan
Technical Manager
Irish Centre for High-End Computing (ICHEC)
2. Roles of ICHEC
• National Supercomputing Centre in Ireland
• Co-funded by two Irish Government departments
• Offices in Dublin and Galway
• National HPC Service to Irish academic and research organisations
• Engagements with public and private sector
• EuroHPC Competence Centre for Ireland
• Focus on Performance Engineering and Energy-efficient HPC applications
• Enterprise Accelerator Programme
• Bridging “technology makers” and “technology takers” in the context of HPC, BD, ML
• Engagements with Barcelona Supercomputing Centre (upcoming)
• Pre-exascale system and multi-petaflop ARM testbed
3. Domains of Expertise at ICHEC
• Performance engineering
• Energy-efficient exascale computing
• Energy-efficient edge computing
• Efficient DL model training
• Efficient DL inference
• Climate research and weather modelling
• Geophysics and seismic data
• Quantum computing
• Training for HPC, DS & ML
• Heterogeneous parallelisation
• Earth observation
• Data management and science
• Health data management
• Satellite and UAV datasets
4. Towards Exascale – Energy Efficiency
● R&D on weather and climate algorithms (“Dwarfs”)
● Extreme-scale heterogeneous HPC platforms
● Minimise time- and cost-to-solution in weather and climate services
● Improve the energy efficiency of HPC applications on extreme-scale platforms
● Exploit the dynamic behaviour and resource requirements of HPC applications
● Static and dynamic auto-tuning of parameters in the HPC stack (hardware, system software and application)
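As a rough illustration of static auto-tuning for energy efficiency, the sketch below exhaustively searches a small parameter space and keeps the configuration with the lowest energy-to-solution, taken here as average power multiplied by runtime (E = P · t). The tuning knobs (`threads`, `freq_ghz`) and the `toy_measure` cost model are hypothetical stand-ins for real measurements of an application run; this is not ICHEC's tooling.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, float]


def autotune(space: Dict[str, Iterable[float]],
             measure: Callable[[Config], Tuple[float, float]]) -> Tuple[Optional[Config], float]:
    """Static auto-tuning by exhaustive search: return the configuration with the
    lowest energy-to-solution (average power x runtime) and that energy in joules."""
    names = list(space)
    best_cfg: Optional[Config] = None
    best_energy = float("inf")
    for values in product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        runtime_s, power_w = measure(cfg)  # a real tuner would run the application here
        energy_j = power_w * runtime_s     # E = P * t
        if energy_j < best_energy:
            best_cfg, best_energy = cfg, energy_j
    return best_cfg, best_energy


def toy_measure(cfg: Config) -> Tuple[float, float]:
    """Hypothetical cost model: more threads and a higher clock cut runtime,
    but power rises with threads and with the square of frequency."""
    runtime_s = 100.0 / (cfg["threads"] * cfg["freq_ghz"])
    power_w = 50.0 + 10.0 * cfg["threads"] * cfg["freq_ghz"] ** 2
    return runtime_s, power_w


if __name__ == "__main__":
    space = {"threads": [1, 2, 4, 8], "freq_ghz": [1.2, 1.8, 2.4]}
    cfg, energy = autotune(space, toy_measure)
    print(cfg, round(energy, 1))  # {'threads': 8, 'freq_ghz': 1.2} 1720.8
```

With this toy model the fastest setting (8 threads at 2.4 GHz) is not the most energy-efficient one: the search settles on 8 threads at 1.2 GHz, trading runtime for lower energy. A dynamic variant could repeat a cheaper version of this search per application phase at run time.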
5. Environmental Sciences
● HPC and HPDA for environmental modelling
○ Weather and climate forecasting, Earth System modelling
● Earth Observation: data-centric knowledge discovery
○ Confluence of Open Data policy and Open Data Science technologies
○ EO data is covered under Open Data policy
○ ICHEC prepares EO data sets for ready use by research and enterprise organisations
○ ICHEC develops platforms and tools to prepare AI-Ready EO datasets
● ESA Environmental Validation Data Centre (EVDC) with Skytek Ltd. and ESA
○ A central, long-term repository in Europe for archiving and exchanging data for validation of atmospheric composition products
○ Tool to monitor the quality and availability of the data provided by the data acquisition teams contracted by ESA
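One common step in making EO data "AI-ready" is cutting large scenes into fixed-size, normalised patches that a model can consume. The sketch below assumes a single band held as a 2D list of floats and a hypothetical patch size; it tiles the scene into non-overlapping patches, drops incomplete edge tiles and min-max scales each patch to [0, 1]. It illustrates the general idea only and is not ICHEC's tool chain.

```python
from typing import List

Grid = List[List[float]]


def tile(scene: Grid, size: int) -> List[Grid]:
    """Cut a 2D band into non-overlapping size x size patches, skipping partial edge tiles."""
    rows, cols = len(scene), len(scene[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patches.append([row[c:c + size] for row in scene[r:r + size]])
    return patches


def minmax_scale(patch: Grid) -> Grid:
    """Rescale a patch to [0, 1]; a constant patch maps to all zeros."""
    flat = [v for row in patch for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in patch]
    return [[(v - lo) / span for v in row] for row in patch]


def ai_ready_patches(scene: Grid, size: int) -> List[Grid]:
    """Tile the scene, then normalise every patch independently."""
    return [minmax_scale(p) for p in tile(scene, size)]


if __name__ == "__main__":
    scene = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # 5x5 toy band
    patches = ai_ready_patches(scene, 2)
    print(len(patches))  # 4: the fifth row and column do not fill a 2x2 patch
```

Per-patch scaling is only one choice; a real pipeline might instead normalise with per-band statistics over the whole archive so that values stay comparable across patches.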
6. Earth Observation – Satellite Platform (SPÉir)
● Satellite Platform for Éireann (Ireland) created and operated by ICHEC
● Collates & archives Sentinel satellite data for Ireland and Northern Atlantic
● Provides free and open access to data archives
● Provides data and processing services
● Develops a user base for satellite data in public agencies, academia and commercial organisations
● Data archive, growing daily, for Ireland:
○ Sentinel-1 98.9 TB
○ Sentinel-2 203.4 TB
○ Sentinel-3 79.3 TB
○ Sentinel-5P 155.1 TB (globally)
7. First Solar Energy Maps for Ireland
8. Environmental Sciences – Climate Research
• Simulating global climate change using the EC-Earth model
• Resulting datasets comprise Ireland’s contribution to CMIP6 and the IPCC AR6 reports
• All datasets hosted by ICHEC and shared with the international community
• Current data storage requirements: ~1 PB, ~5 PB over the next 2-3 years
• Global datasets dynamically downscaled using Regional Climate Models (RCMs)
• Provides high-resolution (~3.8 km) projections of climate change in Ireland
• Current data storage requirements: 0.75 PB, ~2 PB over the next 2-3 years
• Historical climate of Ireland simulated at very high spatial resolution (~1.5 km) using Regional Climate Models
• Datasets utilised in sectors including agriculture, public health, energy (wind, wave and solar), insurance and socio-economic planning, and in fundamental studies of observed climate change trends and variability
• Current data storage requirements: 0.75 PB, ~4 PB over the next 2-3 years
9. Health Services – Data & Compute Platform (DASSL)
Health Research Board (HRB) Data Access, Storage, Sharing and Linking
10. National Statistics – Big Data Management
● Partnership with Irish Central Statistics Office (CSO) and United Nations Economic Commission for Europe (UNECE)
● Data Analytics 'Sandbox' for UNECE program on Big Data in Official Statistics
● >25 global organisations, including OECD and Eurostat, subscribed to and used the sandbox
● Implement and investigate HPDA and policies for data privacy and ethics
● Shared platform for national statisticians on which to train collaboratively
● Upskilling public service staff in data science and analytics
11. Geophysics – Big Data & Compute Acceleration
• ExSeisDat – Extreme-scale Seismic Data
• Focus: big data, complex data formats, complex I/O patterns, and odd I/O platforms
• Intuitive interface for parallel I/O
• Clean, extensible, lower-level library API
• Implicit workflow API
• Auto parallelisation, optimisation, caching
• I/O hardware and protocols
• Infinite Memory Engine (IME from DDN)
• Nvidia GPU + GPUDirect RDMA
• NVMe
• I/O drivers
• Adding custom filesystem drivers to Lustre
• Phobos to Amazon S3
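Parallel seismic I/O of the kind ExSeisDat targets rests on each process knowing which byte ranges of a file it owns. The sketch below assumes a SEG-Y file with fixed-length traces (3200-byte textual plus 400-byte binary file header, no extended textual headers, 240-byte trace headers, 4-byte samples) and splits the traces across a hypothetical number of ranks in contiguous blocks, returning the byte offset and length each rank would read. It is not ExSeisDat's API; `trace_block`, `byte_range` and `plan` are illustrative names.

```python
from typing import List, Tuple

FILE_HEADER_BYTES = 3200 + 400  # textual + binary header; extended textual headers assumed absent
TRACE_HEADER_BYTES = 240


def trace_block(n_traces: int, n_ranks: int, rank: int) -> Tuple[int, int]:
    """Contiguous block decomposition: return (first_trace, count) owned by `rank`.
    The first n_traces % n_ranks ranks receive one extra trace."""
    base, extra = divmod(n_traces, n_ranks)
    count = base + (1 if rank < extra else 0)
    first = rank * base + min(rank, extra)
    return first, count


def byte_range(first: int, count: int, samples_per_trace: int,
               bytes_per_sample: int = 4) -> Tuple[int, int]:
    """Byte offset and length of `count` consecutive traces starting at trace `first`."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    return FILE_HEADER_BYTES + first * trace_bytes, count * trace_bytes


def plan(n_traces: int, n_ranks: int, samples_per_trace: int) -> List[Tuple[int, int]]:
    """(offset, length) for every rank, in rank order."""
    return [byte_range(*trace_block(n_traces, n_ranks, r), samples_per_trace)
            for r in range(n_ranks)]


if __name__ == "__main__":
    for rank, (offset, length) in enumerate(plan(n_traces=10, n_ranks=3, samples_per_trace=1000)):
        print(rank, offset, length)
```

Each rank could then hand its offset and length to a collective MPI-IO read such as `MPI_File_read_at_all`. Splitting on whole traces keeps every trace header next to its samples, which is why the decomposition works in traces rather than raw bytes.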
12. Machine Learning & Compute Acceleration
• Automation of scientific workflow from microscopy imaging to analysis results
• Data management, Reproducible Research, Computer Vision, Matlab, R
• DL-based parasite detection/identification from microscopy imaging
• Deep Learning YOLOv3 models on GPU with PyTorch
• Financial data exploration and data analysis toolbox development
• Machine Learning with unsupervised clustering to build predictive models, R
• Code/models review, benchmarking performance gains from cloud-based GPU resources
• Deep Learning for NLP on multi-GPU platform with TensorFlow and PyTorch
• Extraction of words of interest from scripts
• Deep Learning models for NLP-based translation
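Object detectors such as YOLOv3 emit many overlapping candidate boxes, which are usually pruned with non-maximum suppression (NMS) before objects such as parasites are counted or identified. The sketch below is a plain-Python greedy NMS over boxes given as (x1, y1, x2, y2, score); the 0.5 IoU threshold is an assumed default, and in a PyTorch pipeline one would more likely call `torchvision.ops.nms`. It illustrates the post-processing step, not the project's actual code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    by at least iou_threshold, and repeat on what is left."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) < iou_threshold]
    return kept


if __name__ == "__main__":
    detections = [
        (10, 10, 50, 50, 0.9),     # strong detection
        (12, 12, 52, 52, 0.8),     # near-duplicate of the first box
        (100, 100, 140, 140, 0.7), # separate object
    ]
    print(nms(detections))  # keeps the 0.9 and 0.7 boxes
```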