SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
Introducing HPC with a Raspberry Pi Cluster
Colin Sauzé <cos@aber.ac.uk>
Research Software Engineer
Super Computing Wales Project
Aberystwyth University
A practical use of and good excuse to build Raspberry Pi Clusters
Overview
● About Me
● Inspirations
● Why teach HPC with Raspberry Pi?
● My Raspberry Pi cluster
● Experiences from teaching
● Future Work
About Me
● Research Software Engineer with
Supercomputing Wales project
– 4 university partnership to supply
HPC systems
– Two physical HPCs
● PhD in Robotics
– Experience with Linux on single
board computers
– Lots of Raspberry Pi projects
Inspiration #1: Los Alamos National Laboratory
● 750 node cluster
● Test system for software
development
● Avoid tying up the real cluster
Inspiration #2: Wee Archie/Archlet
● EPCC’s Raspberry Pi Cluster
● Archie: 18x Raspberry Pi 2’s
(4 cores each)
● Archlet: smaller 4 or 5 node
clusters.
● Used for outreach demos.
● Setup instructions:
https://github.com/EPCCed/w
ee_archlet
Image from
https://raw.githubusercontent.com/EPCCed/wee_archlet/master/images/IMG_20170210_132818620.jpg
Inspiration #3: Swansea’s Raspberry Pi Cluster
● 16x Raspberry Pi 3s
● CFD demo using a Kinect
sensor
● Demoed at the Swansea
Festival of Science 2018
Why Teach with a Raspberry Pi cluster?
● Avoid loading real clusters doing actual research
– Less fear from learners that they might break something
● Resource limits more apparent
● More control over the environment
● Hardware less abstract
● No need to have accounts on a real HPC
My Cluster
● “Tweety Pi”
– 10x Raspberry Pi model B
version 1s
– 1x Raspberry Pi 3 as
head/login node
– Raspbian Stretch
● Head node acts as WiFi
access point
– Internet via phone or laptop
Demo Software
● British Science Week 2019
– Simple Pi with Monte Carlo methods demo
– MPI based
– GUI to control how many jobs launch and show
queuing
● Swansea CFD demo
– Needs more compute power
– 16x Raspberry Pi 3 vs 10x Raspberry Pi 1
● Wee Archie/Archlet Demos
– Many demos available
●
I only found this recently
– https://github.com/EPCCed/wee_archie
Making a realistic HPC environment
● MPICH
● Slurm
● Quotas on home directories
● NFS mounted home directories
● Software modules
● Network booting compute nodes
Network booting hack
● No PXE boot support on original Raspberry Pi (or Raspberry Pi
B+ and 2)
● Kernel + bootloader on SD card
● Root filesystem on NFS
– Cmdline.txt contains:
● console=tty1 root=/dev/nfs 
nfsroot=10.0.0.10:/nfs/node_rootfs,vers=3 ro ip=dhcp 
elevator=deadline rootwait
● SD cards can be identical, small 50mb image, easy to replace
Teaching Materials
● Based on Introduction to HPC with Super Computing Wales carpentry style
lesson:
– What is an HPC?
– Logging in
– Filesystems and transferring data
– Submitting/monitoring jobs with Slurm
– Profiling
– Parallelising code, Amdahl’s law
– MPI
– HPC Best Practice
Experiences from Teaching – STFC Summer
School
● New PhD students in solar
physics
– Not registered at universities
yet, no academic accounts
● 15 people each time
– 1st time using HPC for many
– Most had some Unix
experience
● Subset of Super Computing
Wales introduction to HPC
carpentry lesson
Feedback
● Very Positive
●
A lot seemed to enjoy playing around with SSH/SCP
– First time using a remote shell for some
– Others more adventurous than they might have been on a real HPC
● Main complaint was lack of time (only 1.5 hours)
– Only got as far as covering basic job submission
– Quick theoretical run through of MPI and Amdahl’s law
– Probably have 3-4 hours of material
●
Queuing became very apparent
– 10 nodes, 15 users
– “watch squeue” running on screen during practical parts
Problems
● Slurm issues on day 1
– Accidentally overwrote a system user when creating accounts
● WiFi via Laptop/phone slow
– When users connect to the cluster its their internet connection too
– Relied on this for access to course notes
Experiences from teaching – Supercomputing
Wales Training
● Approximately 10 people
– Mix of staff and research students
– Mixed experience levels
– All intending to use a real HPC
● Simultaneously used Raspberry Pi and real HPC
– Same commands run on both
● Useful backup system for those with locked accounts
● Feedback good
– Helped make HPC more tangible
Future Work
● Configuration management tool
(Ansible/Chef/Puppet/Salt etc) instead
of script for configuration
● CentOS/Open HPC stack instead of
Raspbian
● Public engagement demo which
focuses on our research
– Analysing satellite imagery
– Simulate the monsters from MonsterLab
(https://monster-lab.org/)
More Information
● Setup instructions and scripts -
https://github.com/colinsauze/pi_cluster
● Teaching material -
https://github.com/SCW-Aberystwyth/Introduction-to-HPC-with-
RaspberryPi
● Email me: cos@aber.ac.uk

Contenu connexe

Tendances

A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 

Tendances (20)

Lustre Best Practices
Lustre Best Practices Lustre Best Practices
Lustre Best Practices
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
Getting started with AMD GPUs
Getting started with AMD GPUsGetting started with AMD GPUs
Getting started with AMD GPUs
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
 
AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
Welcome to the 2016 HPC Advisory Council Switzerland Conference
Welcome to the 2016 HPC Advisory Council Switzerland ConferenceWelcome to the 2016 HPC Advisory Council Switzerland Conference
Welcome to the 2016 HPC Advisory Council Switzerland Conference
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
ARM and Machine Learning
ARM and Machine LearningARM and Machine Learning
ARM and Machine Learning
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
 
Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509
 

Similaire à Introducing HPC with a Raspberry Pi Cluster

Spark China Summit 2015 Guancheng Chen
Spark China Summit 2015 Guancheng ChenSpark China Summit 2015 Guancheng Chen
Spark China Summit 2015 Guancheng Chen
Guancheng (G.C.) Chen
 
[NetApp Managing Big Workspaces with Storage Magic
[NetApp Managing Big Workspaces with Storage Magic[NetApp Managing Big Workspaces with Storage Magic
[NetApp Managing Big Workspaces with Storage Magic
Perforce
 

Similaire à Introducing HPC with a Raspberry Pi Cluster (20)

uWSGI - Swiss army knife for your Python web apps
uWSGI - Swiss army knife for your Python web appsuWSGI - Swiss army knife for your Python web apps
uWSGI - Swiss army knife for your Python web apps
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined Infrastructure
 
Spark China Summit 2015 Guancheng Chen
Spark China Summit 2015 Guancheng ChenSpark China Summit 2015 Guancheng Chen
Spark China Summit 2015 Guancheng Chen
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
 
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
History of Computer Systems - Why we are doing it that way
History of Computer Systems - Why we are doing it that wayHistory of Computer Systems - Why we are doing it that way
History of Computer Systems - Why we are doing it that way
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
Heavy duty Abaqus structural analysis using HPC in the cloud
Heavy duty Abaqus structural analysis using HPC in the cloudHeavy duty Abaqus structural analysis using HPC in the cloud
Heavy duty Abaqus structural analysis using HPC in the cloud
 
[BarCamp2018][20180915][Tips for Virtual Hosting on Kubernetes]
[BarCamp2018][20180915][Tips for Virtual Hosting on Kubernetes][BarCamp2018][20180915][Tips for Virtual Hosting on Kubernetes]
[BarCamp2018][20180915][Tips for Virtual Hosting on Kubernetes]
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing
 
[NetApp Managing Big Workspaces with Storage Magic
[NetApp Managing Big Workspaces with Storage Magic[NetApp Managing Big Workspaces with Storage Magic
[NetApp Managing Big Workspaces with Storage Magic
 
New Process/Thread Runtime
New Process/Thread Runtime	New Process/Thread Runtime
New Process/Thread Runtime
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
General Learning.pptx
General Learning.pptxGeneral Learning.pptx
General Learning.pptx
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
 
Cytoscape and External Data Analysis Tools
Cytoscape and External Data Analysis ToolsCytoscape and External Data Analysis Tools
Cytoscape and External Data Analysis Tools
 
Integrating Puppet and Gitolite for sysadmins cooperations
Integrating Puppet and Gitolite for sysadmins cooperationsIntegrating Puppet and Gitolite for sysadmins cooperations
Integrating Puppet and Gitolite for sysadmins cooperations
 
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchPivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
 

Plus de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
inside-BigData.com
 
Making Supernovae with Jets
Making Supernovae with JetsMaking Supernovae with Jets
Making Supernovae with Jets
inside-BigData.com
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
inside-BigData.com
 

Plus de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Data Parallel Deep Learning
Data Parallel Deep LearningData Parallel Deep Learning
Data Parallel Deep Learning
 
Making Supernovae with Jets
Making Supernovae with JetsMaking Supernovae with Jets
Making Supernovae with Jets
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

Introducing HPC with a Raspberry Pi Cluster

  • 1. Introducing HPC with a Raspberry Pi Cluster Colin Sauzé <cos@aber.ac.uk> Research Software Engineer Super Computing Wales Project Aberystwyth University A practical use of and good excuse to build Raspberry Pi Clusters
  • 2. Overview ● About Me ● Inspirations ● Why teach HPC with Raspberry Pi? ● My Raspberry Pi cluster ● Experiences from teaching ● Future Work
  • 3. About Me ● Research Software Engineer with Supercomputing Wales project – 4 university partnership to supply HPC systems – Two physical HPCs ● PhD in Robotics – Experience with Linux on single board computers – Lots of Raspberry Pi projects
  • 4. Inspiration #1: Los Alamos National Laboratory ● 750 node cluster ● Test system for software development ● Avoid tying up the real cluster
  • 5. Inspiration #2: Wee Archie/Archlet ● EPCC’s Raspberry Pi Cluster ● Archie: 18x Raspberry Pi 2’s (4 cores each) ● Archlet: smaller 4 or 5 node clusters. ● Used for outreach demos. ● Setup instructions: https://github.com/EPCCed/w ee_archlet Image from https://raw.githubusercontent.com/EPCCed/wee_archlet/master/images/IMG_20170210_132818620.jpg
  • 6. Inspiration #3: Swansea’s Raspberry Pi Cluster ● 16x Raspberry Pi 3s ● CFD demo using a Kinect sensor ● Demoed at the Swansea Festival of Science 2018
  • 7. Why Teach with a Raspberry Pi cluster? ● Avoid loading real clusters doing actual research – Less fear from learners that they might break something ● Resource limits more apparent ● More control over the environment ● Hardware less abstract ● No need to have accounts on a real HPC
  • 8. My Cluster ● “Tweety Pi” – 10x Raspberry Pi model B version 1s – 1x Raspberry Pi 3 as head/login node – Raspbian Stretch ● Head node acts as WiFi access point – Internet via phone or laptop
  • 9. Demo Software ● British Science Week 2019 – Simple Pi with Monte Carlo methods demo – MPI based – GUI to control how many jobs launch and show queuing ● Swansea CFD demo – Needs more compute power – 16x Raspberry Pi 3 vs 10x Raspberry Pi 1 ● Wee Archie/Archlet Demos – Many demos available ● I only found this recently – https://github.com/EPCCed/wee_archie
  • 10. Making a realistic HPC environment ● MPICH ● Slurm ● Quotas on home directories ● NFS mounted home directories ● Software modules ● Network booting compute nodes
  • 11. Network booting hack ● No PXE boot support on original Raspberry Pi (or Raspberry Pi B+ and 2) ● Kernel + bootloader on SD card ● Root filesystem on NFS – Cmdline.txt contains: ● console=tty1 root=/dev/nfs  nfsroot=10.0.0.10:/nfs/node_rootfs,vers=3 ro ip=dhcp  elevator=deadline rootwait ● SD cards can be identical, small 50mb image, easy to replace
  • 12. Teaching Materials ● Based on Introduction to HPC with Super Computing Wales carpentry style lesson: – What is an HPC? – Logging in – Filesystems and transferring data – Submitting/monitoring jobs with Slurm – Profiling – Parallelising code, Amdahl’s law – MPI – HPC Best Practice
  • 13. Experiences from Teaching – STFC Summer School ● New PhD students in solar physics – Not registered at universities yet, no academic accounts ● 15 people each time – 1st time using HPC for many – Most had some Unix experience ● Subset of Super Computing Wales introduction to HPC carpentry lesson
  • 14. Feedback ● Very Positive ● A lot seemed to enjoy playing around with SSH/SCP – First time using a remote shell for some – Others more adventurous than they might have been on a real HPC ● Main complaint was lack of time (only 1.5 hours) – Only got as far as covering basic job submission – Quick theoretical run through of MPI and Amdahl’s law – Probably have 3-4 hours of material ● Queuing became very apparent – 10 nodes, 15 users – “watch squeue” running on screen during practical parts
  • 15. Problems ● Slurm issues on day 1 – Accidentally overwrote a system user when creating accounts ● WiFi via Laptop/phone slow – When users connect to the cluster its their internet connection too – Relied on this for access to course notes
  • 16. Experiences from teaching – Supercomputing Wales Training ● Approximately 10 people – Mix of staff and research students – Mixed experience levels – All intending to use a real HPC ● Simultaneously used Raspberry Pi and real HPC – Same commands run on both ● Useful backup system for those with locked accounts ● Feedback good – Helped make HPC more tangible
  • 17. Future Work ● Configuration management tool (Ansible/Chef/Puppet/Salt etc) instead of script for configuration ● CentOS/Open HPC stack instead of Raspbian ● Public engagement demo which focuses on our research – Analysing satellite imagery – Simulate the monsters from MonsterLab (https://monster-lab.org/)
  • 18. More Information ● Setup instructions and scripts - https://github.com/colinsauze/pi_cluster ● Teaching material - https://github.com/SCW-Aberystwyth/Introduction-to-HPC-with- RaspberryPi ● Email me: cos@aber.ac.uk