Research Computing at ILRI
Alan Orth
ICT Managers Meeting, ILRI, Kenya, 5 March 2014
Where we came from (2003)
- 32 dual-core compute nodes
- 32 * 2 != 64
- Writing MPI code is hard!
- Data storage over NFS to “master” node
- “Rocks” cluster distro
- Revolutionary at the time!
Where we came from (2010)
- Most of the original cluster removed
- Replaced with a single Dell PowerEdge R910
- 64 cores, 8 TB storage, 128 GB RAM
- Threading is easier* than MPI!
- Data is local
- Easier to manage!
To infinity and beyond (2013)
- A little bit back to the “old” model
- Mixture of “thin” and “thick” nodes
- Networked storage
- Pure CentOS
- Supermicro boxen
- Pretty exciting!
Primary characteristics
- Computational capacity
- Data storage
Platform
- 152 compute cores
- 32* TB storage
- 700 GB RAM
- 10 GbE interconnects
- LTO-4 tape backups (LOL?)
Homogeneous computing environment

User IDs, applications, and data are available everywhere.
Scaling out storage with GlusterFS
- Developed by Red Hat
- Abstracts backend storage (file systems, technology, etc)
- Can do replicate, distribute, replicate+distribute, geo-replication (off site!), etc
- Scales “out”, not “up”
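A rough sketch of the capacity arithmetic behind “distribute” versus “replicate”: distributing adds brick capacities together, while replicating stores every file `replica` times, so usable space is the raw total divided by the replica count. The brick counts and sizes are hypothetical and are not ILRI's actual layout.

```shell
# usable_tb BRICKS BRICK_SIZE_TB REPLICA
# Prints usable capacity in TB for equally sized bricks.
# REPLICA=1 is plain distribute; REPLICA=2 keeps two copies of every file.
usable_tb() {
    bricks=$1
    brick_size_tb=$2
    replica=$3
    echo $(( bricks * brick_size_tb / replica ))
}

# Hypothetical figures, for illustration only.
echo "distribute, 2 bricks x 16 TB:           $(usable_tb 2 16 1) TB"
echo "replicate, 2 bricks x 16 TB:            $(usable_tb 2 16 2) TB"
echo "distribute+replicate, 4 bricks x 16 TB: $(usable_tb 4 16 2) TB"
```

Adding another replicated pair of bricks grows the volume without touching the existing servers, which is the sense in which the storage scales “out”.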
How we use GlusterFS
[aorth@hpc: ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
...
wingu1:/homes    31T  9.5T   21T  32% /home
wingu0:/apps     31T  9.5T   21T  32% /export/apps
wingu1:/data     31T  9.5T   21T  32% /export/data

- Persistent paths for homes, data, and applications across the cluster.
- These volumes are replicated, so essentially application-layer RAID1
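For reference, a two-way replicated volume like the ones in the df output could be set up roughly as follows. This is a sketch under assumptions, not the commands actually used at ILRI: the server names wingu0 and wingu1 come from the df output, but the brick directory `/bricks/NAME` is hypothetical. The `run` helper only prints each command unless `DRY_RUN=0` is set, so the sketch is safe to try on a machine without GlusterFS.

```shell
# Dry-run wrapper: print each command instead of executing it,
# unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# create_replicated_volume NAME MOUNTPOINT
# The brick path /bricks/NAME is a hypothetical location on each server.
create_replicated_volume() {
    name=$1
    mountpoint=$2
    # On one storage server: create and start a volume with two copies of each file.
    run gluster volume create "$name" replica 2 \
        "wingu0:/bricks/$name" "wingu1:/bricks/$name"
    run gluster volume start "$name"
    # On each cluster node: mount the volume with the native GlusterFS client.
    run mount -t glusterfs "wingu1:/$name" "$mountpoint"
}

create_replicated_volume homes /home
```

With the default dry run this prints the three commands prefixed with `+`. Because each file lives on both servers, losing one brick leaves the data readable, much like a mirrored disk pair.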
GlusterFS <3 10GbE
SLURM
- Project from Lawrence Livermore National Laboratory (LLNL)
- Manages resources
- Users request CPU, memory, and node allocations
- Queues / prioritizes jobs, logs usage, etc
- More like an accountant than a bouncer
Topology
How we use SLURM
- Can submit “batch” jobs (long-running jobs, invoke program many times with different variables, etc)
- Can run “interactively” (something that needs keyboard interaction)
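A minimal sketch of the “invoke program many times with different variables” case, written as a SLURM job array. The job name, the parameter list, and `my_program` are hypothetical; the `#SBATCH` options and `SLURM_ARRAY_TASK_ID` are standard SLURM. The task ID defaults to 1 so the script can also be tried outside the scheduler.

```shell
#!/bin/sh
#SBATCH --job-name=param-sweep
#SBATCH --cpus-per-task=1
#SBATCH --array=1-3
#SBATCH --output=sweep_%A_%a.out

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; fall back to 1
# when the script is run by hand.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# value_for_task N: print the Nth value from a hypothetical parameter list.
value_for_task() {
    case $1 in
        1) echo 1e-5 ;;
        2) echo 1e-10 ;;
        3) echo 1e-20 ;;
        *) echo "unknown task id: $1" >&2; return 1 ;;
    esac
}

value=$(value_for_task "$task_id")
# Placeholder for the real work; my_program is a hypothetical name.
echo "task $task_id would run my_program with cutoff $value"
```

Saved as, say, `sweep.sh`, it would be submitted with `sbatch sweep.sh`; the scheduler then runs three independent tasks, each writing its own output file.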
Make it easy for users to do the “right thing”:
[aorth@hpc: ~]$ interactive -c 10
salloc: Granted job allocation 1080
[aorth@compute0: ~]$
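The `interactive` command above is a site-specific wrapper. A guess at how such a wrapper could work, not ILRI's actual script: parse `-c`, then hand off to `salloc` with `srun --pty` to get a shell on a compute node. The sketch only prints the command line it would run.

```shell
# interactive_cmd [-c CPUS]
# Print the salloc/srun command line a wrapper like "interactive" might run.
# Assumption: one CPU by default, and an interactive bash on the allocated node.
interactive_cmd() {
    cpus=1
    OPTIND=1
    while getopts c: opt; do
        case $opt in
            c) cpus=$OPTARG ;;
            *) echo "usage: interactive [-c CPUS]" >&2; return 2 ;;
        esac
    done
    echo "salloc --cpus-per-task=$cpus srun --pty bash -i"
}

interactive_cmd -c 10
```

Hiding the scheduler options behind one short command is what makes the “right thing” (asking SLURM for resources instead of running on the head node) the easy path.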
Managing applications
- Environment modules - http://modules.sourceforge.net
- Dynamically load support for packages in a user’s environment
- Makes it easy to support multiple versions, complicated packages with $PERL5LIB, package dependencies, etc
Managing applications
Install once, use everywhere...
[aorth@hpc: ~]$ module avail blast
blast/2.2.25+ blast/2.2.26 blast/2.2.26+ blast/2.2.28+
[aorth@hpc: ~]$ module load blast/2.2.28+
[aorth@hpc: ~]$ which blastn
/export/apps/blast/2.2.28+/bin/blastn
Works anywhere on the cluster!
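The `which blastn` result is consistent with the module prepending the package's `bin` directory to `PATH`. Real modulefiles are written in Tcl (using commands such as `prepend-path`); the following is only a shell imitation of the effect, with hypothetical function names.

```shell
# prepend_path DIR: put DIR at the front of PATH unless it is already there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
}

# load_blast VERSION: hypothetical stand-in for "module load blast/VERSION".
load_blast() {
    prepend_path "/export/apps/blast/$1/bin"
    export PATH
}

load_blast 2.2.28+
# Show the first PATH entry, which is where blastn would now be found.
echo "$PATH" | cut -d: -f1
```

Since /export/apps is the same shared volume on every node, the same PATH entry resolves everywhere, which is why one install serves the whole cluster.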
Users and Groups
- Consistent UID/GIDs across systems
- LDAP + SSSD (also from Red Hat) is a great match
- 389 LDAP works great with CentOS
- SSSD is simpler than pam_ldap and does caching
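One way to spot-check that UIDs and GIDs really are consistent is to collect the `UID:GID` pair for a user from each node and confirm they all match. A small sketch: the helper name `all_same` is hypothetical, and the sample values are made up.

```shell
# all_same: read one value per line on stdin and succeed only if
# every line is identical (and there is at least one line).
all_same() {
    [ "$(sort -u | wc -l | tr -d ' ')" = "1" ]
}

# Made-up sample data standing in for the UID:GID reported by each node.
printf '%s\n' 1001:1001 1001:1001 | all_same && echo "consistent"
printf '%s\n' 1001:1001 1002:1001 | all_same || echo "mismatch"

# On a real cluster the input could come from each node, for example:
#   for host in hpc compute0; do
#       ssh "$host" getent passwd aorth | cut -d: -f3,4
#   done | all_same
```

`getent passwd` goes through the name service switch, so on a node using SSSD it reports what LDAP (or the SSSD cache) says rather than only the local `/etc/passwd`.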
More information and contact

a.orth@cgiar.org
http://hpc.ilri.cgiar.org/
