Clemson HPC Storage
Dell Panel, SC13

Boyd Wilson
Software CTO
Clemson University
  
Outline
•  Palmetto Cluster
•  Wide Area Storage Across the Innovation Platform
•  Collective Cluster (Real-Time Data Aggregation and Analytics Cluster)
•  Performance Numbers
•  Research DMZ/Network
  
Palmetto Storage
Primary Research Cluster at Clemson
•  1972 nodes
•  22928 cores
•  998400 CUDA cores
•  396 TF (only the newest GPU nodes were benchmarked)
•  ~120+ TF additional, not benchmarked
•  Condominium model
•  Home storage: SAMQFS backed by SL8500 (6PB)
•  Scratch: OrangeFS
  
Palmetto Storage
[Diagram: compute node groups, network links, and storage systems]

Compute
•  MX Nodes: 1622 nodes, 96 TF
•  96 IB Nodes with [label incomplete]
•  FDR IB Nodes: 200 nodes, 400 Nvidia K20, 396 TF

Links
•  10G MX
•  FDR IB
•  10G Eth
•  NFS

Scratch
•  32 R510
•  16 R720
•  512TB OrangeFS (v2.8.8)

Home/Archive
•  SAMQFS over NFS
•  120TB disk
•  6PB tape
•  SAM QFS Home and Archive on SL8500
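
The GPU figures on the two Palmetto slides can be cross-checked with a little arithmetic. The sketch below uses only numbers from the slides (998400 CUDA cores, 400 Nvidia K20, 200 FDR IB nodes, 396 TF); the variable names are illustrative. It assumes all CUDA cores belong to the K20 cards, which the slides imply but do not state.

```python
# Figures taken from the slides.
cuda_cores_total = 998_400
k20_gpus = 400
fdr_ib_nodes = 200
benchmarked_tflops = 396

# Assumption: every CUDA core counted on the slide sits on a K20 card.
cores_per_gpu = cuda_cores_total // k20_gpus
gpus_per_node = k20_gpus // fdr_ib_nodes
tflops_per_node = benchmarked_tflops / fdr_ib_nodes

print(f"CUDA cores per GPU: {cores_per_gpu}")
print(f"GPUs per FDR IB node: {gpus_per_node}")
print(f"Benchmarked TF per FDR IB node: {tflops_per_node:.2f}")
```

The result of 2496 cores per card matches the published CUDA core count of the Tesla K20, so the totals on the slides are self-consistent.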
  
Palmetto Scratch Next Steps
[Diagram: compute node groups and external access paths around the new scratch system]

Compute
•  MX Nodes: 1622 nodes, 96 TF
•  FDR IB Nodes: 200 nodes, 400 Nvidia K20 GPU, 396 TF

Scratch
•  32 Dell R720
•  520TB Scratch
•  OrangeFS
•  WebDAV to OrangeFS
•  Hadoop over OrangeFS with MyHadoop

Links
•  10G IPoMX
•  FDR IPoIB
•  Multiple 10G Eth: WebDAV, Campus Data Access
•  Multiple 10G Eth / 100G: ScienceDMZ, Innovation Platform Data Access
  
Clemson – USC 100Gb tests
[Diagram: 12 Dell R720 OrangeFS Servers and OrangeFS Clients]
•  File write: 37Gb/s
•  Server hardware problems and network packet loss during tests
•  perfSONAR: 49Gb/s initial
•  Later retest: ~70Gb/s with tuning
•  Additional file testing planned (initial testing systems had to move to production)
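
To put these results in context, the short sketch below converts each reported rate to GB/s and to a fraction of the link. It assumes the nominal capacity of the path is exactly 100 Gb/s and ignores protocol overhead; the function names are illustrative, not from the slides.

```python
LINK_GBPS = 100  # assumed nominal capacity of the Clemson-USC path

# Rates reported on the slide, in Gb/s. The retest figure is approximate.
results_gbps = {
    "OrangeFS file write": 37,
    "perfSONAR, initial": 49,
    "perfSONAR, retest with tuning": 70,
}

def to_gigabytes_per_sec(gbps):
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8

def utilization(gbps, link_gbps=LINK_GBPS):
    """Fraction of the nominal link capacity that a rate represents."""
    return gbps / link_gbps

for name, gbps in results_gbps.items():
    print(f"{name}: {gbps} Gb/s = {to_gigabytes_per_sec(gbps):.3f} GB/s, "
          f"{utilization(gbps):.0%} of the link")
```

Read this way, the file write reached roughly three quarters of what the initial perfSONAR run showed the network itself could carry.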
  
SC13 Demo
[Diagram: two groups of OrangeFS Clients and 16 Dell R720 OrangeFS Servers]
SC13 Floor
•  Clemson
•  USC
•  I2
•  Omnibond
  
The “Collective” Cluster
•  12 R720
•  170TB
•  D3-based vis toolkit called SocalTap
•  Social media aggregation via GNIP
•  Elasticsearch
•  Hadoop MapReduce
•  OrangeFS
•  WebDAV to OrangeFS

[Diagram labels]
•  Palmetto
•  Multiple 10G Eth
•  WebDAV: Campus Data Access
•  Social Data Input
•  ScienceDMZ: Innovation Platform Data Access
  
OrangeFS on Dell R720s
•  16 Dell R720 servers connected with 10Gb/s Ethernet
•  32 clients reached nearly 12GB/s read and 8GB/s write
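
As a rough sanity check on these numbers, the sketch below compares them with the server-side wire limit. It assumes each server has a single active 10Gb/s link and ignores protocol overhead; the slide states neither, so treat the percentages as an upper-level estimate rather than a measurement.

```python
# Figures from the slide.
servers = 16
clients = 32
link_gbps = 10      # 10Gb/s Ethernet
read_gbytes = 12    # "nearly 12GB/s"
write_gbytes = 8

# Assumption: one active 10Gb/s link per server, 8 bits per byte, no overhead.
wire_limit_gbytes = servers * link_gbps / 8

read_fraction = read_gbytes / wire_limit_gbytes
write_fraction = write_gbytes / wire_limit_gbytes
read_per_client_gbps = read_gbytes * 8 / clients

print(f"Server-side wire limit: {wire_limit_gbytes:.0f} GB/s")
print(f"Read: {read_fraction:.0%} of the limit")
print(f"Write: {write_fraction:.0%} of the limit")
print(f"Average read rate per client: {read_per_client_gbps:.0f} Gb/s")
```

Under those assumptions the aggregate limit is 20 GB/s, so reads ran at about 60% of it and writes at about 40%.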
  
# Write
iozone -i 0 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
# Read
iozone -i 1 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
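
The slide leaves $RS, $NUM_PROCESSES and $CLIENT_LIST unspecified. The sketch below shows one way those values might be filled in, building the client file that iozone's -+m option reads (one line per client: host name, working directory, path to the iozone binary) and printing both command lines. The host names, paths, record size and process count are all hypothetical; only the flags come from the slide.

```python
import shlex

# Hypothetical values; the slide does not give any of these.
record_size = "1m"                                  # $RS
num_processes = 32                                  # $NUM_PROCESSES
client_list_path = "clients.txt"                    # $CLIENT_LIST
hosts = [f"client{n:02d}" for n in range(1, 33)]    # made-up host names
workdir = "/mnt/orangefs/iozone"                    # made-up OrangeFS mount point
iozone_path = "/usr/bin/iozone"                     # made-up install location

def client_file_lines(hosts, workdir, iozone_path):
    """Lines for the -+m file: client name, working directory, iozone path."""
    return [f"{host} {workdir} {iozone_path}" for host in hosts]

def iozone_command(test):
    """Build the command from the slide; test 0 is write, test 1 is read."""
    return [
        "iozone",
        "-i", str(test),          # which test to run
        "-c", "-e",               # include close() and flush in the timing
        "-w",                     # keep the files so the read test can reuse them
        "-r", record_size,
        "-s", "4g",               # file size per process, as on the slide
        "-t", str(num_processes), # throughput mode with this many processes
        "-+n",                    # skip the rewrite/reread retests
        "-+m", client_list_path,  # distribute processes across the listed clients
    ]

print("\n".join(client_file_lines(hosts, workdir, iozone_path)))
print(shlex.join(iozone_command(0)))
print(shlex.join(iozone_command(1)))
```

Because -w keeps the test files in place, the write run has to come first so that the read run has data to read.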
  
MapReduce over OrangeFS
•  *25% improvement with OrangeFS running on separate nodes from MapReduce
•  8 Dell R720 servers connected with 10Gb/s Ethernet
•  Remote case adds an additional 8 identical servers and does all OrangeFS work remotely; only local work is done on the compute node (traditional HPC model)
  
MapReduce over OrangeFS
•  16 Dell R720 servers connected with 10Gb/s Ethernet
•  Remote clients are Dell R720s with single SAS disks for local data (vs. 12 disk arrays in the previous test)
  
Clemson Research Network
[Network diagram; labels recovered from the figure]
•  Internet/I2/NLR
•  Internet
•  I2 Innovation Platform
•  CLight
•  Collaborator PerfSonar
•  PerfSonar (several nodes)
•  Science DMZ
•  Perimeter F/W
•  F/W (ACL) and Route Filter
•  DMZ
•  Campus
•  Peer Link
•  Clemson
•  100Gig Tagged Trunk
•  Innovation Platform
•  PalmettoNet
•  Host Firewall
•  Brocade MLx32 Core Router
•  CC-NIE
•  Fibre Channel
•  Dell Z9000
•  SamQFS
•  Dell S4810 (Top of Rack)

Clemson: Solving the HPC Data Deluge
