High-Performance	and	Scalable	Designs	of	
Programming	Models	for	Exascale	Systems	
Dhabaleswar	K.	(DK)	Panda	
The	Ohio	State	University	
E-mail:	panda@cse.ohio-state.edu	
http://www.cse.ohio-state.edu/~panda
Talk at HPCAC-Switzerland (Mar 2016)
HPCAC-Switzerland (Mar '16)  2  Network Based Computing Laboratory
High-End Computing (HEC): ExaFlop & ExaByte
•  ExaFlop & HPC: 100-200 PFlops in 2016-2018; 1 EFlop in 2020-2024?
•  ExaByte & Big Data: 10K-20K EBytes in 2016-2018; 40K EBytes in 2020?
[Figure 1: projected growth of compute (ExaFlop) and data (ExaByte). Source: IDC's Digital Universe Study, sponsored by EMC, December 2012]
HPCAC-Switzerland (Mar '16)  3  Network Based Computing Laboratory
Trends for Commodity Computing Clusters in the Top 500 List (http://www.top500.org)
[Chart: number of clusters and percentage of clusters in the Top500 over time; clusters now account for 85% of the list]
HPCAC-Switzerland (Mar '16)  4  Network Based Computing Laboratory
Drivers of Modern HPC Cluster Architectures
Tianhe-2    Titan    Stampede    Tianhe-1A
•  Multi-core/many-core technologies
•  Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand and RoCE)
•  Solid State Drives (SSDs), Non-Volatile Random-Access Memory (NVRAM), NVMe-SSD
•  Accelerators (NVIDIA GPGPUs and Intel Xeon Phi)
Multi-core Processors | High Performance Interconnects - InfiniBand (<1 usec latency, 100 Gbps bandwidth) | Accelerators / Coprocessors (high compute density, high performance/watt, >1 TFlop DP on a chip) | SSD, NVMe-SSD, NVRAM
HPCAC-Switzerland (Mar '16)  5  Network Based Computing Laboratory
Large-scale InfiniBand Installations
•  235 IB Clusters (47%) in the Nov 2015 Top500 list (http://www.top500.org)
•  Installations in the Top 50 (21 systems):
462,462 cores (Stampede) at TACC (10th) | 76,032 cores (Tsubame 2.5) at Japan/GSIC (25th)
185,344 cores (Pleiades) at NASA/Ames (13th) | 194,616 cores (Cascade) at PNNL (27th)
72,800 cores Cray CS-Storm in US (15th) | 76,032 cores (Makman-2) at Saudi Aramco (32nd)
72,800 cores Cray CS-Storm in US (16th) | 110,400 cores (Pangea) in France (33rd)
265,440 cores SGI ICE at Tulip Trading Australia (17th) | 37,120 cores (Lomonosov-2) at Russia/MSU (35th)
124,200 cores (Topaz) SGI ICE at ERDC DSRC in US (18th) | 57,600 cores (SwiftLucy) in US (37th)
72,000 cores (HPC2) in Italy (19th) | 55,728 cores (Prometheus) at Poland/Cyfronet (38th)
152,692 cores (Thunder) at AFRL/USA (21st) | 50,544 cores (Occigen) at France/GENCI-CINES (43rd)
147,456 cores (SuperMUC) in Germany (22nd) | 76,896 cores (Salomon) SGI ICE in Czech Republic (47th)
86,016 cores (SuperMUC Phase 2) in Germany (24th) | and many more!
HPCAC-Switzerland (Mar '16)  6  Network Based Computing Laboratory
Two Major Categories of Applications
•  Scientific Computing
   -  Message Passing Interface (MPI), including MPI + OpenMP, is the dominant programming model
   -  Many discussions towards Partitioned Global Address Space (PGAS)
      •  UPC, OpenSHMEM, CAF, etc.
   -  Hybrid Programming: MPI + PGAS (OpenSHMEM, UPC)
•  Big Data/Enterprise/Commercial Computing
   -  Focuses on large data and data analysis
   -  Hadoop (HDFS, HBase, MapReduce)
   -  Spark is emerging for in-memory computing
   -  Memcached is also used for Web 2.0
HPCAC-Switzerland (Mar '16)  7  Network Based Computing Laboratory
Towards Exascale System (Today and Target)
Systems: 2016 (Tianhe-2) vs. 2020-2024 target, with the difference between today and exascale
•  System peak: 55 PFlop/s -> 1 EFlop/s (~20x)
•  Power: 18 MW (3 Gflops/W) -> ~20 MW (50 Gflops/W); O(1) in power, ~15x in efficiency
•  System memory: 1.4 PB (1.024 PB CPU + 0.384 PB CoP) -> 32-64 PB (~50x)
•  Node performance: 3.43 TF/s (0.4 CPU + 3 CoP) -> 1.2 or 15 TF (O(1))
•  Node concurrency: 24-core CPU + 171-core CoP -> O(1k) or O(10k) (~5x - ~50x)
•  Total node interconnect BW: 6.36 GB/s -> 200-400 GB/s (~40x - ~60x)
•  System size (nodes): 16,000 -> O(100,000) or O(1M) (~6x - ~60x)
•  Total concurrency: 3.12M (12.48M threads, 4/core) -> O(billion) for latency hiding (~100x)
•  MTTI: few/day -> many/day (O(?))
Courtesy: Prof. Jack Dongarra
HPCAC-Switzerland (Mar '16)  8  Network Based Computing Laboratory
Basic Design Challenges for Exascale Systems
•  Energy and Power Challenge
   -  Hard to meet the power requirements for data movement
•  Memory and Storage Challenge
   -  Hard to achieve high capacity and high data rate
•  Concurrency and Locality Challenge
   -  Management of a very large amount of concurrency (billions of threads)
•  Resiliency Challenge
   -  Low-voltage devices (for low power) introduce more faults
HPCAC-Switzerland (Mar '16)  9  Network Based Computing Laboratory
Parallel Programming Models Overview
[Diagram: three abstract machine models]
•  Shared Memory Model (SHMEM, DSM): P1, P2, P3 access a single shared memory
•  Distributed Memory Model (MPI - Message Passing Interface): P1, P2, P3 each have a private memory
•  Partitioned Global Address Space (PGAS) (Global Arrays, UPC, Chapel, X10, CAF, ...): private memories presented as a logical shared memory
•  Programming models provide abstract machine models
•  Models can be mapped on different types of systems
   -  e.g. Distributed Shared Memory (DSM), MPI within a node, etc.
•  PGAS models and hybrid MPI+PGAS models are gradually gaining importance
HPCAC-Switzerland (Mar '16)  10  Network Based Computing Laboratory
MPI Overview and History
•  Message passing library standardized by the MPI Forum
   -  C and Fortran
•  Goal: portable, efficient and flexible standard for writing parallel applications
•  Not an IEEE or ISO standard, but widely considered the "industry standard" for HPC applications
•  Evolution of MPI
   -  MPI-1: 1994
   -  MPI-2: 1996
   -  MPI-3.0: 2008 - 2012, standardized on September 21, 2012
   -  MPI-3.1: 2012 - 2015, standardized on June 4, 2015
   -  Next plan is for MPI 4.0
HPCAC-Switzerland (Mar '16)  11  Network Based Computing Laboratory
How does MPI Plan to Meet Exascale Challenges?
•  Power required for data movement operations is one of the main challenges
•  Non-blocking collectives
   -  Overlap computation and communication
•  Much improved one-sided interface
   -  Reduce synchronization of sender/receiver
•  Manage concurrency
   -  Improved interoperability with PGAS (e.g. UPC, Global Arrays, OpenSHMEM, CAF)
•  Resiliency
   -  New interface for detecting failures
HPCAC-Switzerland (Mar '16)  12  Network Based Computing Laboratory
Major New Features in MPI-3.0
•  Major features in MPI-3.0
   -  Non-blocking collectives
   -  Improved one-sided (RMA) model
   -  MPI Tools Interface
•  Specification is available from:
   http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
HPCAC-Switzerland (Mar '16)  13  Network Based Computing Laboratory
MPI-3 RMA: One-sided Communication Model
[Diagram: processes P1, P2 and P3, each with an HCA. After a global region creation step (buffer info exchanged), P1 writes data directly into a buffer at P2 and P2 writes into a buffer at P3; each write is posted to the local HCA, and the HCA writes the data into the remote buffer without involving the target process.]
HPCAC-Switzerland (Mar '16)  14  Network Based Computing Laboratory
MPI-3 RMA: Communication and Synchronization Primitives
•  Non-blocking one-sided communication routines
   -  Put, Get (Rput, Rget)
   -  Accumulate, Get_accumulate
   -  Atomics
•  Flexible synchronization operations to control initiation and completion
MPI One-sided Synchronization/Completion Primitives
   -  Synchronization: Lock/Unlock, Lock_all/Unlock_all, Fence, Post-Wait/Start-Complete
   -  Completion: Flush, Flush_all, Flush_local, Flush_local_all
   -  Win_sync
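As a minimal sketch (not part of the original slides), the fragment below combines MPI_Win_allocate, MPI_Put and the Fence synchronization primitive listed above; a run with at least two processes and the value 42 are illustrative assumptions. Lock/Unlock with Flush would be the passive-target alternative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int *buf;
    MPI_Win win;
    /* Each process exposes one int through an RMA window */
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);
    *buf = -1;

    MPI_Win_fence(0, win);                 /* open an active-target epoch */
    if (rank == 0) {
        int value = 42;
        /* One-sided write into rank 1's window; rank 1 makes no matching call */
        MPI_Put(&value, 1, MPI_INT, 1 /* target */, 0 /* disp */, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                 /* close the epoch: the Put is complete and visible */

    if (rank == 1)
        printf("Rank 1 window value: %d\n", *buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}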
HPCAC-Switzerland (Mar '16)  15  Network Based Computing Laboratory
MPI-3 RMA: Overlapping Communication and Computation
•  Network adapters can provide RDMA features that do not require software involvement at the remote side
•  As long as puts/gets are executed as soon as they are issued, overlap can be achieved
•  RDMA-based implementations do just that
HPCAC-Switzerland (Mar '16)  16  Network Based Computing Laboratory
MPI-3 Non-blocking Collective (NBC) Operations
•  Enable overlap of computation with communication
•  Non-blocking calls do not match blocking collective calls
   -  MPI may use different algorithms for blocking and non-blocking collectives
   -  Blocking collectives: optimized for latency
   -  Non-blocking collectives: optimized for overlap
•  A process calling an NBC operation
   -  Schedules the collective operation and immediately returns
   -  Executes application computation code
   -  Waits for the end of the collective
•  The communication is progressed by
   -  Application code through MPI_Test
   -  The network adapter (HCA) with hardware support
   -  Dedicated processes/threads in the MPI library
•  There is a non-blocking equivalent for each blocking operation
   -  It has an "I" in the name (MPI_Bcast -> MPI_Ibcast; MPI_Reduce -> MPI_Ireduce)
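A minimal sketch (not from the slides) of this pattern; local_work() stands in for a hypothetical application kernel that does not depend on the broadcast data.

#include <mpi.h>

void local_work(void);   /* hypothetical computation, independent of the broadcast data */

void overlapped_bcast(double *data, int count, MPI_Comm comm)
{
    MPI_Request req;
    /* Schedule the collective and return immediately */
    MPI_Ibcast(data, count, MPI_DOUBLE, 0 /* root */, comm, &req);

    local_work();                          /* computation overlapped with the broadcast */

    /* MPI_Test(&req, &flag, ...) could be called inside a compute loop to drive progress;
       here we simply wait for the end of the collective. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}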
HPCAC-Switzerland (Mar '16)  17  Network Based Computing Laboratory
MPI Tools Interface
•  Extended tools support in MPI-3, beyond the PMPI interface
•  Provides a standardized interface (MPI_T) to access MPI internal information
   -  Configuration and control information: eager limit, buffer sizes, ...
   -  Performance information: time spent in blocking, memory usage, ...
   -  Debugging information: packet counters, thresholds, ...
•  External tools can build on top of this standard interface
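A minimal sketch (not from the slides) of the MPI_T control-variable side of this interface: it lists whatever knobs (eager limits, buffer sizes, ...) the underlying MPI library chooses to expose; the 256-byte name/description buffers are illustrative assumptions.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, num_cvars;
    MPI_Init(&argc, &argv);
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);   /* initialize the tools interface */

    MPI_T_cvar_get_num(&num_cvars);                    /* how many control variables exist */
    for (int i = 0; i < num_cvars; i++) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;
        if (MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                                &enumtype, desc, &desc_len, &bind, &scope) == MPI_SUCCESS)
            printf("cvar %d: %s - %s\n", i, name, desc);
    }

    MPI_T_finalize();
    MPI_Finalize();
    return 0;
}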
HPCAC-Switzerland (Mar '16)  18  Network Based Computing Laboratory
MPI-3.1 Enhancements
•  MPI 3.1 was approved on June 4, 2015
   -  Specification is available from: http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
•  Major features and enhancements:
   -  Corrections to the Fortran bindings introduced in MPI-3.0
   -  New functions added, including routines to manipulate MPI_Aint values in a portable manner
   -  Non-blocking collective I/O routines
   -  Routines to get the index value by name for MPI_T performance and control variables
HPCAC-Switzerland (Mar '16)  19  Network Based Computing Laboratory
Partitioned Global Address Space (PGAS) Models
•  Key features
   -  Simple shared memory abstractions
   -  Lightweight one-sided communication
   -  Easier to express irregular communication
•  Different approaches to PGAS
   -  Languages
      •  Unified Parallel C (UPC)
      •  Co-Array Fortran (CAF)
      •  X10
      •  Chapel
   -  Libraries
      •  OpenSHMEM
      •  UPC++
      •  Global Arrays
HPCAC-Switzerland (Mar '16)  20  Network Based Computing Laboratory
OpenSHMEM
•  SHMEM implementations - Cray SHMEM, SGI SHMEM, Quadrics SHMEM, HP SHMEM, GSHMEM
•  Subtle differences in the API across versions - for example:
                     SGI SHMEM       Quadrics SHMEM    Cray SHMEM
   Initialization    start_pes(0)    shmem_init        start_pes
   Process ID        _my_pe          my_pe             shmem_my_pe
•  These differences made application codes non-portable
•  OpenSHMEM is an effort to address this:
   "A new, open specification to consolidate the various extant SHMEM versions into a widely accepted standard." - OpenSHMEM Specification v1.0
   by University of Houston and Oak Ridge National Lab; SGI SHMEM is the baseline
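A minimal sketch (not from the slides) using the SGI-style calls listed in the table above, which OpenSHMEM 1.0 adopts; the two-PE layout and the value 42 are illustrative assumptions.

#include <shmem.h>
#include <stdio.h>

int dest;                       /* symmetric variable: exists on every PE */

int main(void)
{
    start_pes(0);               /* SGI/OpenSHMEM 1.0-style initialization */
    int me   = _my_pe();
    int npes = _num_pes();

    dest = -1;
    shmem_barrier_all();

    if (me == 0 && npes > 1) {
        int value = 42;
        /* One-sided write of 'value' into the symmetric variable 'dest' on PE 1 */
        shmem_int_put(&dest, &value, 1, 1);
    }
    shmem_barrier_all();        /* ensure the put is globally visible */

    if (me == 1)
        printf("PE 1 received %d\n", dest);
    return 0;
}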
HPCAC-Switzerland (Mar '16)  21  Network Based Computing Laboratory
UPC, CAF and UPC++
•  UPC: Unified Parallel C - PGAS-based language extension to C
   -  An ISO C99-based language providing a uniform programming model for both shared- and distributed-memory hardware to support HPC
   -  UPC = UPC translator + C compiler + UPC runtime
•  Coarray Fortran (CAF): language-level PGAS support in Fortran
   -  An extension to Fortran to support global shared arrays (coarrays) in parallel Fortran applications
   -  CAF = CAF compiler + CAF runtime (libcaf)
   -  Basic support in Fortran 2008 and extended support for collectives in Fortran 2015
•  UPC++: an object-oriented PGAS programming model
   -  A compiler-free PGAS programming model in the context of C++
   -  Built on top of C++ standard templates and runtime libraries
   -  Extends UPC's programming idioms
   -  Registers tasks for asynchronous execution
HPCAC-Switzerland (Mar '16)  22  Network Based Computing Laboratory
MPI+PGAS for Exascale Architectures and Applications
•  Hierarchical architectures with multiple address spaces
•  (MPI + PGAS) model
   -  MPI across address spaces
   -  PGAS within an address space
•  MPI is good at moving data between address spaces
•  Within an address space, MPI can interoperate with other shared memory programming models
•  Applications can have kernels with different communication patterns
   -  Can benefit from different models
   -  Re-writing complete applications can be a huge effort
   -  Port critical kernels to the desired model instead
HPCAC-Switzerland (Mar '16)  23  Network Based Computing Laboratory
Hybrid (MPI+PGAS) Programming
•  Application sub-kernels can be re-written in MPI/PGAS based on their communication characteristics
•  Benefits:
   -  Best of the distributed computing model
   -  Best of the shared memory computing model
•  Exascale Roadmap*:
   -  "Hybrid Programming is a practical way to program exascale systems"
* The International Exascale Software Roadmap, Dongarra, J., Beckman, P. et al., Volume 25, Number 1, 2011, International Journal of High Performance Computer Applications, ISSN 1094-3420
[Diagram: an HPC application composed of Kernel 1 ... Kernel N, each initially using MPI, with selected kernels (e.g. Kernel 2 and Kernel N) re-written in PGAS]
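A minimal sketch (not from the slides) of this hybrid structure, assuming a unified runtime such as MVAPICH2-X that allows MPI and OpenSHMEM in one program; the two kernels and the initialization order are illustrative assumptions, so consult the runtime documentation for exact requirements.

#include <mpi.h>
#include <shmem.h>

int counter;                                   /* symmetric variable: target of one-sided updates */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);                    /* MPI across address spaces */
    start_pes(0);                              /* OpenSHMEM in the same job (unified runtime assumed) */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Kernel with irregular, fine-grained communication: OpenSHMEM one-sided atomics */
    counter = 0;
    shmem_barrier_all();
    shmem_int_add(&counter, 1, (rank + 1) % size);   /* atomic add on the neighbor PE */
    shmem_barrier_all();

    /* Kernel with regular, bulk communication: MPI collective */
    int total = 0;
    MPI_Allreduce(&counter, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}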
HPCAC-Switzerland (Mar '16)  24  Network Based Computing Laboratory
Designing Communication Libraries for Multi-Petaflop and Exaflop Systems: Challenges
[Layered diagram, with co-design opportunities and challenges across the various layers:]
•  Application Kernels/Applications
•  Middleware
•  Programming Models: MPI, PGAS (UPC, Global Arrays, OpenSHMEM), CUDA, OpenMP, OpenACC, Cilk, Hadoop (MapReduce), Spark (RDD, DAG), etc.
•  Communication Library or Runtime for Programming Models: point-to-point communication, collective communication, energy-awareness, synchronization and locks, I/O and file systems, fault tolerance
•  Networking Technologies (InfiniBand, 40/100GigE, Aries, and OmniPath); Multi/Many-core Architectures; Accelerators (NVIDIA and MIC)
Goals: performance, scalability, fault-resilience
HPCAC-Switzerland (Mar '16)  25  Network Based Computing Laboratory
Broad Challenges in Designing Communication Libraries for (MPI+X) at Exascale
•  Scalability for million to billion processors
   -  Support for highly-efficient inter-node and intra-node communication (both two-sided and one-sided)
   -  Scalable job start-up
•  Scalable collective communication
   -  Offload
   -  Non-blocking
   -  Topology-aware
•  Balancing intra-node and inter-node communication for next-generation nodes (128-1024 cores)
   -  Multiple end-points per node
•  Support for efficient multi-threading
•  Integrated support for GPGPUs and accelerators
•  Fault-tolerance/resiliency
•  QoS support for communication and I/O
•  Support for hybrid MPI+PGAS programming (MPI + OpenMP, MPI + UPC, MPI + OpenSHMEM, CAF, ...)
•  Virtualization
•  Energy-awareness
HPCAC-Switzerland (Mar '16)  26  Network Based Computing Laboratory
Additional Challenges for Designing Exascale Software Libraries
•  Extreme low memory footprint
   -  Memory per core continues to decrease
•  D-L-A Framework
   -  Discover
      •  Overall network topology (fat-tree, 3D, ...), network topology of the processes of a given job
      •  Node architecture, health of network and nodes
   -  Learn
      •  Impact on performance and scalability
      •  Potential for failure
   -  Adapt
      •  Internal protocols and algorithms
      •  Process mapping
      •  Fault-tolerance solutions
   -  Low-overhead techniques while delivering performance, scalability and fault-tolerance
HPCAC-Switzerland (Mar '16)  27  Network Based Computing Laboratory
Overview of the MVAPICH2 Project
•  High-performance open-source MPI library for InfiniBand, 10-40GigE/iWARP, and RDMA over Converged Enhanced Ethernet (RoCE)
   -  MVAPICH (MPI-1), MVAPICH2 (MPI-2.2 and MPI-3.0), available since 2002
   -  MVAPICH2-X (MPI + PGAS), available since 2011
   -  Support for GPGPUs (MVAPICH2-GDR) and MIC (MVAPICH2-MIC), available since 2014
   -  Support for Virtualization (MVAPICH2-Virt), available since 2015
   -  Support for Energy-Awareness (MVAPICH2-EA), available since 2015
   -  Used by more than 2,525 organizations in 77 countries
   -  More than 356,000 (> 0.36 million) downloads from the OSU site directly
   -  Empowering many TOP500 clusters (Nov '15 ranking)
      •  10th-ranked 519,640-core cluster (Stampede) at TACC
      •  13th-ranked 185,344-core cluster (Pleiades) at NASA
      •  25th-ranked 76,032-core cluster (Tsubame 2.5) at Tokyo Institute of Technology, and many others
   -  Available with the software stacks of many vendors and Linux distros (RedHat and SuSE)
   -  http://mvapich.cse.ohio-state.edu
•  Empowering Top500 systems for over a decade
   -  System-X at Virginia Tech (3rd in Nov 2003, 2,200 processors, 12.25 TFlops) ->
   -  Stampede at TACC (10th in Nov '15, 519,640 cores, 5.168 PFlops)
HPCAC-Switzerland (Mar '16)  28  Network Based Computing Laboratory
MVAPICH2 Architecture
High Performance Parallel Programming Models
•  Message Passing Interface (MPI)
•  PGAS (UPC, OpenSHMEM, CAF, UPC++*)
•  Hybrid --- MPI + X (MPI + PGAS + OpenMP/Cilk)
High Performance and Scalable Communication Runtime - Diverse APIs and Mechanisms
•  Point-to-point primitives, collective algorithms, energy-awareness, remote memory access, I/O and file systems, fault tolerance, virtualization, active messages, job startup, introspection & analysis
Support for Modern Networking Technology (InfiniBand, iWARP, RoCE, OmniPath) and Modern Multi-/Many-core Architectures (Intel Xeon, OpenPower*, Xeon Phi (MIC, KNL*), NVIDIA GPGPU)
•  Transport protocols: RC, XRC, UD, DC
•  Modern features: UMR, ODP*, SR-IOV, Multi-Rail
•  Transport mechanisms: Shared Memory, CMA, IVSHMEM
•  Modern features: MCDRAM*, NVLink*, CAPI*
* Upcoming
HPCAC-Switzerland (Mar '16)  29  Network Based Computing Laboratory
MVAPICH Project Timeline
[Timeline chart spanning Oct-02 through Sep-15, showing MVAPICH (EOL Apr-15), OMB, MVAPICH2, MVAPICH2-X, MVAPICH2-GDR, MVAPICH2-MIC, MVAPICH2-Virt, MVAPICH2-EA, and OSU-INAM]
HPCAC-Switzerland (Mar '16)  30  Network Based Computing Laboratory
MVAPICH2 Software Family
Requirements -> MVAPICH2 library to use
•  MPI with IB, iWARP and RoCE -> MVAPICH2
•  Advanced MPI, OSU INAM, PGAS and MPI+PGAS with IB and RoCE -> MVAPICH2-X
•  MPI with IB & GPU -> MVAPICH2-GDR
•  MPI with IB & MIC -> MVAPICH2-MIC
•  HPC Cloud with MPI & IB -> MVAPICH2-Virt
•  Energy-aware MPI with IB, iWARP and RoCE -> MVAPICH2-EA
HPCAC-Switzerland (Mar '16)  31  Network Based Computing Laboratory
MVAPICH/MVAPICH2 Release Timeline and Downloads
[Chart: cumulative number of downloads from Sep-04 through Jan-16, approaching 350,000, annotated with releases from MV 0.9.4 / MV2 0.9.0 through MV2 2.2b, MV2-X 2.2b, MV2-GDR 2.2b, MV2-MIC 2.0, and MV2-Virt 2.1rc2]
HPCAC-Switzerland (Mar '16)  32  Network Based Computing Laboratory
Overview of A Few Challenges being Addressed by the MVAPICH2 Project for Exascale
•  Scalability for million to billion processors
   -  Support for highly-efficient inter-node and intra-node communication (both two-sided and one-sided RMA)
   -  Support for advanced IB mechanisms (UMR and ODP)
   -  Extremely minimal memory footprint
   -  Scalable job start-up
•  Collective communication
•  Unified runtime for hybrid MPI+PGAS programming (MPI + OpenSHMEM, MPI + UPC, CAF, ...)
•  InfiniBand Network Analysis and Monitoring (INAM)
•  Integrated support for GPGPUs
•  Integrated support for MICs
•  Virtualization (SR-IOV and containers)
•  Energy-awareness
HPCAC-Switzerland (Mar '16)  33  Network Based Computing Laboratory
One-way Latency: MPI over IB with MVAPICH2
[Charts: small- and large-message latency (us) vs. message size (bytes). Small-message latencies of 0.95-1.26 us (values of 1.26, 1.19, 0.95 and 1.15 us) across the four platforms below]
TrueScale-QDR - 2.8 GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch
ConnectX-3-FDR - 2.8 GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch
ConnectIB-Dual FDR - 2.8 GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch
ConnectX-4-EDR - 2.8 GHz Deca-core (Haswell) Intel PCI Gen3 Back-to-back
HPCAC-Switzerland (Mar '16)  34  Network Based Computing Laboratory
Bandwidth: MPI over IB with MVAPICH2
[Charts: unidirectional and bidirectional bandwidth (MBytes/sec) vs. message size (bytes). Peak unidirectional bandwidths of 3,387 to 12,465 MB/s and bidirectional bandwidths of 6,308 to 24,353 MB/s across the four platforms below]
TrueScale-QDR - 2.8 GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch
ConnectX-3-FDR - 2.8 GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch
ConnectIB-Dual FDR - 2.8 GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch
ConnectX-4-EDR - 2.8 GHz Deca-core (Haswell) Intel PCI Gen3 Back-to-back
HPCAC-Switzerland (Mar '16)  35  Network Based Computing Laboratory
MVAPICH2 Two-Sided Intra-Node Performance
(Shared memory and kernel-based zero-copy support (LiMIC and CMA))
Latest MVAPICH2 2.2b, Intel Ivy-bridge
[Charts: latency of 0.18 us (intra-socket) and 0.45 us (inter-socket) for small messages; peak bandwidths of 14,250 MB/s (intra-socket) and 13,749 MB/s (inter-socket) with CMA, Shmem and LiMIC variants]
HPCAC-Switzerland (Mar '16)  36  Network Based Computing Laboratory
User-mode Memory Registration (UMR)
•  Introduced by Mellanox to support direct local and remote non-contiguous memory access
   -  Avoids packing at the sender and unpacking at the receiver
•  Available with MVAPICH2-X 2.2b
[Charts: small & medium message latency (4K-1M bytes) and large message latency (2M-16M bytes), UMR vs. Default]
Connect-IB (54 Gbps): 2.8 GHz Dual Ten-core (IvyBridge) Intel PCI Gen3 with Mellanox IB FDR switch
M. Li, H. Subramoni, K. Hamidouche, X. Lu and D. K. Panda, High Performance MPI Datatype Support with User-mode Memory Registration: Challenges, Designs and Benefits, CLUSTER, 2015
HPCAC-Switzerland (Mar '16)  37  Network Based Computing Laboratory
On-Demand Paging (ODP)
•  Introduced by Mellanox to support direct remote memory access without pinning
•  Memory regions are paged in/out dynamically by the HCA/OS
•  Size of registered buffers can be larger than physical memory
•  Will be available in a future MVAPICH2 release
[Charts: Graph500 pin-down buffer sizes (MB) and Graph500 BFS kernel execution time (s) at 16, 32 and 64 processes, Pin-down vs. ODP]
Connect-IB (54 Gbps): 2.6 GHz Dual Octa-core (SandyBridge) Intel PCI Gen3 with Mellanox IB FDR switch
HPCAC-Switzerland (Mar '16)  38  Network Based Computing Laboratory
Minimizing Memory Footprint by Direct Connect (DC) Transport
[Diagram: processes P0-P7 on Nodes 0-3 communicating over the IB network through DC endpoints]
•  Constant connection cost (one QP for any peer)
•  Full feature set (RDMA, atomics, etc.)
•  Separate objects for send (DC Initiator) and receive (DC Target)
   -  DC Target identified by "DCT Number"
   -  Messages routed with (DCT Number, LID)
   -  Requires the same "DC Key" to enable communication
•  Available since MVAPICH2-X 2.2a
[Charts: normalized execution time for NAMD (Apoa1, large data set) at 160-620 processes, and memory footprint for Alltoall (KB) at 80-640 processes, comparing RC, DC-Pool, UD and XRC]
H. Subramoni, K. Hamidouche, A. Venkatesh, S. Chakraborty and D. K. Panda, Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences, IEEE International Supercomputing Conference (ISC '14)
HPCAC-Switzerland (Mar '16)  39  Network Based Computing Laboratory
Towards High Performance and Scalable Startup at Exascale
•  Near-constant MPI and OpenSHMEM initialization time at any process count
•  10x and 30x improvement in startup time of MPI and OpenSHMEM, respectively, at 16,384 processes
•  Memory consumption for remote endpoint information reduced by O(processes per node)
•  1 GB of memory saved per node with 1M processes and 16 processes per node
[Charts: job startup performance (P: PGAS - state of the art; M: MPI - state of the art; O: PGAS/MPI - optimized) and memory required to store endpoint information; techniques include PMIX_Ring, PMIX_Ibarrier, PMIX_Iallgather, shmem-based PMI, and on-demand connection management]
On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI. S. Chakraborty, H. Subramoni, J. Perkins, A. A. Awan, and D. K. Panda, 20th International Workshop on High-level Parallel Programming Models and Supportive Environments (HIPS '15)
PMI Extensions for Scalable MPI Startup. S. Chakraborty, H. Subramoni, A. Moody, J. Perkins, M. Arnold, and D. K. Panda, Proceedings of the 21st European MPI Users' Group Meeting (EuroMPI/Asia '14)
Non-blocking PMI Extensions for Fast MPI Startup. S. Chakraborty, H. Subramoni, A. Moody, A. Venkatesh, J. Perkins, and D. K. Panda, 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '15)
SHMEMPMI - Shared Memory based PMI for Improved Performance and Scalability. S. Chakraborty, H. Subramoni, J. Perkins, and D. K. Panda, 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '16), accepted for publication
HPCAC-Switzerland (Mar '16)  40  Network Based Computing Laboratory
Process Management Interface over Shared Memory (SHMEMPMI)
•  SHMEMPMI allows MPI processes to directly read remote endpoint (EP) information from the process manager through shared memory segments
•  Only a single copy per node - O(processes per node) reduction in memory usage
•  Estimated savings of 1 GB per node with 1 million processes and 16 processes per node
•  Up to 1,000 times faster PMI Gets compared to the default design; will be available in MVAPICH2 2.2RC1
[Charts: time taken by one PMI_Get vs. processes per node (Default vs. SHMEMPMI, ~16x faster measured); memory usage per node for remote EP information vs. processes per job (Fence/Allgather, Default vs. Shmem, ~1000x estimated reduction)]
TACC Stampede - Connect-IB (54 Gbps): 2.6 GHz Quad Octa-core (SandyBridge) Intel PCI Gen3 with Mellanox IB FDR
SHMEMPMI - Shared Memory Based PMI for Performance and Scalability. S. Chakraborty, H. Subramoni, J. Perkins, and D. K. Panda, 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '16), accepted for publication
HPCAC-Switzerland (Mar '16)  41  Network Based Computing Laboratory
Overview of A Few Challenges being Addressed by the MVAPICH2 Project for Exascale
•  Scalability for million to billion processors
•  Collective communication
   -  Offload and non-blocking
   -  Topology-aware
•  Unified runtime for hybrid MPI+PGAS programming (MPI + OpenSHMEM, MPI + UPC, CAF, ...)
•  InfiniBand Network Analysis and Monitoring (INAM)
HPCAC-Switzerland (Mar '16)  42  Network Based Computing Laboratory
Co-Design with MPI-3 Non-Blocking Collectives and Collective Offload Co-Direct Hardware (Available since MVAPICH2-X 2.2a)
•  Modified HPL with Offload-Bcast performs up to 4.5% better than the default version (512 processes)
•  Modified P3DFFT with Offload-Alltoall performs up to 17% better than the default version (128 processes)
•  Modified Pre-Conjugate Gradient Solver with Offload-Allreduce performs up to 21.8% better than the default version
[Charts: normalized HPL performance (HPL-Offload, HPL-1ring, HPL-Host) vs. problem size as % of total memory; P3DFFT application run-time vs. data size; PCG-Default vs. Modified-PCG-Offload run-time at 64-512 processes]
K. Kandalla, et al., High-Performance and Scalable Non-Blocking All-to-All with Collective Offload on InfiniBand Clusters: A Study with Parallel 3D FFT, ISC 2011
K. Kandalla, et al., Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL, HotI 2011
K. Kandalla, et al., Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers, IPDPS '12
Can Network-Offload based Non-Blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms? K. Kandalla, A. Buluc, H. Subramoni, K. Tomko, J. Vienne, L. Oliker, and D. K. Panda, IWPAPS '12
HPCAC-Switzerland (Mar '16)  43  Network Based Computing Laboratory
Network-Topology-Aware Placement of Processes
•  Can we design a highly scalable network topology detection service for IB?
•  How do we design the MPI communication library in a network-topology-aware manner to efficiently leverage the topology information generated by our service?
•  What are the potential benefits of using a network-topology-aware MPI library on the performance of parallel scientific applications?
[Charts: overall performance and split-up of physical communication for MILC on Ranger - performance for varying system sizes, and default vs. topology-aware placement for a 2048-core run (15% improvement)]
•  Reduces network topology discovery time from O(N_hosts^2) to O(N_hosts)
•  15% improvement in MILC execution time @ 2048 cores
•  15% improvement in Hypre execution time @ 1024 cores
H. Subramoni, S. Potluri, K. Kandalla, B. Barth, J. Vienne, J. Keasler, K. Tomko, K. Schulz, A. Moody, and D. K. Panda, Design of a Scalable InfiniBand Topology Service to Enable Network-Topology-Aware Placement of Processes, SC'12. Best Paper and Best Student Paper Finalist
HPCAC-Switzerland (Mar '16)  44  Network Based Computing Laboratory
Overview of A Few Challenges being Addressed by the MVAPICH2 Project for Exascale
•  Scalability for million to billion processors
•  Collective communication
•  Unified runtime for hybrid MPI+PGAS programming (MPI + OpenSHMEM, MPI + UPC, CAF, ...)
•  InfiniBand Network Analysis and Monitoring (INAM)
HPCAC-Switzerland (Mar '16)  45  Network Based Computing Laboratory
MVAPICH2-X for Advanced MPI and Hybrid MPI + PGAS Applications
[Diagram: MPI, OpenSHMEM, UPC, CAF, UPC++ or hybrid (MPI + PGAS) applications issue MPI, OpenSHMEM, UPC, CAF and UPC++ calls into the unified MVAPICH2-X runtime, which runs over InfiniBand, RoCE and iWARP]
•  Unified communication runtime for MPI, UPC, OpenSHMEM, CAF, UPC++, available with MVAPICH2-X 1.9 onwards (since 2012)
•  UPC++ support will be available in the upcoming MVAPICH2-X 2.2RC1
•  Feature highlights
   -  Supports MPI(+OpenMP), OpenSHMEM, UPC, CAF, UPC++, MPI(+OpenMP) + OpenSHMEM, MPI(+OpenMP) + UPC
   -  MPI-3 compliant, OpenSHMEM v1.0 standard compliant, UPC v1.2 standard compliant (with initial support for UPC 1.3), CAF 2008 standard (OpenUH), UPC++
   -  Scalable inter-node and intra-node communication - point-to-point and collectives
HPCAC-Switzerland (Mar '16)  46  Network Based Computing Laboratory
Application-Level Performance with Graph500 and Sort
Graph500 Execution Time
•  Performance of the hybrid (MPI+OpenSHMEM) Graph500 design
   -  8,192 processes: 2.4x improvement over MPI-CSR, 7.6x improvement over MPI-Simple
   -  16,384 processes: 1.5x improvement over MPI-CSR, 13x improvement over MPI-Simple
[Chart: execution time (s) at 4K, 8K and 16K processes for MPI-Simple, MPI-CSC, MPI-CSR and Hybrid (MPI+OpenSHMEM)]
J. Jose, S. Potluri, K. Tomko and D. K. Panda, Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models, International Supercomputing Conference (ISC '13), June 2013
J. Jose, K. Kandalla, M. Luo and D. K. Panda, Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation, Int'l Conference on Parallel Processing (ICPP '12), September 2012
Sort Execution Time
•  Performance of the hybrid (MPI+OpenSHMEM) Sort application
   -  4,096 processes, 4 TB input size: MPI - 2408 sec (0.16 TB/min); Hybrid - 1172 sec (0.36 TB/min); 51% improvement over the MPI design
[Chart: execution time (seconds) for input sizes from 500GB-512 processes to 4TB-4K processes, MPI vs. Hybrid]
J. Jose, K. Kandalla, S. Potluri, J. Zhang and D. K. Panda, Optimizing Collective Communication in OpenSHMEM, Int'l Conference on Partitioned Global Address Space Programming Models (PGAS '13), October 2013
HPCAC-Switzerland (Mar '16)  47  Network Based Computing Laboratory
MiniMD - Total Execution Time
•  The hybrid design performs better than the MPI implementation
   -  1,024 processes: 17% improvement over the MPI version
•  Strong scaling; input size: 128 x 128 x 128
[Charts: execution time (ms) vs. number of cores (256-1,024) for Hybrid-Barrier, MPI-Original and Hybrid-Advanced, for both performance and strong-scaling runs]
M. Li, J. Lin, X. Lu, K. Hamidouche, K. Tomko and D. K. Panda, Scalable MiniMD Design with Hybrid MPI and OpenSHMEM, OpenSHMEM User Group Meeting (OUG '14), held in conjunction with the 8th International Conference on Partitioned Global Address Space Programming Models (PGAS '14)
HPCAC-Switzerland (Mar '16)  48  Network Based Computing Laboratory
Hybrid MPI+UPC NAS-FT
•  Modified NAS FT UPC all-to-all pattern using MPI_Alltoall
•  Truly hybrid program
•  For FT (Class C, 128 processes):
   -  34% improvement over UPC-GASNet
   -  30% improvement over UPC-OSU
[Chart: time (s) for NAS problem size / system size B-64, C-64, B-128, C-128, comparing UPC-GASNet, UPC-OSU and Hybrid-OSU]
Hybrid MPI + UPC support available since MVAPICH2-X 1.9 (2012)
J. Jose, M. Luo, S. Sur and D. K. Panda, Unifying UPC and MPI Runtimes: Experience with MVAPICH, Fourth Conference on Partitioned Global Address Space Programming Model (PGAS '10), October 2010
HPCAC-Switzerland (Mar '16)  49  Network Based Computing Laboratory
Overview of A Few Challenges being Addressed by the MVAPICH2 Project for Exascale
•  Scalability for million to billion processors
•  Collective communication
•  Unified runtime for hybrid MPI+PGAS programming (MPI + OpenSHMEM, MPI + UPC, CAF, ...)
•  InfiniBand Network Analysis and Monitoring (INAM)
HPCAC-Switzerland (Mar '16)  50  Network Based Computing Laboratory
Overview of OSU INAM
•  A network monitoring and analysis tool that can analyze traffic on the InfiniBand network with inputs from the MPI runtime
   -  http://mvapich.cse.ohio-state.edu/tools/osu-inam/
   -  http://mvapich.cse.ohio-state.edu/userguide/osu-inam/
•  Monitors IB clusters in real time by querying various subnet management entities and gathering input from the MPI runtimes
•  Capability to analyze and profile node-level, job-level and process-level activities for MPI communication (point-to-point, collectives and RMA)
•  Ability to filter data based on the type of counter using a "drop down" list
•  Remotely monitor various metrics of MPI processes at user-specified granularity
•  "Job Page" to display jobs in ascending/descending order of various performance metrics, in conjunction with MVAPICH2-X
•  Visualize the data transfer happening in a "live" or "historical" fashion for the entire network, a job or a set of nodes
HPCAC-Switzerland (Mar '16)  51  Network Based Computing Laboratory
OSU INAM - Network Level View
•  Show the network topology of large clusters
•  Visualize traffic patterns on different links
•  Quickly identify congested links and links in an error state
•  See the history unfold - play back historical state of the network
[Screenshots: full network (152 nodes) and zoomed-in view of the network]
HPCAC-Switzerland (Mar '16)  52  Network Based Computing Laboratory
OSU INAM - Job and Node Level Views
[Screenshots: visualizing a job (5 nodes); finding routes between nodes]
•  Job level view
   -  Show different network metrics (load, error, etc.) for any live job
   -  Play back historical data for completed jobs to identify bottlenecks
•  Node level view provides details per process or per node
   -  CPU utilization for each rank/node
   -  Bytes sent/received for MPI operations (pt-to-pt, collective, RMA)
   -  Network metrics (e.g. XmitDiscard, RcvError) per rank/node
HPCAC-Switzerland (Mar '16)  53  Network Based Computing Laboratory
Live Node Level View
HPCAC-Switzerland (Mar '16)  54  Network Based Computing Laboratory
Live Switch Level View
HPCAC-Switzerland (Mar '16)  55  Network Based Computing Laboratory
List of Supported Switch Counters
•  The following counters are queried from the InfiniBand switches
•  Xmit Data
   -  Total number of data octets, divided by 4, transmitted on all VLs from the port
   -  This includes all octets between (and not including) the start-of-packet delimiter and the VCRC, and may include packets containing errors
   -  Excludes all link packets
•  Rcv Data
   -  Total number of data octets, divided by 4, received on all VLs from the port
   -  This includes all octets between (and not including) the start-of-packet delimiter and the VCRC, and may include packets containing errors
   -  Excludes all link packets
•  Max [Xmit Data/Rcv Data]: maximum of the two values above
HPCAC-Switzerland (Mar '16)  56  Network Based Computing Laboratory
List of Supported MPI Process Level Counters
•  MVAPICH2-X collects additional information about each process's network usage, which can be displayed by OSU INAM
•  Xmit Data: total number of bytes transmitted as part of the MPI application
•  Rcv Data: total number of bytes received as part of the MPI application
•  Max [Xmit Data/Rcv Data]: maximum of the two values above
•  Point to Point Sent: total number of bytes transmitted as part of MPI point-to-point operations
•  Point to Point Rcvd: total number of bytes received as part of MPI point-to-point operations
•  Max [Point to Point Sent/Rcvd]: maximum of the two values above
•  Coll Bytes Sent: total number of bytes transmitted as part of MPI collective operations
•  Coll Bytes Rcvd: total number of bytes received as part of MPI collective operations
HPCAC-Switzerland (Mar '16)  57  Network Based Computing Laboratory
List of Supported MPI Process Level Counters (Cont.)
•  Max [Coll Bytes Sent/Rcvd]: maximum of the two values above
•  RMA Bytes Sent: total number of bytes transmitted as part of MPI RMA operations
   -  Note that, due to the nature of RMA operations, bytes received for RMA operations cannot be counted
•  RC VBUF: the number of internal communication buffers used for reliable connection (RC)
•  UD VBUF: the number of internal communication buffers used for unreliable datagram (UD)
•  VM Size: total number of bytes used by the program for its virtual memory
•  VM Peak: maximum number of virtual memory bytes used by the program
•  VM RSS: the number of bytes resident in memory (resident set size)
•  VM HWM: the maximum number of bytes that have been resident in memory (peak resident set size or high water mark)
HPCAC-Switzerland (Mar '16)  58  Network Based Computing Laboratory
List of Supported Network Error Counters (Cont.)
•  XmtDiscards
   -  Total number of outbound packets discarded by the port because the port is down or congested. Reasons for this include:
      •  Output port is not in the active state
      •  Packet length exceeded NeighborMTU
      •  Switch Lifetime Limit exceeded
      •  Switch HOQ Lifetime Limit exceeded. This may also include packets discarded while in the VLStalled state.
•  XmtConstraintErrors
   -  Total number of packets not transmitted from the switch physical port for the following reasons:
      •  FilterRawOutbound is true and the packet is raw
      •  PartitionEnforcementOutbound is true and the packet fails the partition key check or IP version check
•  RcvConstraintErrors
   -  Total number of packets not received from the switch physical port for the following reasons:
      •  FilterRawInbound is true and the packet is raw
      •  PartitionEnforcementInbound is true and the packet fails the partition key check or IP version check
•  LinkIntegrityErrors
   -  The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors
•  ExcBufOverrunErrors
   -  The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error
•  VL15Dropped: number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port
HPCAC-Switzerland (Mar '16)  59  Network Based Computing Laboratory
List of Supported Network Error Counters
•  The following error counters are available both at the switch and process level:
•  SymbolErrors
   -  Total number of minor link errors detected on one or more physical lanes
•  LinkRecovers
   -  Total number of times the Port Training state machine has successfully completed the link error recovery process
•  LinkDowned
   -  Total number of times the Port Training state machine has failed the link error recovery process and downed the link
•  RcvErrors
   -  Total number of packets containing an error that were received on the port. These errors include:
      •  Local physical errors
      •  Malformed data packet errors
      •  Malformed link packet errors
      •  Packets discarded due to buffer overrun
•  RcvRemotePhysErrors
   -  Total number of packets marked with the EBP delimiter received on the port
•  RcvSwitchRelayErrors
   -  Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay
HPCAC-Switzerland (Mar '16)  60  Network Based Computing Laboratory
Conclusions
•  Provided an overview of programming models for exascale systems
•  Outlined the associated challenges in designing runtimes for these programming models
•  Demonstrated how the MVAPICH2 project is addressing some of these challenges
HPCAC-Switzerland (Mar '16)  61  Network Based Computing Laboratory
Additional Challenges to be Covered in Today's 1:30pm Talk
•  Integrated support for GPGPUs
•  Integrated support for MICs
•  Virtualization (SR-IOV and containers)
•  Energy-awareness
•  Best practices: set of tunings for common applications (available through the MVAPICH website)
HPCAC-Switzerland (Mar '16)  62  Network Based Computing Laboratory
Thank You!
panda@cse.ohio-state.edu
Network-Based Computing Laboratory
http://nowlab.cse.ohio-state.edu/
The MVAPICH2 Project
http://mvapich.cse.ohio-state.edu/
The High-Performance Big Data Project
http://hibd.cse.ohio-state.edu/
The convergence of HPC and BigData: What does it mean for HPC sysadmins?
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing Technologies
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systems
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
 
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
 
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache SparkNear Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
 

Plus de inside-BigData.com

Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...inside-BigData.com
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolversinside-BigData.com
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architecturesinside-BigData.com
 
SW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computingSW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computinginside-BigData.com
 

Plus de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Data Parallel Deep Learning
Data Parallel Deep LearningData Parallel Deep Learning
Data Parallel Deep Learning
 
Making Supernovae with Jets
Making Supernovae with JetsMaking Supernovae with Jets
Making Supernovae with Jets
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
 
SW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computingSW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computing
 
FPGAs and Machine Learning
FPGAs and Machine LearningFPGAs and Machine Learning
FPGAs and Machine Learning
 

Dernier

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 

Dernier (20)

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

Programming Models for Exascale Systems