Supercomputing: The Next 10 Years

Marc Snir
Argonne National Laboratory &
University of Illinois at Urbana-Champaign

Past
Those who cannot remember the past are condemned to repeat it (Santayana)

MCS -- Marc Snir, November 13

The Last Great Extinction
The attack of the killer micros
Shift from bipolar vector processors to clusters of MOS microprocessors

[Figure: Core count of the leading Top500 system, log scale from 1 to 10,000,000]

1990: The Attack of the Killer Micros

(Eugene Brooks, 1990)

§  Bipolar technology had hit a power wall (nitrogen cooling)
§  Alternative materials were too expensive / not ready (gallium arsenide)
§  An alternative “good enough” technology was ready
  –  MOS microprocessors had been around for 20 years and were a fast-growing market
  –  MOS had a clear evolution path (“Moore’s Law”)
§  MOS was no better than bipolar (in 1991)
  Cray C90
  •  244 MHz
  •  Vector
  •  Vector registers
  •  16 shared-memory nodes
  CM5
  •  32 MHz
  •  Scalar
  •  Cache
  •  1024 message-passing nodes
§  New paradigm took a while to establish itself (CM1, CM2, KSR…)
§  Change in technology led to change in vendors and business model
§  Technology shift required a long and painful process of code rewrite

Present
The past no longer is and the future is not yet (St. Augustine)

20 Years of (Near) Stability
§  One dominant programming model: Message-Passing (MPI)
§  One major shift – from single core to multicore
  –  Easy since one can treat each core as a node

[Figure: log-scale plot, ticks from 1 to 10,000,000, annotated “multicore”]

Increasing Instability
§  Heterogeneous memory: NUMA, noncoherent shared memory, scratchpads…
§  Heterogeneous processing: GPUs, accelerators, big-small cores (NVIDIA, Xeon Phi, ARM big.LITTLE)
§  Hybrid Memory Cube & near-memory processing
§  No standard programming model

[Figure: log-scale plot, ticks from 1 to 10,000,000, annotated “accelerators” and “multicore”]

On Our Way to the Next Extinction?
§  History repeats itself:
  –  CMOS technology has hit a power wall
    •  Clock speed is not rising
  –  Alternative materials are (too) expensive / not ready (gallium arsenide and other III-V materials; nanowires, nanotubes)
  “While power consumption is an urgent challenge, its leakage or static component will become a major industry crisis in the long term, threatening the survival of CMOS technology itself, just as bipolar technology was threatened and eventually disposed of decades ago” (ITRS 2011)
§  History does not repeat itself:
  –  There is a much larger industrial base
  –  An alternative “good enough” technology IS NOT ready
  –  There is much more code that needs to be rewritten if a new model is needed (>200 MLOCs)

Future
It is difficult to make predictions, especially about the future (Yogi Berra)

The End of Moore’s Law is Coming
§  Moore’s Law: The number of transistors per chip doubles every two/three years
§  Stein’s Law: If something cannot go on forever, it will stop
§  The question is not whether but when Moore’s Law will stop
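
As a rough illustration of how much the doubling period matters, the sketch below computes the growth in transistors per chip over a decade for the two periods named on this slide. The function name `transistor_growth` and the 10-year horizon (borrowed from the talk's title) are illustrative choices, not something the slides define, and the calculation assumes perfectly steady doubling.

```python
def transistor_growth(years: float, doubling_period_years: float) -> float:
    """Growth factor in transistors per chip after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Compare the two doubling periods quoted on the slide over one decade.
    for period in (2.0, 3.0):
        factor = transistor_growth(10.0, period)
        print(f"doubling every {period:g} years -> x{factor:.1f} in 10 years")
```

Under these assumptions a two-year period gives a factor of 32 per decade, while a three-year period gives only about 10, so even a modest stretch of the doubling period compounds into a large gap.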
  
The 7nm Wall

(courtesy J. Aldun)
ANL-LBNL-ORNL-PNNL, 19 November 2013

The End of the Road (?)

§  Quantum tunneling becomes a major obstacle as devices shrink
  –  7-5nm feature size has long been predicted to be the lower limit for CMOS devices
    •  ITRS predicts 7.5nm will be reached in 2024
§  7.5nm ~ 30 atoms of silicon
  –  Not much room for further miniaturization, independent of technology!
  –  Room for clock increase (new materials, quantum effect gates, cryogenic devices…)
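
A quick sanity check of the “~30 atoms” figure, assuming a silicon nearest-neighbor spacing of about 0.235 nm (an approximate textbook value, not a number from the slides). The helper `atoms_across` is hypothetical and simply divides the feature size by that spacing.

```python
# Approximate Si-Si nearest-neighbor distance in crystalline silicon, in nm.
# This is an assumed textbook value, not taken from the talk.
SI_NEAREST_NEIGHBOR_NM = 0.235


def atoms_across(feature_nm: float, spacing_nm: float = SI_NEAREST_NEIGHBOR_NM) -> float:
    """Rough count of atoms spanning a feature of width `feature_nm`."""
    return feature_nm / spacing_nm


if __name__ == "__main__":
    print(f"7.5 nm spans roughly {atoms_across(7.5):.0f} silicon atoms")
```

With that spacing the estimate comes out at about 32 atoms, in line with the slide's order-of-magnitude claim.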
  
The Last Mile is the Most Expensive Mile

§  New technologies are needed
  –  New materials (e.g., III-V, germanium thin channels, nanowires, nanotubes or graphene)
  –  New structures (e.g., 3D transistor structures)
  –  New packages (e.g., HMC, photonics)
  –  New lithography
  –  Control or tolerance of large variances (safety margins, resilience, aging)
§  New technologies are expensive
  –  NRE increases faster than profits – forces consolidation
  –  Only two companies can sustain the investments needed to go below 22nm (Intel and Samsung) [Heck, Kaza, Pinner]
§  Less competition & larger investments = slower progress

The Future Is Not What It Was

(courtesy J. Aldun)

The Path of Least Resistance – Other than Moore
§  Industry goal is not increased performance; it is increased ROI. Industry will increasingly invest in alternatives as increasing performance becomes more expensive
  –  Low power, low cost
  –  New markets: MEMS, sensors
  –  System on a chip (smartphone, tablet)
  ✗  Fewer good commodity building blocks for HPC
    –  No low-power/high-flops/high-resilience CPU
  ✔  More opportunities for semi-custom and integration of multiple vendor IP on a chip
§  New business model for supercomputing?
  –  Semi-custom & system on a chip integrator

Exascale

Identified Issues
§  Scale (billion threads)
§  Power (10s of MWatts)
  –  Communication: > 99% of power is consumed by moving operands across the memory hierarchy and across nodes
  –  Reduced memory size (communication in time)
§  Resilience: Something fails every hour; the machine is never “whole”
  –  Trade-off between power and resilience
§  Asynchrony: Equal work ≠ equal time
  –  Power management
  –  Error recovery
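
To see why “something fails every hour” is plausible at scale, here is a minimal sketch under the usual simplifying assumption of independent nodes with constant failure rates, in which case the system failure rate is the sum of the node rates. The function name, the node count, and the 10-year per-node MTBF are made-up illustrative values, not figures from the talk.

```python
HOURS_PER_YEAR = 8760.0


def system_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Mean time between failures of the whole machine.

    Assumes independent nodes with identical, constant failure rates,
    so the rates add and the system MTBF is the node MTBF divided by N.
    """
    return node_mtbf_hours / num_nodes


if __name__ == "__main__":
    node_mtbf = 10 * HOURS_PER_YEAR   # assumed: each node fails once per 10 years
    nodes = 100_000                   # assumed machine size
    print(f"system MTBF: {system_mtbf_hours(node_mtbf, nodes):.3f} hours")
```

With these assumed numbers, very reliable individual nodes still yield a machine-wide failure a little more often than once per hour.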
  
My Main Concerns
§  Uncertainty about underlying HW architecture
  –  Slower progress of IC will necessitate faster progress of architecture
  –  May not converge to a new, stable model
  –  It is not about porting applications to a new programming model – it is about designing applications for portability
§  Increased software complexity
  –  Simulations of complex systems + uncertainty quantification + optimization…
  –  Support of complex workflows (e.g., in situ analysis)
  –  Software management of power and failures
  –  Heterogeneity
  –  Scale and tight coupling (tail of distribution matters!)
  –  Hypothesis: software will continue to be the dominant cause of failures
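
The remark that the tail of the distribution matters can be illustrated with a small simulation, offered only as a sketch: in a tightly coupled step every rank waits for the slowest one, so the step time is the maximum over ranks. The exponential jitter model, its parameters, and the name `mean_step_time` are assumptions chosen for illustration, not a model from the talk.

```python
import random


def mean_step_time(num_ranks, trials=200, base=1.0, jitter_mean=0.05, seed=0):
    """Average time of a synchronized step that waits for the slowest rank.

    Every rank does the same work (`base`) plus random exponential jitter
    with mean `jitter_mean`; both values are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        # The step finishes only when the slowest rank finishes.
        total += max(base + rng.expovariate(1.0 / jitter_mean)
                     for _ in range(num_ranks))
    return total / trials


if __name__ == "__main__":
    for n in (1, 32, 1024):
        print(f"{n:5d} ranks: mean step time {mean_step_time(n):.3f}")
```

Each rank has the same average time in every case, yet the synchronized step gets slower as ranks are added, because the maximum samples ever further into the tail of the jitter distribution.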
  
Conclusion
§  Moore’s Law is slowing down; the slow-down has many fundamental consequences – only a few of them explored in this talk
§  HPC is the “canary in the mine”:
  –  Issues appear earlier because of size and tight coupling
§  Optimistic view of the next decades: no stasis.
  –  A frenzy of innovation to continue pushing the current ecosystem, followed by a frenzy of innovation to use totally different compute technologies
§  Pessimistic view: The end is coming

Backup

Do We Care?
§  It’s all about Big Data now, simulations are passé.
§  B***t
§  All science is either physics or stamp collecting. (Ernest Rutherford)
  –  In Physical Sciences, experiments and observations exist to validate/refute/motivate theory. “Data Mining” not driven by a scientific hypothesis is “stamp collection”.
§  Simulation is needed to go from a mathematical model to predictions on observations.
  –  If the system is complex (e.g., climate) then simulation is expensive
  –  Often, models are stochastic and predictions are statistical – complicating both simulation and data analysis

Observation Meets Data: Cosmology
Computation Meets Data: The Argonne View (courtesy Salman Habib)
Record-breaking application: 3.6 trillion particles, 14 Pflop/s

[Figure: workflow diagram with the following panels]
§  Supercomputer Simulation Campaign: HACC = Hardware/Hybrid Accelerated Cosmology Code(s)
§  Mapping the Sky with Survey Instruments: LSST
§  ‘Cosmic Calibration’: HACC+CCF (Domain science + CS + Math + Stats + Machine learning); CCF = Cosmic Calibration Framework
§  ‘Precision Oracle’: Emulator based on Gaussian Process Interpolation in High-Dimensional Spaces
§  Markov chain Monte Carlo
§  LSST Weak Lensing: curves for w = -1 and w = -0.9
§  Observations: Statistical error bars will ‘disappear’ soon!

