NST 121
Computer Systems
Fundamentals
INTRODUCTION TO COMPUTERS
Gary Tarolli - 3dfx and Nvidia
3D Graphics Engineer
Monday, April 27
3D Graphics from my career perspective
1974-1978 BS. Math RPI (minor in CS)
1979-1980 MS CS Caltech
1980-1983 Digital Equipment Corp
1984-1992 Silicon Graphics, Inc
1992-1993 consulting
1993-2000 3dfx
2000- nvidia
or “Moore’s Law viewed from my career”
Moore’s law at 50 (years) publication came in the mail last week …
Various articles in the news too … should we throw a party or a wake ?
Moore’s law in action over 4 decades
Moore’s Law : http://www.mooreslaw.org
The most popular formulation is :
the number of transistors on an integrated circuit
doubles about every two years (for the same size chip).
e.g. 500nm to 350nm is sqrt(2) shrink on one side
of a chip, so square = 2x as dense (# transistors)
Note: in addition the clock speed increases
and the chip area increases (better manufacturing)
Cost per transistor or performance drops!
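The shrink arithmetic above can be checked with a small sketch. It assumes transistor density scales with the inverse square of the feature size; the function name is only illustrative.

```python
import math

def density_gain(old_nm: float, new_nm: float) -> float:
    """Density gain when the feature size shrinks from old_nm to new_nm.

    Assumption: density scales with 1 / (feature size squared).
    """
    linear_shrink = old_nm / new_nm   # shrink along one side of the chip
    return linear_shrink ** 2         # area effect: square the linear shrink

print(density_gain(500, 350))  # a little over 2
print(math.sqrt(2))            # linear shrink needed for an exact doubling
```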
Result: trends over 4 decades …
Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx)
The rise of importance of 3D graphics and hence graphics chips
Consolidation in the 3d graphics industry
◦ ~40 3d graphics chip startups in 1994
◦ Only a few independent companies left : nvidia, Imagination Technologies (Power VR)
◦ 3 cpu/system companies : Intel, AMD, Apple
Surprise: graphics chips power supercomputers
Surprise: cars
◦ 8 million cars with nvidia chips in them, many more coming
◦ Self driving cars are coming: enabled by supercomputing power in cheap chips
Surprise: deep neural net learning enabled by this computing power is exploding
Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
http://en.wikipedia.org/wiki/The_Singularity_Is_Near
You probably don’t believe this now,
see if you do in an hour …
So let’s begin the journey …
1974-1978 : BS. Math & CS RPI
1974 – my first calculator : HP-35 purchased for college ($270? – a few weeks' salary)
1975 – my first computer program on an IBM 360 mainframe
(using my friend's engineering account)
1979-1980 : MS CS Caltech
1979 – played networked Star Trek on Xerox Alto : black and white bit-mapped graphics
until 4am , living off of $.25 ice cream sandwiches
1979-1980 : MS CS Caltech …
Worked on VLSI CAD tools for custom chips; humans drew every single wire for every single
transistor on a chip
[Figure: two images, each labeled "inverter"]
1979-1980 : MS CS Caltech …
MIT class projects in 1978
1980-1983 : DEC (minicomputer) #93246
CPUs were still many boards of logic
I worked on VLSI CAD tools so we could design a single chip VAX, called microVAX
And go
from this :
A refrigerator filled with boards …
1980-1983 : DEC (minicomputer) …
To this …
1984-1992 : SGI(workstation) #36
IRIS 1000 workstation (1984) : $10,000 to $30,000 - 8 MHz Motorola 68010
IRIS 1400 workstation: ran at 10 MHz , had 1.5 MB of RAM and a 73 MB disk drive
My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight
1984-1992 : SGI, Silicon Graphics, Inc …
IRIS Indigo (1992) : $6000 - 33 MHz MIPS R3000
◦ 100k lines/sec, 10k triangles/sec
◦ Almost all of the SGI GL library implemented in software on MIPS
1984-1992 : SGI, Silicon Graphics, Inc …
1991: IrisVision: $4000 board set for the PC, ISA and Micro Channel
◦ http://en.wikipedia.org/wiki/IrisVision
Intel 486 and bus architecture were just too slow, so it died in obscurity …
But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming
… faster Pentiums, Moore’s law (smaller, denser chips) , PCI bus ….
and that SGI would be out of business some day if it didn’t transform itself
But going from 80% margins to 20% margins is not easy to swallow. They did not …
we voted with our feet and left (along with others who went to Nvidia and elsewhere)
and they paid the price…by 2000 SGI was in decline … died in 2009 … about 20 years later …
$0 to $5 billion back to $0
Onyx Reality Engine (1992) : $50,000 to $80,000 – 100 MHz R4400
Beautiful real-time texture mapped graphics (divide per pixel)
◦ 1M triangles/sec, 100 Mpixels/sec
1984-1992 : SGI, Silicon Graphics, Inc …
1993-2000 : 3Dfx (PC) employee #1
Why:
◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process)
◦ We saw a problem within SGI, and an opportunity in 3d PC graphics
◦ Engineers – we saw a cool problem and wanted to solve it
◦ We realized the gaming market was a lot bigger than anyone knew
◦ ~$5B at the time, almost as big as movie industry
◦ Today it is MUCH larger, over $100B worldwide for all games, dwarfs the movie industry
Goal:
◦ Produce images similar to Reality Engine's for $500 in real-time, i.e. 30 fps
◦ Similar means reduced quality (less bit depth) but still excellent
Activation energy: Caroline said “Just do it” one day
1993-2000 : 3Dfx (PC) …
How:
◦ Take maximum advantage of just-arriving technology
◦ Aim high – don’t sacrifice quality, do the entire Reality Engine pipeline at full speed
◦ Make it easy to program , no difficult choices : e.g. trading off speed for quality
◦ Included ALL the important features of Reality Engine: shading, zbuffering, alpha-blending, fog, quality texturing and filtering
◦ Listened to game developers and professionals – tech. advisory board
◦ John Carmack (id)
◦ Tim Sweeney (Epic)
◦ Tom Porter (Pixar)
A bit of luck, ok a lot?
◦ $500 too costly for consumer market, so we targeted the arcades
◦ And 3dfx ended up in various arcade machines, SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc.
◦ Memory prices fell dramatically resulting in a $300 board and enabled the consumer market
1993-2000 : 3Dfx (PC) …
Key to quality texture mapping is per-pixel divide
◦ Very costly
◦ Key is to be just good enough
◦ We didn’t need 32 bit results, only about 18-20 bits
◦ Just enough to not be visually distracting
◦ So we used a table lookup, and then linear interpolation (which helped a lot)
◦ Remember those sin/cos/tan tables in high school trig? Same basic idea
◦ 6 bit index (64 entries, 15 bits wide, ends up in a PLA optimized ROM)
◦ 4 bit interpolation, adds another 3-4 bits
◦ Input is float, so shift result by exponent since log(1/x) = -log(x) = -exponent(x) in float representation
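The lookup-plus-interpolation idea in the bullets above can be sketched in software. This is a rough illustration, not the actual chip logic: the 6-bit index, 4-bit interpolation and 15-bit entries come from the slide, while the extra 65th endpoint entry, the use of `math.frexp`, and the function name are assumptions made to keep the sketch short.

```python
import math

INDEX_BITS = 6      # 64 table intervals, as on the slide
INTERP_BITS = 4     # 4-bit interpolation fraction, as on the slide
ENTRY_BITS = 15     # table entries quantized to 15 fractional bits

# Assumption: one extra endpoint entry so every interval has both ends available.
TABLE = [
    round((1.0 / (1.0 + i / 2**INDEX_BITS)) * 2**ENTRY_BITS) / 2**ENTRY_BITS
    for i in range(2**INDEX_BITS + 1)
]

def approx_reciprocal(x: float) -> float:
    """Approximate 1/x for x > 0 with a table lookup plus linear interpolation."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1           # renormalize so 1 <= m < 2
    # Keep only the top 10 fraction bits: 6 for the index, 4 for interpolation.
    scaled = int((m - 1.0) * 2**(INDEX_BITS + INTERP_BITS))
    idx = scaled >> INTERP_BITS
    t = (scaled & (2**INTERP_BITS - 1)) / 2**INTERP_BITS
    lo, hi = TABLE[idx], TABLE[idx + 1]
    recip_m = lo + (hi - lo) * t    # linear interpolation between neighbours
    return math.ldexp(recip_m, -e)  # 1/x = (1/m) * 2**(-e): only an exponent shift

for x in (0.1, 3.0, 7.5, 1234.5):
    print(x, approx_reciprocal(x), 1.0 / x)
```

Only the mantissa goes through the table; the exponent is handled by negating it, which is the "shift result by exponent" step in the last bullet.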
Simplify full equations using math, e.g. LOD = .5 * Log2 ( sqrt(dsdx^2 + dsdy^2))
◦ Log2 (sqrt(x)) = .5 * Log2 (x)
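The log identity above removes the square root entirely. A minimal sketch, with illustrative function names, and leaving aside any constant factor in the full LOD equation:

```python
import math

def log2_of_length(dsdx: float, dsdy: float) -> float:
    """log2 of the gradient length, computed the direct way with a square root."""
    return math.log2(math.sqrt(dsdx * dsdx + dsdy * dsdy))

def log2_of_length_no_sqrt(dsdx: float, dsdy: float) -> float:
    """Same value via log2(sqrt(x)) = 0.5 * log2(x), so no square root is needed."""
    return 0.5 * math.log2(dsdx * dsdx + dsdy * dsdy)

print(log2_of_length(3.0, 4.0), log2_of_length_no_sqrt(3.0, 4.0))
```

In hardware the factor of 0.5 is a one-bit shift, which is far cheaper than a square-root unit.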
1993-2000 : 3Dfx
1993-2000 : 3Dfx …
C simulator
◦ Very fast bit accurate simulator for the chip
◦ 10k to 50k lines of C code
◦ Can research algorithms quickly
◦ Up and running well before RTL simulator
◦ You can develop software and hardware tests on C simulator
RTL simulator
◦ Verilog
Before tapeout, we compare C vs Verilog results for chip functional tests that we write
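A bit-accurate comparison of this kind could look roughly like the sketch below. It assumes each simulator's output for a test has already been dumped as a list of integer words; the function name and the dump format are hypothetical, not the actual 3dfx tooling.

```python
from typing import Optional, Sequence, Tuple

def first_mismatch(
    c_sim: Sequence[int], rtl_sim: Sequence[int]
) -> Optional[Tuple[int, int, int]]:
    """Return (index, c_value, rtl_value) for the first differing word, else None."""
    if len(c_sim) != len(rtl_sim):
        raise ValueError("dumps have different lengths")
    for i, (a, b) in enumerate(zip(c_sim, rtl_sim)):
        if a != b:          # bit-accurate: any difference at all is a failure
            return (i, a, b)
    return None

print(first_mismatch([0x10, 0x20, 0x30], [0x10, 0x21, 0x30]))
```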
Story time : code then test, vs test then code
1993-2000 : 3Dfx… debugging
Yogi Berra: In theory there is no difference between theory and practice. In practice there is.
From Bandits? : Always expect the unexpected, except of course the truly unexpected …
Me: If you cannot believe there is a bug (in your code), then you will never find it.
1993-2000 : 3Dfx Voodoo 1
Voodoo 1 – 50 MHz chip, 500 nm process, 50 MHz memory (4 MB), 50 Mpixels/sec
◦ Each chip was ~1 million transistors, 250k gates
1993-2000 : 3Dfx Voodoo 1
System architecture – perhaps my best work ever (along with Scott Sellers)
1993-2000 : 3Dfx Voodoo 1 results
Images tell the story … compared to Reality Engine …
1993-2000 : 3Dfx Voodoo 2
1993-2000 : 3Dfx Voodoo 2 , 3
Voodoo 3 : ~4 years after Voodoo 1
1 chip vs 2-3 chips
Density: 250 nm vs 500 nm = 4x more logic (2x went to reduce the chip count)
Clock rate: 50 MHz to 200 MHz
Memory: 50 MHz to 166 MHz , 4 MB to 16 MB
https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units
2000-now : nvidia
We goofed, missed a product cycle/schedule, tactical and strategic mistakes and poof!
◦ Another one bites the dust
One strategic mistake – we did not put T&L on a chip until too late
◦ our next product had T&L , but it was still in the lab
◦ I thought CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did
◦ They peaked out at 8-16 cores, and IEEE float performance was not their #1 priority
◦ GPUs became more important than I think anyone ever thought (we didn’t truly believe ourselves?)
◦ Enabled high $$$ investment in GPU floating point, where I thought it would end up on CPU
◦ Supercomputer speed floating point is basically for free on a GPU
◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer
◦ Many times more powerful than the early CRAY supercomputers
2000-now : nvidia Titan X
Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9
4x8 green dots = one SM (SIMD cpu)
3072 of them on the die
Each is ~Voodoo 2 or more
2000-now : 1995 + 20 years = 2015
over 20 years Moore’s law says we should expect 2**10 increase or 1000x
                 Voodoo 1              Titan X                   x increase
Transistors      2 M (2 chips)         8000 M                    4000
Cores            1                     2000-3000                 2500
Technology       500 nm                28 nm                     300
Area             100 mm^2              600 mm^2                  6
Triangles/sec    1 M                   6000 M                    6000
Mpixels/sec      100 M                 100,000 M                 1000
Ops/sec          5 B (8b)              7000 B (32b IEEE)         1000
Memory b/w       < 1 GB/sec            340 GB/sec                400
Power            4 watts               250 watts                 (the price you pay)
Frequency        50 MHz                1000 MHz                  20
Memory           4 MB                  12,000 MB                 3000
Cost             $500                  $1000                     2
Design           5 man years ($5M)     >500 man years ($500M)    100
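The 2**10 prediction and a few of the ratios above can be recomputed with a short sketch. The value pairs are copied from the comparison; the two-year doubling period is the popular formulation quoted earlier, and the names are illustrative.

```python
def moore_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth predicted by one doubling every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)

# (Voodoo 1, Titan X) pairs copied from the comparison above.
table = {
    "transistors (M)": (2, 8000),
    "triangles/sec (M)": (1, 6000),
    "frequency (MHz)": (50, 1000),
    "memory (MB)": (4, 12000),
    "cost ($)": (500, 1000),
}
ratios = {name: new / old for name, (old, new) in table.items()}
predicted = moore_factor(20)  # 2**10

for name, ratio in ratios.items():
    print(f"{name}: {ratio:g}x (doubling every two years gives {predicted:g}x)")
```

Transistor count, triangle rate and memory beat the prediction, while clock frequency falls far short of it.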
CPUs vs GPUs
Graphics is embarrassingly parallel ! (millions of pixels on the screen)
◦ Which is why 1000-3000 cores can be efficient
◦ If your PC has 1000-3000 cores, what would they do?
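"Embarrassingly parallel" means no pixel depends on any other, so the work can be split any way you like. A toy sketch, assuming a made-up `shade` function and a tiny framebuffer; Python threads are used only to show the independence, not to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48  # tiny hypothetical framebuffer

def shade(x: int, y: int) -> int:
    """Hypothetical per-pixel function: depends only on its own coordinates."""
    return (x * 31 + y * 17) % 256

def render_row(y: int) -> list[int]:
    return [shade(x, y) for x in range(WIDTH)]

def render_serial() -> list[list[int]]:
    return [render_row(y) for y in range(HEIGHT)]

def render_parallel(workers: int = 8) -> list[list[int]]:
    # Rows can go to workers in any order because no pixel reads another pixel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))

print(render_parallel() == render_serial())
```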
PIXAR field trip (while at 3dfx)
◦ Server room full of Sun workstations
◦ Limit is how much computing power you can fit in that physical room (and A/C)
Supercomputers
◦ Supercomputers are often limited to a power budget in megawatts for CPUs and A/C
◦ Once GPUs were general enough and supported 32b and 64b IEEE floating point ….
2000-now : 3dfx + nvidia … looking back
Need I say more:
1995: 0% of consumer PCs have 3d graphics accelerators
2015: 100% penetration (embedded accelerator in all Intel and AMD chips)
Deep neural net analysis, deep learning
Is this the key to Artificial Intelligence becoming real?
Intel 16-core Xeon = 43 days to train a DNN problem
Titan-X = 1.5 days
Next year < 1 day
5 years … 1 hour (with software advances)
20 years … 1 sec to 1 minute ?
Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
Now do you believe?
Is Artificial Intelligence really almost here?
GPU Fanatic (last week this came in my nvidia email)
Ray Kurzweil, a renowned futurist and the director of engineering at Google:
“…the hardware needed to emulate the human brain may be ready even
sooner than he predicted — in around 2020 — using technologies such as
graphics processing units (GPUs), which are ideal for brain-software
algorithms.” (Washington Post, 4/23/14)
Self promoting Links:
http://www.thedodgegarage.com/3dfx/
https://en.wikipedia.org/wiki/3dfx_Interactive
simply google everything else, e.g. deep learning
(that’s what I did)

3dfx, nvidia, Moore's Law and more...

  • 2. Gary Tarolli - 3dfx and Nvidia 3D Graphics Engineer Monday, April 27
  • 3. 3D Graphics from my career perspective 1974-1978 BS. Math RPI (minor in CS) 1979-1980 MS CS Caltech 1980-1983 Digital Equipment Corp 1984-1992 Silicon Graphics, Inc 1992-1993 consulting 1993-2000 3dfx 2000- nvidia
  • 4. or “Moore’s Law viewed from my career” Moore’s law at 50 (years) publication came in the mail last week … Various articles in the news too … should we throw a party or a wake ?
  • 5. Moore’s law in action over 4 decades Moore’s Law : http://www.mooreslaw.org The most popular formulation is : the number of transistors on and integrated circuit doubles about every two years. (same size chip) e.g. 500nm to 350nm is sqrt(2) shrink on one side of a chip, so square = 2x as dense (# transistors) Note: in addition the clock speed increases and the chip area increases (better manufacturing) Cost per transistor or performance drops!
  • 6. Result: trends over 4 decades … Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx) The rise of importance of 3D graphics and hence graphics chips Consolidation in the 3d graphics industry ◦ ~40 3d graphics chip startups in 1994 ◦ Only a few independent companies left : nvidia, Imagination Technologies (Power VR) ◦ 2 cpu/system companies : Intel, AMD , Apple Surprise: graphics chips power supercomputers Surprise: cars ◦ 8 million cars with nvidia chips in them, many more coming ◦ Self driving cars are coming: enabled by supercomputing power in cheap chips Surprise: deep neural net learning enabled by this computing power is exploding
  • 7. Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil http://en.wikipedia.org/wiki/The_Singularity_Is_Near You probably don’t believe this now, see if you do in an hour … So let’s begin the journey …
  • 8. 1974-1978 : BS. Math & CS RPI 1974 – my first calculator : HP-35 purchased for college ($270? – a few weeks salary) 1975 – my first computer program on an IBM 360 mainframe (using my friends engineering account)
  • 9. 1979-1980 : MS CS Caltech 1979 – played networked Star Trek on Xerox Alto : black and white bit-mapped graphics until 4am , living off of $.25 ice cream sandwiches
  • 10. 1979-1980 : MS CS Caltech … Worked on VLSI CAD tools for custom chips, humans draw every single wire for every single transistor on a chip inverter inverter
  • 11. 1979-1980 : MS CS Caltech … MIT class projects in 1978
  • 12. 1980-1983 : DEC (minicomputer) #93246 CPUS were still many boards of logic I worked on VLSI CAD tools so we could design a single chip VAX, called microVAX And go from this : A refrigerator filled with boards …
  • 13. 1980-1983 : DEC (minicomputer) … To this …
  • 14. 1984-1992 : SGI(workstation) #36 IRIS 1000 workstation (1984) : $10,000 to $30,000 - 8 MHz Motorola 68010 IRIS 1400 workstation: ran at 10 MHz , had 1.5 MB of RAM and a 73 MB disk drive My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight
  • 15. 1984-1992 : SGI, Silicon Graphics, Inc … IRIS Indigo (1992) : $6000 - 33 MHz MIPS R3000 ◦ 100k lines/sec, 10k triangles/sec ◦ Almost all of SGI GL library implemented in software on MIPs
  • 16. 1984-1992 : SGI, Silicon Graphics, Inc … 1991: IRIS vision: $4000 board set for the PC, ISA and microchannel ◦ http://en.wikipedia.org/wiki/IrisVision Intel 486 and bus architecture just too slow, so died in obscurity … But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming … faster Pentiums, Moore’s law (smaller, denser chips) , PCI bus …. and that SGI would be out of business some day if it didn’t transform itself But going from 80% margins to 20% margins is not easy to swallow. They did not … we voted with our feet and left (along with others who went to Nvidia and elsewhere) and they paid the price…by 2000 SGI was in decline … died in 2009 … about 20 years later … $0 to $5 billion back to $0
  • 17. 1984-1992 : SGI, Silicon Graphics, Inc … Onyx Reality Engine (1992) : $50,000 to $80,000 – 100 MHz R4400 Beautiful real-time texture mapped graphics (divide per pixel) ◦ 1M triangles/sec, 100 Mpixels/sec
  • 18. 1993-2000 : 3Dfx (PC) employee #1 Why: ◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process) ◦ We saw a problem within SGI, and an opportunity in 3d PC graphics ◦ Engineers – we saw a cool problem and wanted to solve it ◦ We realized the gaming market was a lot bigger than anyone knew ◦ ~$5B at the time, almost as big as the movie industry ◦ Today it is MUCH larger, over $100B worldwide for all games, dwarfs the movie industry Goal: ◦ Produce images similar to Reality Engine’s for $500, in real time, i.e. 30 fps ◦ Similar means reduced quality (less bit depth) but still excellent Activation energy: Caroline said “Just do it” one day
  • 19. 1993-2000 : 3Dfx (PC) … How: ◦ Make maximum use of just-arriving technology ◦ Aim high – don’t sacrifice quality, do the entire Reality Engine pipeline at full speed ◦ Make it easy to program, no difficult choices : e.g. trading off speed for quality ◦ Included ALL the important features of Reality Engine: shading, z-buffering, alpha-blending, fog, quality texturing and filtering ◦ Listened to game developers and professionals – tech. advisory board ◦ John Carmack (id) ◦ Tim Sweeney (Epic) ◦ Tom Porter (Pixar) A bit of luck, ok a lot? ◦ $500 too costly for consumer market, so we targeted the arcades ◦ And 3dfx ended up in various arcade machines, SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc. ◦ Memory prices fell dramatically, resulting in a $300 board, which enabled the consumer market
  • 20. 1993-2000 : 3Dfx (PC) … Key to quality texture mapping is per-pixel divide ◦ Very costly ◦ Key is to be just good enough ◦ We didn’t need 32 bit results, only about 18-20 bits ◦ Just enough to not be visually distracting ◦ So we used a table lookup, and then linear interpolation (which helped a lot) ◦ Remember those sin/cos/tan tables in high school trig? Same basic idea ◦ 6 bit index (64 entries, 15 bits wide, ends up in a PLA optimized ROM) ◦ 4 bit interpolation, adds another 3-4 bits ◦ Input is float, so shift result by exponent since log(1/x) = -log(x) = -exponent(x) in float representation Simplify full equations using math, e.g. LOD = .5 * Log2 ( sqrt(dsdx^2 + dsdy^2)) ◦ Log2 (sqrt(x)) = .5 * Log2 (x)
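The table-lookup-plus-interpolation idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Voodoo hardware: the function name `approx_recip`, the 65-point table layout, and the rounding choices are hypothetical, and the sketch truncates the mantissa to 10 bits, so it does not reproduce the 18-20 bits of precision the slide mentions. Only the 6-bit index, the 4-bit interpolation, and the "shift by the exponent" step come from the slide.

```python
import math

INDEX_BITS = 6    # 64-entry table (from the slide)
INTERP_BITS = 4   # 4-bit linear interpolation (from the slide)
SCALE = 1 << 15   # entries quantized to ~15 bits (assumed fixed-point format)

# Assumed layout: 1/m sampled for mantissa m in [1, 2] at 65 points, so that
# entry idx + 1 always exists as the right-hand interpolation endpoint.
TABLE = [round(SCALE / (1.0 + i / (1 << INDEX_BITS)))
         for i in range((1 << INDEX_BITS) + 1)]


def approx_recip(x):
    """Approximate 1/x for x > 0 via table lookup plus linear interpolation."""
    if x <= 0.0:
        raise ValueError("sketch handles positive inputs only")
    m, e = math.frexp(x)          # x = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1         # renormalize so m is in [1, 2)
    # Keep the top 10 fraction bits: 6 for the index, 4 for interpolation.
    fixed = int((m - 1.0) * (1 << (INDEX_BITS + INTERP_BITS)))
    idx = fixed >> INTERP_BITS
    t = fixed & ((1 << INTERP_BITS) - 1)
    lo, hi = TABLE[idx], TABLE[idx + 1]
    val = lo + (((hi - lo) * t) >> INTERP_BITS)   # linear interpolation
    # 1/(m * 2**e) = (1/m) * 2**(-e): the exponent becomes a shift.
    return math.ldexp(val / SCALE, -e)
```

For example, `approx_recip(3.0)` lands close to 1/3; the dominant error in this sketch comes from discarding mantissa bits below the 10 that are used, which real hardware could handle differently.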
  • 22. 1993-2000 : 3Dfx … C simulator ◦ Very fast bit accurate simulator for the chip ◦ 10k to 50k lines of C code ◦ Can research algorithms quickly ◦ Up and running well before RTL simulator ◦ You can develop software and hardware tests on C simulator RTL simulator ◦ Verilog Before tapeout, we compare C vs Verilog results for chip functional tests that we write Story time : code then test, vs test then code
  • 23. 1993-2000 : 3Dfx… debugging Yogi Berra: In theory there is no difference between theory and practice. In practice there is. From Bandits? : Always expect the unexpected, except of course the truly unexpected … Me: If you cannot believe there is a bug (in your code), then you will never find it.
  • 24. 1993-2000 : 3Dfx Voodoo 1 Voodoo 1 – 50 MHz chip, 500 nm chip, 50 MHz mem (4MB), 50 Mpixels/sec ◦ Each chip was ~1 million transistors, 250k gates
  • 25. 1993-2000 : 3Dfx Voodoo 1 System architecture – perhaps my best work ever (along with Scott Sellers)
  • 26. 1993-2000 : 3Dfx Voodoo 1 results Images tell the story … compared to Reality Engine …
  • 27. 1993-2000 : 3Dfx Voodoo 2
  • 28. 1993-2000 : 3Dfx Voodoo 2 , 3 Voodoo 3 : ~4 years after Voodoo 1 1 chip vs 2-3 chips Density: 250 nm vs 500 nm = 4x more logic (2x went to reduce the chip count) Clock rate: 50 MHz to 200 MHz Memory: 50 MHz to 166 MHz , 4 MB to 16 MB https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units
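The density figure on this slide follows from simple geometry: shrinking the feature size scales both sides of a transistor, so density grows with the square of the linear shrink. A minimal sketch of that arithmetic (the helper name `density_gain` is made up for illustration, and it assumes ideal scaling with nothing lost to layout overhead):

```python
def density_gain(old_nm, new_nm):
    """Ideal transistor-density gain when feature size shrinks old_nm -> new_nm."""
    linear_shrink = old_nm / new_nm
    return linear_shrink ** 2   # area scales with the square of the side


voodoo1_to_voodoo3 = density_gain(500, 250)   # 2x per side, 4x in area
one_moore_step = density_gain(500, 350)       # roughly a sqrt(2) shrink per side
```

The second value is the "500 nm to 350 nm" step quoted earlier in the talk as one Moore's law doubling.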
  • 29. 2000-now : nvidia We goofed, missed a product cycle/schedule, tactical and strategic mistakes and poof! ◦ Another one bites the dust One strategic mistake – we did not put T&L on a chip until too late ◦ our next product had T&L , but it was still in the lab ◦ I thought CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did ◦ They peaked out at 8-16 cores, and IEEE float performance was not their #1 priority ◦ GPUs became more important than I think anyone ever thought (we didn’t truly believe ourselves?) ◦ Enabled high $$$ investment in GPU floating point, where I thought it would end up on CPU ◦ Supercomputer speed floating point is basically for free on a GPU ◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer ◦ Many times more powerful than the early CRAY supercomputers
  • 30. 2000-now : nvidia Titan X Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9 4x8 green dots = one SM (SIMD cpu) 3072 of them on the die Each is ~Voodoo 2 or more
  • 31. 2000-now : 1995 + 20 years = 2015 Over 20 years, Moore’s law says we should expect a 2**10 increase, or about 1000x
                       Voodoo 1               Titan X                   x increase
      Transistors      2 M (2 chips)          8000 M                    4000
      Cores            1                      2000-3000                 2500
      Technology       500 nm                 28 nm                     300
      Area             100 mm2                600 mm2                   6
      Triangles/sec    1 M                    6000 M                    6000
      Pixels/sec       100 M                  100,000 M                 1000
      Ops/sec          5 B (8b)               7000 B (32b IEEE)         1000
      Memory b/w       < 1 GB/sec             340 GB/sec                400
      Power            4 watts                250 watts                 (the price you pay)
      Frequency        50 MHz                 1000 MHz                  20
      Memory           4 MB                   12,000 MB                 3000
      Cost             $500                   $1000                     2
      Design           5 man years ($5M)      >500 man years ($500M)    100
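The "2**10 or 1000x" expectation, and a few of the ratios from this slide, can be checked with a short calculation. This is a sketch only: `moore_factor` is a hypothetical helper, the two-year doubling period is the popular formulation quoted at the start of the talk, and the value pairs are copied from the slide rather than independently verified.

```python
def moore_factor(years, doubling_period_years=2.0):
    """Growth expected if a quantity doubles every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)


expected = moore_factor(20)   # ten doublings in 20 years

# (Voodoo 1, Titan X) pairs as given on the slide.
slide_values = {
    "transistors (M)": (2, 8000),
    "frequency (MHz)": (50, 1000),
    "memory (MB)": (4, 12000),
    "cost ($)": (500, 1000),
}
ratios = {name: new / old for name, (old, new) in slide_values.items()}
```

By these numbers, transistor count and memory size grew faster than the ten-doubling projection, while clock frequency and cost grew far more slowly.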
  • 32. CPUs vs GPUs Graphics is embarrassingly parallel ! (millions of pixels on the screen) ◦ Which is why 1000-3000 cores can be efficient ◦ If your PC has 1000-3000 cores, what would they do? PIXAR field trip (while at 3dfx) ◦ Server room full of Sun workstations ◦ Limit is how much computing power you can fit in that physical room (and A/C) Supercomputers ◦ Supercomputers are often limited to a power budget in MWatts for CPUs and A/C ◦ Once GPUs were general enough and supported 32b and 64b IEEE floating point ….
  • 33. 2000-now : 3dfx + nvidia … looking back Need I say more: 1995: 0% of consumer PCs have 3d graphics accelerators 2015: 100% penetration (embedded accelerator in all Intel and AMD chips)
  • 34. Deep neural net analysis, deep learning Is this the key to Artificial Intelligence becoming real? Intel 16 core XEON = 43 days to train a DNN problem Titan-X = 1.5 days Next year < 1 day 5 years … 1 hour (with software advances) 20 years … 1 sec to 1 minute ?
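To put the training times on this slide on a common scale, the implied speedups can be worked out directly. A small sketch, using only the figures quoted on the slide (the variable names are illustrative, and the future figures are the speaker's projections, not measurements):

```python
HOURS_PER_DAY = 24

xeon_days = 43.0      # 16-core Xeon, from the slide
titan_x_days = 1.5    # Titan X, from the slide

# Speedup of the GPU over the CPU for the same training problem.
speedup_now = xeon_days / titan_x_days

# Further improvement needed to reach the projected milestones.
titan_x_hours = titan_x_days * HOURS_PER_DAY
factor_to_one_hour = titan_x_hours / 1.0      # the "5 years" projection
factor_to_one_minute = titan_x_hours * 60.0   # upper end of the "20 years" projection
```

So the slide's starting point is a speedup of a little under 30x, and the five-year projection asks for roughly another 36x on top of that.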
  • 35. Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil Now do you believe? Is Artificial Intelligence really almost here? GPU Fanatic (last week this came in my nvidia email) Ray Kurzweil, a renowned futurist and the director of engineering at Google: “…the hardware needed to emulate the human brain may be ready even sooner than he predicted — in around 2020 — using technologies such as graphics processing units (GPUs), which are ideal for brain-software algorithms.” (Washington Post, 4/23/14)

Editor's notes

  1. Moore’s law explains a lot, but not why I went from looking like this to that …
  2. Ugh, that makes me feel old
  3. I did NOT work on this at SGI but this was our target at 3dfx, built something close for $500
  4. When people say, “well, in theory this should work fine, I don’t understand why it’s not” … I use these quotes … I make observations on my code’s behavior, and then can often predict where the bug is, regardless of what I think about the quality of my code. And I often find it very quickly. Ego-less debugging.