3dfx, nvidia, Moore's Law and more...

NST 121
Computer Systems
Fundamentals
INTRODUCTION TO COMPUTERS

Gary Tarolli - 3dfx and Nvidia
3D Graphics Engineer
Monday, April 27

3D Graphics from my career perspective
1974-1978 BS. Math RPI (minor in CS)
1979-1980 MS CS Caltech
1980-1983 Digital Equipment Corp
1984-1992 Silicon Graphics, Inc
1992-1993 consulting
1993-2000 3dfx
2000- nvidia

or “Moore’s Law viewed from my career”
Moore’s law at 50 (years) publication came in the mail last week …
Various articles in the news too … should we throw a party or a wake ?

Moore’s law in action over 4 decades
Moore’s Law : http://www.mooreslaw.org
The most popular formulation is :
the number of transistors on an integrated circuit
doubles about every two years. (same size chip)
e.g. 500nm to 350nm is a sqrt(2) shrink along each side
of a chip, so squared it is 2x as dense (# transistors)
Note: in addition the clock speed increases
and the chip area increases (better manufacturing)
Cost per transistor (or per unit of performance) drops!
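The shrink arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function names `density_gain` and `moore_factor` are illustrative only, and the two-year doubling period is the "most popular formulation" quoted on the slide.

```python
def density_gain(old_nm, new_nm):
    """Transistor density multiplier from a linear feature-size shrink.

    Shrinking each side by a factor k packs k*k as many transistors
    into the same chip area.
    """
    return (old_nm / new_nm) ** 2


def moore_factor(years, doubling_period_years=2.0):
    """Expected transistor-count multiplier after `years` of doubling."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    print(f"500nm -> 350nm density gain: {density_gain(500, 350):.2f}x")
    print(f"20 years of doubling every 2 years: {moore_factor(20):.0f}x")
```

The 500nm to 350nm step comes out very close to 2x, and twenty years of doubling gives 2**10 = 1024x.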

Result: trends over 4 decades …
Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx)
The rise of importance of 3D graphics and hence graphics chips
Consolidation in the 3d graphics industry
◦ ~40 3d graphics chip startups in 1994
◦ Only a few independent companies left : nvidia, Imagination Technologies (Power VR)
◦ 2 cpu/system companies : Intel, AMD , Apple
Surprise: graphics chips power supercomputers
Surprise: cars
◦ 8 million cars with nvidia chips in them, many more coming
◦ Self driving cars are coming: enabled by supercomputing power in cheap chips
Surprise: deep neural net learning enabled by this computing power is exploding

Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
http://en.wikipedia.org/wiki/The_Singularity_Is_Near
You probably don’t believe this now,
see if you do in an hour …
So let’s begin the journey …

1974-1978 : BS. Math & CS RPI
1974 – my first calculator : HP-35 purchased for college ($270? – a few weeks' salary)
1975 – my first computer program on an IBM 360 mainframe
(using my friend's engineering account)

1979-1980 : MS CS Caltech
1979 – played networked Star Trek on a Xerox Alto (black and white bit-mapped graphics)
until 4am, living off $.25 ice cream sandwiches

1979-1980 : MS CS Caltech …
Worked on VLSI CAD tools for custom chips, where humans draw every single wire for every single transistor on a chip
[Figure: two inverter layouts]

1979-1980 : MS CS Caltech …
MIT class projects in 1978

1980-1983 : DEC (minicomputer) #93246
CPUs were still many boards of logic
I worked on VLSI CAD tools so we could design a single chip VAX, called microVAX
And go from this: a refrigerator filled with boards …

1980-1983 : DEC (minicomputer) …
To this …

1984-1992 : SGI (workstation) #36
IRIS 1000 workstation (1984) : $10,000 to $30,000 - 8 MHz Motorola 68010
IRIS 1400 workstation: ran at 10 MHz , had 1.5 MB of RAM and a 73 MB disk drive
My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight

1984-1992 : SGI, Silicon Graphics, Inc …
IRIS Indigo (1992) : $6000 - 33 MHz MIPS R3000
◦ 100k lines/sec, 10k triangles/sec
◦ Almost all of the SGI GL library implemented in software on MIPS

1991: IrisVision: $4000 board set for the PC (ISA and Micro Channel)
◦ http://en.wikipedia.org/wiki/IrisVision
Intel 486 and bus architecture just too slow, so died in obscurity …
But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming
… faster Pentiums, Moore’s law (smaller, denser chips) , PCI bus ….
and that SGI would be out of business some day if it didn’t transform itself
But going from 80% margins to 20% margins is not easy to swallow. They did not …
we voted with our feet and left (along with others who went to Nvidia and elsewhere)
and they paid the price…by 2000 SGI was in decline … died in 2009 … about 20 years later …
$0 to $5 billion back to $0

Onyx Reality Engine (1992) : $50,000 to $80,000 – 100 MHz R4400
Beautiful real-time texture mapped graphics (divide per pixel)
◦ 1M triangles/sec, 100 Mpixels/sec

1993-2000 : 3Dfx (PC) employee #1
Why:
◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process)
◦ We saw a problem within SGI, and an opportunity in 3d PC graphics
◦ Engineers – we saw a cool problem and wanted to solve it
◦ We realized the gaming market was a lot bigger than anyone knew
◦ ~$5B at the time, almost as big as movie industry
◦ Today it is MUCH larger, over $100B worldwide for all games, dwarfs the movie industry
Goal:
◦ Produce images similar to Reality Engine's for $500 in real time, i.e. 30 fps
◦ Similar means reduced quality (less bit depth) but still excellent
Activation energy: Caroline said “Just do it” one day

1993-2000 : 3Dfx (PC) …
How:
◦ Make maximum use of just-arriving technology
◦ Aim high – don’t sacrifice quality, do the entire Reality Engine pipeline at full speed
◦ Make it easy to program, with no difficult choices, e.g. trading off speed for quality
◦ Included ALL the important features of Reality Engine: shading, zbuffering, alpha-blending, fog, quality texturing and filtering
◦ Listened to game developers and professionals – tech. advisory board
◦ John Carmack (id)
◦ Tim Sweeney (Epic)
◦ Tom Porter (Pixar)
A bit of luck, ok a lot?
◦ $500 too costly for consumer market, so we targeted the arcades
◦ And 3dfx ended up in various arcade machines, SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc.
◦ Memory prices fell dramatically resulting in a $300 board and enabled the consumer market

1993-2000 : 3Dfx (PC) …
Key to quality texture mapping is per-pixel divide
◦ Very costly
◦ Key is to be just good enough
◦ We didn’t need 32 bit results, only about 18-20 bits
◦ Just enough to not be visually distracting
◦ So we used a table lookup, and then linear interpolation (which helped a lot)
◦ Remember those sin/cos/tan tables in high school trig? Same basic idea
◦ 6 bit index (64 entries, 15 bits wide, ends up in a PLA optimized ROM)
◦ 4 bit interpolation, adds another 3-4 bits
◦ Input is float, so shift result by exponent since log(1/x) = -log(x) = -exponent(x) in float representation
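The table-lookup-plus-interpolation divide described above can be sketched in Python. This is only an illustration under stated assumptions, not the chip's actual logic: the slides give the table size (64 entries, 15 bits wide) and the 4-bit interpolation, but the entry encoding, the clamping of the first entry, and the names `RECIP_TABLE` and `approx_reciprocal` are all hypothetical.

```python
import math

INDEX_BITS = 6    # 64-entry table, as on the slide
INTERP_BITS = 4   # interpolation fraction, as on the slide
ENTRY_BITS = 15   # table entry width, as on the slide

SCALE = 1 << ENTRY_BITS

# Assumed encoding: entry i holds 1/m at m = 1 + i/64 as a 15-bit fraction.
# The value 1.0 does not fit in 15 bits, so the first entry is clamped.
RECIP_TABLE = [
    min(SCALE - 1, round(SCALE / (1.0 + i / (1 << INDEX_BITS))))
    for i in range(1 << INDEX_BITS)
]
END_VALUE = SCALE // 2  # 1/2.0, the value just past the last entry


def approx_reciprocal(x):
    """Approximate 1/x for x > 0 with a table lookup and linear interpolation."""
    if x <= 0.0:
        raise ValueError("sketch handles positive inputs only")
    frac, exp = math.frexp(x)        # x = frac * 2**exp, frac in [0.5, 1)
    mantissa = frac * 2.0            # normalize to [1, 2)
    exponent = exp - 1
    # Keep only the top 6 + 4 bits of the mantissa fraction.
    bits = int((mantissa - 1.0) * (1 << (INDEX_BITS + INTERP_BITS)))
    index = bits >> INTERP_BITS
    t = bits & ((1 << INTERP_BITS) - 1)
    lo = RECIP_TABLE[index]
    hi = RECIP_TABLE[index + 1] if index + 1 < len(RECIP_TABLE) else END_VALUE
    # Linear interpolation between neighbouring entries (hi < lo here).
    value = lo + (((hi - lo) * t) >> INTERP_BITS)
    # 1/(m * 2**e) = (1/m) * 2**(-e): the exponent becomes a shift.
    return math.ldexp(value / SCALE, -exponent)


if __name__ == "__main__":
    for x in (1.0, 3.0, 7.25, 1000.0):
        approx = approx_reciprocal(x)
        print(f"1/{x}: approx {approx:.6f}, relative error {abs(approx * x - 1):.2e}")
```

Because this sketch throws away every mantissa bit below the tenth, its accuracy is not meant to match the 18-20 bits quoted on the slide; it only shows the lookup, interpolate, and shift structure.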
Simplify full equations using math, e.g. LOD = .5 * Log2 ( sqrt(dsdx^2 + dsdy^2))
◦ Log2 (sqrt(x)) = .5 * Log2 (x)
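The identity Log2(sqrt(x)) = .5 * Log2(x) removes the square root entirely, and for a float input the integer part of Log2 is just the exponent. A minimal Python sketch follows; the function names are hypothetical, and it illustrates only the identity, not the full level-of-detail computation on the chip.

```python
import math


def log2_gradient_length(dsdx, dsdy):
    """log2(sqrt(dsdx^2 + dsdy^2)) computed without taking a square root."""
    return 0.5 * math.log2(dsdx * dsdx + dsdy * dsdy)


def coarse_log2(x):
    """Integer part of log2(x) for x > 0, read from the float exponent."""
    _, exp = math.frexp(x)   # x = frac * 2**exp with frac in [0.5, 1)
    return exp - 1


if __name__ == "__main__":
    direct = math.log2(math.sqrt(3.0 ** 2 + 4.0 ** 2))
    print(direct, log2_gradient_length(3.0, 4.0))
    print(coarse_log2(10.0))
```

Halving a logarithm is a one-bit shift in fixed point, which is far cheaper in hardware than a square root unit.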

1993-2000 : 3Dfx …
C simulator
◦ Very fast bit accurate simulator for the chip
◦ 10k to 50k lines of C code
◦ Can research algorithms quickly
◦ Up and running well before RTL simulator
◦ You can develop software and hardware tests on C simulator
RTL simulator
◦ Verilog
Before tapeout, we compare C vs Verilog results for chip functional tests that we write
Story time : code then test, vs test then code
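The C-versus-Verilog comparison can be pictured as a bit-exact diff of the two simulators' outputs for the same test. The sketch below is hypothetical: the slides do not describe the real file formats or tooling, so it assumes each simulator's output has already been loaded as a list of integer pixel values, and `first_mismatch` is an invented helper name.

```python
def first_mismatch(reference, candidate):
    """Return (index, ref_value, cand_value) for the first difference, or None.

    `reference` would come from the C simulator and `candidate` from the
    RTL simulator. A length mismatch is reported at the shorter length.
    """
    if len(reference) != len(candidate):
        return (min(len(reference), len(candidate)), None, None)
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        if ref != cand:
            return (i, ref, cand)
    return None


if __name__ == "__main__":
    c_sim_pixels = [0x1F3A, 0x0000, 0x7FFF, 0x0421]
    rtl_pixels = [0x1F3A, 0x0000, 0x7FFE, 0x0421]
    print(first_mismatch(c_sim_pixels, rtl_pixels))
```

Bit accuracy is what makes this useful: any difference at all, even in the lowest bit, points at a bug in one of the two models.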

1993-2000 : 3Dfx… debugging
Yogi Berra: In theory there is no difference between theory and practice. In practice there is.
From Bandits? : Always expect the unexpected, except of course the truly unexpected …
Me: If you cannot believe there is a bug (in your code), then you will never find it.

1993-2000 : 3Dfx Voodoo 1
Voodoo 1 – 50 MHz chip, 500 nm chip, 50 MHz mem (4MB), 50 Mpixels/sec
◦ Each chip was ~1 million transistors, 250k gates

1993-2000 : 3Dfx Voodoo 1
System architecture – perhaps my best work ever (along with Scott Sellers)

1993-2000 : 3Dfx Voodoo 1 results
Images tell the story … compared to Reality Engine …

1993-2000 : 3Dfx Voodoo 2 , 3
Voodoo 3 : ~4 years after Voodoo 1
1 chip vs 2-3 chips
Density: 250 nm vs 500 nm = 4x more logic (2x went to reduce the chip count)
Clock rate: 50 MHz to 200 MHz
Memory: 50 MHz to 166 MHz, 4 MB to 16 MB
https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units

2000-now : nvidia
We goofed, missed a product cycle/schedule, tactical and strategic mistakes and poof!
◦ Another one bites the dust
One strategic mistake – we did not put T&L on a chip until too late
◦ our next product had T&L , but it was still in the lab
◦ I thought CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did
◦ They peaked out at 8-16 cores, and IEEE float performance was not their #1 priority
◦ GPUs became more important than I think anyone ever thought (we didn’t truly believe ourselves?)
◦ Enabled high $$$ investment in GPU floating point, where I thought it would end up on CPU
◦ Supercomputer speed floating point is basically for free on a GPU
◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer
◦ Many times more powerful than the early CRAY supercomputers

2000-now : nvidia Titan X
Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9
4x8 green dots = one SM (SIMD cpu)
3072 of them on the die
Each is ~Voodoo 2 or more

2000-now : 1995 + 20 years = 2015
Over 20 years, Moore’s law says we should expect a 2**10 increase, or about 1000x
                 Voodoo 1             Titan X                   x increase
Transistors      2 M (2 chips)        8000 M                    4000
Cores            1                    2000-3000                 2500
Technology       500 nm               28 nm                     300
Area             100 mm^2             600 mm^2                  6
Triangles/sec    1 M                  6000 M                    6000
Pixels/sec       100 M                100,000 M                 1000
Ops/sec          5 B (8b)             7000 B (32b ieee)         1000
Memory b/w       < 1 GB/sec           340 GB/sec                400
Power            4 watts              250 watts                 (the price you pay)
Frequency        50 MHz               1000 MHz                  20
Memory           4 MB                 12,000 MB                 3000
Cost             $500                 $1000                     2
Design           5 man years ($5M)    >500 man years ($500M)    100
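The "x increase" column can be recomputed from the two data columns and set against the 2**10 that Moore's law predicts for 20 years. The sketch below copies a subset of the table's figures. The dictionary layout and names are illustrative, and treating 500 nm to 28 nm as a squared (area) gain is an assumption about how the slide arrived at roughly 300.

```python
MOORE_20_YEARS = 2 ** 10  # doubling every 2 years for 20 years

# (Voodoo 1, Titan X) pairs copied from the table, in the units shown.
TABLE = {
    "Transistors (M)": (2, 8000),
    "Area (mm^2)": (100, 600),
    "Triangles/sec (M)": (1, 6000),
    "Pixels/sec (M)": (100, 100_000),
    "Ops/sec (B)": (5, 7000),
    "Frequency (MHz)": (50, 1000),
    "Memory (MB)": (4, 12_000),
    "Cost ($)": (500, 1000),
}


def increase(old, new):
    """Multiplier going from the Voodoo 1 value to the Titan X value."""
    return new / old


def density_increase(old_nm, new_nm):
    """Area density gain implied by a linear feature-size shrink."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    for name, (old, new) in TABLE.items():
        ratio = increase(old, new)
        print(f"{name:20s} {ratio:8.0f}x   ({ratio / MOORE_20_YEARS:.2f} of 1024x)")
    print(f"{'Technology density':20s} {density_increase(500, 28):8.0f}x")
```

Most rows land within rounding of the slide's column. Ops/sec computes to 1400x where the slide shows a round 1000.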

CPUs vs GPUs
Graphics is embarrassingly parallel ! (millions of pixels on the screen)
◦ Which is why 1000-3000 cores can be efficient
◦ If your PC has 1000-3000 cores, what would they do?
PIXAR field trip (while at 3dfx)
◦ Server room full of Sun workstations
◦ Limit is how much computing power you can fit in that physical room (and A/C)
Supercomputers
◦ Supercomputers are often limited to a power budget in megawatts for CPUs and A/C
◦ Once GPUs were general enough and supported 32b and 64b IEEE floating point ….

2000-now : 3dfx + nvidia … looking back
Need I say more:
1995: 0% of consumer PCs have 3d graphics accelerators
2015: 100% penetration (embedded accelerator in all Intel and AMD chips)

Deep neural net analysis, deep learning
Is this the key to Artificial Intelligence becoming real?
Intel 16-core Xeon = 43 days to train a DNN problem
Titan X = 1.5 days
Next year < 1 day
5 years … 1 hour (with software advances)
20 years … 1 sec to 1 minute ?
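The training-time figures above imply a speedup that is easy to work out, and the longer-range guesses can be compared with what hardware doubling alone would give. A rough sketch follows. It assumes a pure two-year doubling applied to the 1.5-day Titan X figure, which the slide does not state, and `projected_seconds` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400

XEON_DAYS = 43.0     # from the slide
TITAN_X_DAYS = 1.5   # from the slide


def projected_seconds(start_days, years, doubling_period_years=2.0):
    """Training time after `years`, if speed doubles every doubling period."""
    return start_days * SECONDS_PER_DAY / 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    print(f"Titan X vs Xeon speedup: {XEON_DAYS / TITAN_X_DAYS:.1f}x")
    print(f"After 5 years:  {projected_seconds(TITAN_X_DAYS, 5) / 3600:.1f} hours")
    print(f"After 20 years: {projected_seconds(TITAN_X_DAYS, 20):.0f} seconds")
```

Hardware doubling alone gives about 6 hours after 5 years, so the slide's "1 hour" relies on the software advances it mentions. The 20-year figure comes out at roughly two minutes, close to the slide's upper guess of one minute.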

Coming soon … ???
The Age of Intelligent Machines by Ray Kurzweil
Now do you believe?
Is Artificial Intelligence really almost here?
GPU Fanatic (last week this came in my nvidia email)
Ray Kurzweil, a renowned futurist and the director of engineering at Google:
“…the hardware needed to emulate the human brain may be ready even
sooner than he predicted — in around 2020 — using technologies such as
graphics processing units (GPUs), which are ideal for brain-software
algorithms.” (Washington Post, 4/23/14)

Self promoting Links:
http://www.thedodgegarage.com/3dfx/
https://en.wikipedia.org/wiki/3dfx_Interactive
simply google everything else, e.g. deep learning
(that’s what I did)
