Behind the Performance of
Quake 3 Engine:
Fast Inverse Square Root
Maksym Zavershynskyi
Quake 3 Arena

First Person Shooter
Released: 1999
Engine: id Tech 3
Average reviewers' score: ~9/10
Architecture
• C-Language
• Client-Server separation
• Virtual Machine
• Local C Compiler for Scripts
• Highly Optimized Code
Shading
Creates the perception of depth
Material Based Shading
[Figure: two images combined (+) into the shaded result (=)] [1]
What makes a nice picture?
• Shading
• Lighting
• Reflections
• ...
Angle of Incidence
[Figure: angle α between the surface normal and the view direction]
Greater α - darker shading
Vector Normalization
(x,y,z) → (a,b,c), where (a,b,c) has length 1
Fast Inverse Square
Root
Inverse Square Root

float Q_rsqrt( float number )
{
return 1.0f/sqrt(number);
}
Fast Approximate Inverse Square Root
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the f☀✿k?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
(1) i  = * ( long * ) &y;                       // evil floating point bit level hacking
(2) i  = 0x5f3759df - ( i >> 1 );               // what the f☀✿k?
(1) y  = * ( float * ) &i;
(3) y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

(1)Interpret float as integer
(2)Good initial guess with magic number 0x5f3759df
(3)One iteration of Newton’s approximation
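The original code relies on `long` being 32 bits wide and on pointer casts between `float` and `long`, which modern C compilers are allowed to treat as undefined behaviour (strict aliasing). A minimal sketch of the same three steps in portable C, assuming a 32-bit IEEE 754 `float`; the name `rsqrt_portable` is made up for this example and does not appear in the Quake 3 source:

```c
#include <stdint.h>
#include <string.h>

/* Same three steps as Q_rsqrt, with memcpy doing the reinterpretation. */
float rsqrt_portable(float number)
{
    uint32_t i;
    float y = number;
    const float x2 = number * 0.5f;

    memcpy(&i, &y, sizeof i);          /* (1) float bits -> integer */
    i = 0x5f3759dfu - (i >> 1);        /* (2) initial guess from the magic number */
    memcpy(&y, &i, sizeof y);          /* (1) integer bits -> float */
    y = y * (1.5f - (x2 * y * y));     /* (3) one Newton iteration */
    return y;
}
```

Compilers typically turn these fixed-size `memcpy` calls into plain register moves, so the portable form should not cost anything at run time.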
(1)Interpret float as integer
32-bit float:
0 | 0 1 1 1 1 1 0 0 | 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sign | E (8 bits) | M (23 bits)

0.15625 is 1.01 × 2^-3 in binary
E = -3 + 127 = 124, or 01111100 in binary
M = .01
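The split into sign, E and M can be checked with a few shifts and masks. A small sketch, assuming a 32-bit IEEE 754 `float`; the struct and function names are hypothetical helpers for this example only:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper struct: the three fields of a 32-bit float. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 (E on the slide) */
    uint32_t mantissa;  /* 23 bits after the implicit leading 1 (M on the slide) */
} float_fields;

float_fields float_decompose(float x)
{
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &x, sizeof bits);     /* reinterpret the float as an integer */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For 0.15625f this gives sign 0, exponent 124 and mantissa 0x200000, which is the .01 pattern followed by 21 zero bits.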
(1)Interpret float as integer
float x=0.15625
0 0 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

x as integer i
0 0 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

shift right i>>1
0 0 0 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E → E/2
the magic number 0x5f3759df
0 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 1 1 1 1

0x5f3759df - (i>>1)
0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 1 1 1 1

result: 2.614 (exact value 1/sqrt(x)=2.52982..)
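The worked example can be reproduced by stopping before the Newton step. A minimal sketch, assuming a 32-bit IEEE 754 `float`; both function names are invented for this example:

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern of the initial guess for 1/sqrt(x), before any Newton step. */
uint32_t rsqrt_guess_bits(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    return 0x5f3759dfu - (i >> 1);
}

/* The same guess reinterpreted as a float. */
float rsqrt_guess(float x)
{
    uint32_t i = rsqrt_guess_bits(x);
    float y;
    memcpy(&y, &i, sizeof y);
    return y;
}
```

For x = 0.15625f, `rsqrt_guess_bits` returns 0x402759DF, the last bit row of the example, and `rsqrt_guess` returns roughly 2.6149.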
(2)Magic Number: 0x5f3759df

• Gives a good initial guess.
• Minimizes the relative error.
• Trying to find a better number that minimizes
the error of the initial guess, we come up with:
0x5f37642f

Did we find a better magical number? ;)

[4]
(3)One iteration of Newton’s method
Newton’s method:
Given a suitable approximation y_n to the root of f(y),
gives a better one y_{n+1} using

$y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}$

In our case:

y = y * ( 1.5f - ( 0.5f * x * y * y ) );
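The slides do not state which f(y) is used. A common choice, assumed here, is f(y) = 1/y² - x, whose positive root is y = 1/sqrt(x). Substituting it into the Newton update gives the line of code above:

```latex
\begin{aligned}
f(y) &= \frac{1}{y^2} - x, \qquad f'(y) = -\frac{2}{y^3} \\
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n + \frac{y_n^3}{2}\left(\frac{1}{y_n^2} - x\right)
         = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^2\right)
\end{aligned}
```

The update needs only multiplications and one subtraction, with no division and no square root, which is why it is cheap enough to follow the bit trick.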
(3)One iteration of Newton’s method
After one iteration of Newton’s method,
our magic number 0x5f37642f gives a worse approximation
than the original magic number 0x5f3759df !!! [4]
Open Question:
How was the original magic number 0x5f3759df derived?

• Lomont in 2003 numerically found a slightly better magic number 0x5f375a86 [4]
• Robertson in 2012 analytically found the same better magic number 0x5f375a86 [3]
How good?
Max relative error: 0.177% [3]
With the 2nd iteration of Newton’s method: 0.00047% [3]

How fast?
In 1999: ???
Today: on CPUs 3-4 times faster
With the 2nd iteration of Newton’s method: 2-2.5 times faster [3]
Who wrote it?
Who?
John Carmack?
Lead Programmer of Quake, Doom,
Wolfenstein 3D
“...Not me, and I don’t think it is Michael (Abrash).
Terje Mathison perhaps?...”

Michael Abrash?
Author of:
Zen of Assembly Language
Zen of Graphics Programming

[8]
Who?
Terje Mathisen?
Assembly language optimization for x86
microprocessors.
“... I wrote fast & accurate invssqrt()... for a
computational fluid chemistry problem...
...The code is not the same as I wrote...”
[8]
Who?
Gary Tarolli?
Co-founder of 3dfx (later acquired by Nvidia)
“It did pass by my keyboard many many years ago, I
may have tweaked the hex constant a bit or so, but
other than that I can’t take credit for it, except that
I used it a lot and probably contributed to its
popularity and longevity.”
[8]

This hack is older than 1990!!!
Who?
Cleve Moler (inspiration)
Author of the first MATLAB,
one of the founders of MathWorks,
and currently Chief Mathematician there. [9]
Greg Walsh (most probably the author)
Has been working on Internet and distributed
computing technologies since before it was even
the Internet, and helped engineer the first
WYSIWYG word processor at Xerox PARC
while at Stanford University. [9]
Who?
Cleve Moler learned about the trick from code written
by Velvel Kahan and K.C. Ng at Berkeley around
1986!!!
http://www.netlib.org/fdlibm/e_sqrt.c

[10]
Finally
It is Fast: 3-4 times faster than the straightforward code
It is Good: 0.17% maximum relative error
It can be Improved
It dates back to 1986
Thank you!
http://zavermax.github.io
Some literature here
Quake 1,3 Architecture
1) Fabien Sanglard, Quake 3 source code review, 2012. http://fabiensanglard.net/quake3/
2) Michael Abrash, Ramblings in Realtime. http://www.bluesnews.com/abrash/

Inverse Square Root
3) Matthew Robertson, A Brief History of InvSqrt, Bachelor’s Thesis, University of New Brunswick, 2012
4) Chris Lomont, Fast Inverse Square Root, Indiana: Purdue University, 2003
5) Jim Blinn, Floating-point tricks, IEEE Computer Graphics and Applications 17, no. 4, 1997
6) David Eberly, Fast Inverse Square Root (Revisited), Geometric Tools, LLC, 2010
7) Charles McEniry, The Mathematics Behind the Fast Inverse Square Root Function Code, 2007

Investigation of the Authorship
8) Rys Sommefeldt, Origin of Quake3’s Fast InvSqrt(), 2006. http://www.beyond3d.com/content/articles/8/
9) Rys Sommefeldt, Origin of Quake3’s Fast InvSqrt() - Part Two, 2007. http://www.beyond3d.com/content/articles/15/
10) http://blogs.mathworks.com/cleve/2012/06/19/symplectic-spacewar/#comment-13

Additional
11) http://en.wikipedia.org/wiki/Fast_inverse_square_root
12) https://github.com/id-Software/Quake-III-Arena
