Cranking Floating Point Performance Up To 11
             Noel Llopis
            Snappy Touch

    http://twitter.com/snappytouch
       noel@snappytouch.com
     http://gamesfromwithin.com
Floating Point
Performance
Floating point numbers

• Representation of rational numbers
• 1.2345, -0.8374, 2.0000, 14388439.34, etc
• Following IEEE 754 format
• Single precision: 32 bits
• Double precision: 64 bits
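
To make the 32-bit layout concrete, here is a small sketch that splits a float into the three IEEE 754 single-precision fields (1 sign bit, 8 exponent bits, 23 fraction bits). FloatBits and DecodeFloat are made-up names for illustration, and the code assumes float is the 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three fields of an IEEE 754 single.
struct FloatBits
{
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 bits
};

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");

FloatBits DecodeFloat(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // copy out the raw bit pattern

    FloatBits out;
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

For example, DecodeFloat(1.0f) gives sign 0, exponent 127 and fraction 0.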
Why floating point performance?

• Most games use floating point numbers for
  most of their calculations
• Positions, velocities, physics, etc, etc.
• Maybe not so much for regular apps
CPU

• 32-bit RISC ARM 11
• 400-535 MHz
• iPhone 2G/3G and iPod
  Touch 1st and 2nd gen
CPU (iPhone 3GS)


• Cortex-A8 600MHz
• More advanced
  architecture
CPU


• No floating point support
  in the ARM CPU!!!
How about integer math?

• No need to do any floating point
  operations
• Fully supported in the ARM processor
• But...
Integer Divide




There is no integer divide
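
With no divide instruction in the CPU, an integer division has to be carried out in software. The sketch below is a textbook shift-and-subtract divide, shown only to give a feel for how much work can hide behind a single `/`. It is not the routine the toolchain actually uses, and SoftwareDivide is a made-up name.

```cpp
#include <cstdint>

// Hypothetical illustration: unsigned divide built from shifts, compares and
// subtracts only. Precondition: divisor != 0.
std::uint32_t SoftwareDivide(std::uint32_t dividend, std::uint32_t divisor)
{
    std::uint32_t quotient = 0;
    std::uint32_t remainder = 0;
    for (int bit = 31; bit >= 0; --bit)
    {
        // Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        if (remainder >= divisor)
        {
            remainder -= divisor;
            quotient |= (1u << bit);
        }
    }
    return quotient;
}
```

That is 32 loop iterations for one divide, which is why dividing by a constant (or multiplying by a precomputed reciprocal) tends to be worth the trouble in inner loops.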
Fixed-point arithmetic
• Sometimes integer arithmetic doesn’t cut it
• You need to represent rational numbers
• Can use a fixed-point library.
• Performs rational arithmetic with integer
  values at a reduced range/resolution.
• Not so great...
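
For reference, a minimal sketch of what a fixed-point library does, assuming a 16.16 format (16 integer bits, 16 fractional bits). The type and function names are made up for illustration and are not taken from any particular library.

```cpp
#include <cstdint>

// Hypothetical 16.16 fixed-point type stored in a 32-bit integer.
typedef std::int32_t Fixed;

const int   kFractionBits = 16;
const Fixed kOne = 1 << kFractionBits;  // 1.0 in 16.16

Fixed FixedFromFloat(float f) { return static_cast<Fixed>(f * kOne); }
float FixedToFloat(Fixed x)   { return static_cast<float>(x) / kOne; }

// Addition and subtraction are plain integer operations.
Fixed FixedAdd(Fixed a, Fixed b) { return a + b; }

// The product of two 16.16 values carries 32 fractional bits, so widen to
// 64 bits and shift back down to 16.
Fixed FixedMul(Fixed a, Fixed b)
{
    return static_cast<Fixed>((static_cast<std::int64_t>(a) * b) >> kFractionBits);
}
```

The reduced range and resolution show up directly: with this format the smallest step is 1/65536 and values outside roughly ±32768 overflow.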
Floating point support

•   There’s a floating point unit

•   Compiled C/C++/ObjC
    code uses the VFP unit for
    any floating point
    operations.
Sample program
    struct Particle
    {
        float x, y, z;
        float vx, vy, vz;
    };

    for (int i=0; i<MaxParticles; ++i)
    {
        Particle& p = s_particles[i];
        p.x += p.vx*dt;
        p.y += p.vy*dt;
        p.z += p.vz*dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }

• 7.2 seconds on an iPod Touch 2nd gen
Floating point support

   Trust no one!
   When in doubt, check the generated assembly
Thumb Mode
    •   CPU has a special thumb
        mode.

    •   Less memory, maybe better
        performance.

    •   No floating point support.

    •   Every time there’s an fp
        operation, it switches out of
        Thumb, does the fp operation,
        and switches back on.
Thumb Mode

    • It’s on by default!
    • Potentially HUGE wins turning it off.
Thumb Mode

• Turning off Thumb mode increased
  performance in Flower Garden by over 2x
• Heavy usage of floating point operations
  though
• Most games will probably benefit from
  turning it off (especially 3D games)
Sample program with Thumb mode off: 2.6 seconds!
ARM assembly
            DISCLAIMER:
I’m not an ARM assembly expert!!!

          Z80!!!
ARM assembly

• Hit the docs
• References included in your USB card
• Or download them from the ARM site
• http://bit.ly/arminfo
ARM assembly

• Reading assembly is a very important skill
  for high-performance programming
• Writing is more specialized. Most people
  don’t need to.
VFP unit
A0 + B0 = C0
A1 + B1 = C1
A2 + B2 = C2
A3 + B3 = C3
VFP unit
A0   A1   A2   A3
        +
B0   B1   B2   B3
        =
C0   C1   C2   C3

Sweet! How do we use the vfp?
Like this!

"fldmias %2, {s8-s23}     \n\t"
"fldmias %1!, {s0-s3}     \n\t"
"fmuls s24, s8, s0        \n\t"
"fmacs s24, s12, s1       \n\t"

"fldmias %1!, {s4-s7}     \n\t"

"fmacs s24, s16, s2       \n\t"
"fmacs s24, s20, s3       \n\t"
"fstmias %0!, {s24-s27}   \n\t"
Writing vfp assembly

• There are two parts to it
 • How to write any assembly in gcc
 • Learning ARM and VFP assembly
vfpmath library

• Already done a lot of work for you
• http://code.google.com/p/vfpmathlibrary
• Vector/matrix math
• Might not be exactly what you need, but it’s
  a great starting point
Assembly in gcc
 • Only use it when targeting the device
#include <TargetConditionals.h>
#if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1)
	 #define USE_VFP
#endif
Assembly in gcc
    • The basics

                asm ("cmp r2, r1");

http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
Assembly in gcc
• Multiple lines
            asm (
                "mov r0, #1000\n\t"
                "cmp r2, r1\n\t"
            );
Assembly in gcc
• Accessing C variables
             asm (//assembly code
                 : // output operands
                 : // input operands
                 : // clobbered registers
             );

             int src = 19;
             int dest = 0;

             asm volatile (
                 "add %0, %1, #42"
                 : "=r" (dest)
                 : "r" (src)
                 :
             );

             %0, %1, etc are the variables in order
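
One way to keep a block like this from breaking the simulator build is to wrap it in a function with a plain C++ fallback. This is only a sketch: AddFortyTwo is a made-up name, and `__arm__` is the macro GCC and Clang predefine when targeting 32-bit ARM. The assembly branch assumes Thumb mode is off.

```cpp
// Hypothetical wrapper around the slide's example. On 32-bit ARM it uses the
// inline assembly; everywhere else it falls back to the equivalent C++.
int AddFortyTwo(int src)
{
#if defined(__arm__)
    int dest = 0;
    asm volatile (
        "add %0, %1, #42"
        : "=r" (dest)   // %0: output operand
        : "r" (src)     // %1: input operand
        :               // no clobbered registers
    );
    return dest;
#else
    return src + 42;    // same result without assembly
#endif
}
```

Both branches compute the same value, so AddFortyTwo(19) returns 61 either way, and the fallback doubles as a reference when checking the assembly version.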
Assembly in gcc
    int src = 19;
    int dest = 0;

    asm volatile (
        "add r10, %1, #42\n\t"
        "add %0, r10, #33\n\t"
        : "=r" (dest)
        : "r" (src)
        : "r10"
    );

    volatile prevents “optimizations”
    The clobber register list names the registers used by the asm block
VFP asm
Four banks of 8 32-bit registers each

    #define VFP_VECTOR_LENGTH(VEC_LENGTH) \
        "fmrx    r0, fpscr                         \n\t" \
        "bic     r0, r0, #0x00370000               \n\t" \
        "orr     r0, r0, #0x000" #VEC_LENGTH "0000 \n\t" \
        "fmxr    fpscr, r0                         \n\t"
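VFP asm
C++:
    for (int i=0; i<MaxParticles; ++i)
    {
        Particle& p = s_particles[i];
        p.x += p.vx*dt;
        p.y += p.vy*dt;
        p.z += p.vz*dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }

VFP:
    for (int i=0; i<MaxParticles; ++i)
    {
        Particle* p = &s_particles[i];
        asm volatile (
            "fldmias %0, {s0-s5}     \n\t"   // load x, y, z, vx, vy, vz
            "fldmias %1, {s6-s8}     \n\t"   // load dt, dt, dt
            "fldmias %2, {s9-s11}    \n\t"   // load drag, drag, drag
            "fmacs s0, s3, s6        \n\t"   // position += velocity * dt
            "fmuls s3, s3, s9        \n\t"   // velocity *= drag
            "fstmias %0, {s0-s5}     \n\t"   // store the particle back
            :                                // no outputs: p itself is unchanged
            : "r" (p), "r" (dtArray), "r" (dragArray)   // %0, %1, %2
            : "memory"                       // the block reads and writes *p
        );
    }

Was: 2.6 seconds
Now: 1.4 seconds!!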
VFP asm
Let’s do 6 operations at once!

    	   struct Particle2
    	   {
    	   	 float x0, y0, z0;
    	   	 float x1, y1, z1;
    	   	 float vx0, vy0, vz0;
    	   	 float vx1, vy1, vz1;
    	   };
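
The point of this layout is that the six position floats sit next to each other, followed by the six velocity floats, so one 6-wide operation can cover two particles. A sketch of the packing and of the plain C++ equivalent of one update step follows; PackPair and UpdatePair are made-up helper names, and the struct layouts are copied from the slides.

```cpp
struct Particle
{
    float x, y, z;
    float vx, vy, vz;
};

struct Particle2
{
    float x0, y0, z0;
    float x1, y1, z1;
    float vx0, vy0, vz0;
    float vx1, vy1, vz1;
};

// Hypothetical helper: interleave two particles into the packed layout.
Particle2 PackPair(const Particle& a, const Particle& b)
{
    Particle2 out;
    out.x0 = a.x;    out.y0 = a.y;    out.z0 = a.z;
    out.x1 = b.x;    out.y1 = b.y;    out.z1 = b.z;
    out.vx0 = a.vx;  out.vy0 = a.vy;  out.vz0 = a.vz;
    out.vx1 = b.vx;  out.vy1 = b.vy;  out.vz1 = b.vz;
    return out;
}

// Hypothetical reference: positions += velocities * dt, then velocities *= drag,
// i.e. six multiply-adds followed by six multiplies.
void UpdatePair(Particle2& p, float dt, float drag)
{
    p.x0 += p.vx0*dt;  p.y0 += p.vy0*dt;  p.z0 += p.vz0*dt;
    p.x1 += p.vx1*dt;  p.y1 += p.vy1*dt;  p.z1 += p.vz1*dt;
    p.vx0 *= drag;     p.vy0 *= drag;     p.vz0 *= drag;
    p.vx1 *= drag;     p.vy1 *= drag;     p.vz1 *= drag;
}
```

A reference like UpdatePair is also handy for checking that an assembly version produces the same numbers.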
VFP asm
	   	   	   for (int i=0; i<iterations; ++i)
	   	   	   {
	   	   	   	 Particle2* p = &s_particles2[i];
	   	   	   	 asm volatile (
	   	   	   	 	 "fldmias %0, {s0-s11}    nt"
	   	   	   	 	 "fldmias %1, {s12-s17} nt"
	   	   	   	 	 "fldmias %2, {s18-s23} nt"
	   	   	   	 	 "fmacs s0, s6, s12        nt"
	   	   	   	 	 "fmuls s6, s6, s18        nt"
	   	   	   	 	 "fstmias %0, {s0-s11}     nt"
	   	   	   	 	 : "=r" (p)
	   	   	   	 	 : "r" (p), "r" (dtArray), "r" (dragArray)
	   	   	   	 	 :
	   	   	   	 );
	   	   	   }
VFP asm
	   	   	   for (int i=0; i<iterations; ++i)
	   	   	   {
	   	   	   	 Particle2* p = &s_particles2[i];
	   	   	   	 asm volatile (
	   	   	   	 	 "fldmias %0, {s0-s11}    nt"
	   	   	   	 	 "fldmias %1, {s12-s17} nt"
	   	   	   	 	 "fldmias %2, {s18-s23} nt"
	   	   	   	 	 "fmacs s0, s6, s12        nt"
	   	   	   	 	 "fmuls s6, s6, s18        nt"
	   	   	   	 	 "fstmias %0, {s0-s11}     nt"
	   	   	   	 	 : "=r" (p)
	   	   	   	 	 : "r" (p), "r" (dtArray), "r" (dragArray)
	   	   	   	 	 :
	   	   	   	 );
	   	   	   }            Was: 1.4 seconds
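VFP asm
    for (int i=0; i<iterations; ++i)
    {
        Particle2* p = &s_particles2[i];
        asm volatile (
            "fldmias %0, {s0-s11}     \n\t"   // load both particles
            "fldmias %1, {s12-s17}    \n\t"   // load six copies of dt
            "fldmias %2, {s18-s23}    \n\t"   // load six copies of drag
            "fmacs s0, s6, s12        \n\t"   // positions += velocities * dt
            "fmuls s6, s6, s18        \n\t"   // velocities *= drag
            "fstmias %0, {s0-s11}     \n\t"   // store both particles back
            :                                 // no outputs: p itself is unchanged
            : "r" (p), "r" (dtArray), "r" (dragArray)   // %0, %1, %2
            : "memory"                        // the block reads and writes *p
        );
    }

Was: 1.4 seconds
Now: 1.2 seconds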
VFP asm
    What’s the loop/cache overhead?
    for (int i=0; i<MaxParticles; ++i)
    {
        Particle* p = &s_particles[i];
        p->x = p->vx;
        p->y = p->vy;
        p->z = p->vz;
    }

    Was: 1.2 seconds
    Now: 1.2 seconds!!!!
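
To repeat this kind of comparison yourself, a small timing harness along these lines is enough. It is a sketch, not the code used for the numbers in the slides: the function names, the use of std::vector and the std::chrono timer are all assumptions.

```cpp
#include <chrono>
#include <vector>

struct Particle
{
    float x, y, z;
    float vx, vy, vz;
};

// The real work: integrate positions and apply drag.
void UpdateParticles(std::vector<Particle>& particles, float dt, float drag)
{
    for (Particle& p : particles)
    {
        p.x += p.vx*dt;  p.y += p.vy*dt;  p.z += p.vz*dt;
        p.vx *= drag;    p.vy *= drag;    p.vz *= drag;
    }
}

// Same memory traffic, almost no arithmetic: measures loop and cache cost.
void TouchParticles(std::vector<Particle>& particles)
{
    for (Particle& p : particles)
    {
        p.x = p.vx;  p.y = p.vy;  p.z = p.vz;
    }
}

// Hypothetical helper: wall-clock seconds taken by `repetitions` calls of fn.
template <typename Fn>
double SecondsFor(Fn fn, int repetitions)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i)
        fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Call it as SecondsFor([&]{ UpdateParticles(particles, dt, drag); }, 100) and again with TouchParticles. If the two times come out close, the loop is limited by memory access and faster math will not help much.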
Matrix multiply
Straight from vfpmathlib

Touch: 0.037919 s
Normal: 0.096855 s
VFP: 0.042216 s

     About 2x faster!
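
The slides don't show the "Normal" version. A plain C++ 4x4 multiply along the following lines is presumably the kind of scalar baseline being compared against; the function name and the column-major storage (as in OpenGL ES) are assumptions.

```cpp
// Hypothetical scalar baseline: dst = a * b for 4x4 matrices stored as 16
// floats in column-major order, so element (row, col) lives at [col*4 + row].
// dst must not alias a or b.
void Matrix4MulScalar(const float* a, const float* b, float* dst)
{
    for (int col = 0; col < 4; ++col)
    {
        for (int row = 0; row < 4; ++row)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k*4 + row] * b[col*4 + k];
            dst[col*4 + row] = sum;
        }
    }
}
```

Each output column is a sum of four scaled columns of a, which is the shape of work that maps onto 4-wide multiply and multiply-accumulate instructions.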
Good use of vfp
• Matrix operations
• Particle systems
• Skinning
• Physics
• Procedural content generation
• ....
What about the 3GS?
Times in seconds:
           3G     3GS
 Thumb     7.2    8.0
 Normal    2.6    2.6
 VFP1      1.4    1.30
 VFP2      1.2    0.64
 Touch     1.2    0.18
More 3GS: NEON

• SIMD coprocessor
• Floating point and integer
• Huge potential
• Very little documentation right now :-(
Thank you!


         Noel Llopis
        Snappy Touch

http://twitter.com/snappytouch
   noel@snappytouch.com
 http://gamesfromwithin.com

User Input in a multi-touch, accelerometer, location aware world.
 
Physics Solutions for Innovative Game Design
Physics Solutions for Innovative Game DesignPhysics Solutions for Innovative Game Design
Physics Solutions for Innovative Game Design
 
Getting Oriented with MapKit: Everything you need to get started with the new...
Getting Oriented with MapKit: Everything you need to get started with the new...Getting Oriented with MapKit: Everything you need to get started with the new...
Getting Oriented with MapKit: Everything you need to get started with the new...
 
Getting Started with iPhone Game Development
Getting Started with iPhone Game DevelopmentGetting Started with iPhone Game Development
Getting Started with iPhone Game Development
 
Internationalizing Your Apps
Internationalizing Your AppsInternationalizing Your Apps
Internationalizing Your Apps
 
Optimizing Data Caching for iPhone Application Responsiveness
Optimizing Data Caching for iPhone Application ResponsivenessOptimizing Data Caching for iPhone Application Responsiveness
Optimizing Data Caching for iPhone Application Responsiveness
 
I Phone On Rails
I Phone On RailsI Phone On Rails
I Phone On Rails
 
Integrating Push Notifications in your iPhone application with iLime
Integrating Push Notifications in your iPhone application with iLimeIntegrating Push Notifications in your iPhone application with iLime
Integrating Push Notifications in your iPhone application with iLime
 
Starting Core Animation
Starting Core AnimationStarting Core Animation
Starting Core Animation
 
P2P Multiplayer Gaming
P2P Multiplayer GamingP2P Multiplayer Gaming
P2P Multiplayer Gaming
 
Using Concurrency To Improve Responsiveness
Using Concurrency To Improve ResponsivenessUsing Concurrency To Improve Responsiveness
Using Concurrency To Improve Responsiveness
 
Leaving Interface Builder Behind
Leaving Interface Builder BehindLeaving Interface Builder Behind
Leaving Interface Builder Behind
 
Mobile WebKit Development and jQTouch
Mobile WebKit Development and jQTouchMobile WebKit Development and jQTouch
Mobile WebKit Development and jQTouch
 
Accelerometer and OpenGL
Accelerometer and OpenGLAccelerometer and OpenGL
Accelerometer and OpenGL
 
Deep Geek Diving into the iPhone OS and Framework
Deep Geek Diving into the iPhone OS and FrameworkDeep Geek Diving into the iPhone OS and Framework
Deep Geek Diving into the iPhone OS and Framework
 
NSNotificationCenter vs. AppDelegate
NSNotificationCenter vs. AppDelegateNSNotificationCenter vs. AppDelegate
NSNotificationCenter vs. AppDelegate
 
Using SQLite
Using SQLiteUsing SQLite
Using SQLite
 
From Flash to iPhone
From Flash to iPhoneFrom Flash to iPhone
From Flash to iPhone
 

Dernier

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Dernier (20)

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Cranking Floating Point Performance Up To 11

  • 1. Cranking Floating Point Performance Up To 11  Noel Llopis Snappy Touch http://twitter.com/snappytouch noel@snappytouch.com http://gamesfromwithin.com
  • 13. Floating point numbers • Representation of rational numbers • 1.2345, -0.8374, 2.0000, 14388439.34, etc • Following IEEE 754 format • Single precision: 32 bits • Double precision: 64 bits
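To make the 32-bit single-precision layout concrete, here is a minimal sketch that splits a `float` into its sign, exponent, and mantissa fields. `FloatBits` and `DecodeFloat` are hypothetical names made up for this illustration, and the sketch assumes `float` is a 32-bit IEEE 754 value on the target.

```cpp
#include <cstdint>
#include <cstring>

// Fields of an IEEE 754 single-precision value (1 + 8 + 23 bits).
struct FloatBits
{
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits, implicit leading 1 for normal numbers
};

// Hypothetical helper: split a float into its three bit fields.
inline FloatBits DecodeFloat(float value)
{
    std::uint32_t raw = 0;
    std::memcpy(&raw, &value, sizeof raw);  // well-defined way to read the bits
    FloatBits bits;
    bits.sign = raw >> 31;
    bits.exponent = (raw >> 23) & 0xFFu;
    bits.mantissa = raw & 0x7FFFFFu;
    return bits;
}
```

For example, `DecodeFloat(2.0f)` yields sign 0, biased exponent 128, and mantissa 0.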
  • 19. Why floating point performance? • Most games use floating point numbers for most of their calculations • Positions, velocities, physics, etc, etc. • Maybe not so much for regular apps
  • 23. CPU • 32-bit RISC ARM11 • 400-535 MHz • iPhone 2G/3G and iPod Touch 1st and 2nd gen
  • 26. CPU (iPhone 3GS) • Cortex-A8 600MHz • More advanced architecture
  • 28. CPU • No floating point support in the ARM CPU!!!
  • 32. How about integer math? • No need to do any floating point operations • Fully supported in the ARM processor • But...
  • 35. Integer Divide • There is no integer divide instruction
  • 41. Fixed-point arithmetic • Sometimes integer arithmetic doesn’t cut it • You need to represent rational numbers • Can use a fixed-point library. • Performs rational arithmetic with integer values at a reduced range/resolution. • Not so great...
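As a rough illustration of what a fixed-point library does under the hood, here is a minimal 16.16 sketch. The type `Fixed` and the helpers `FixedFromFloat`, `FixedToFloat`, and `FixedMul` are hypothetical names for this example, not taken from any particular library. With 16 fractional bits the range is roughly ±32768 and the resolution is 1/65536, which is the reduced range/resolution trade-off mentioned above.

```cpp
#include <cstdint>

typedef std::int32_t Fixed;      // 16.16: 16 integer bits, 16 fractional bits
const int kFractionBits = 16;

// Conversions to and from float, for setup and debugging only.
inline Fixed FixedFromFloat(float v) { return static_cast<Fixed>(v * (1 << kFractionBits)); }
inline float FixedToFloat(Fixed v)   { return static_cast<float>(v) / (1 << kFractionBits); }

// Multiply in 64 bits so the intermediate product does not overflow,
// then shift back down to 16.16.
inline Fixed FixedMul(Fixed a, Fixed b)
{
    return static_cast<Fixed>((static_cast<std::int64_t>(a) * b) >> kFractionBits);
}
```

Addition and subtraction are plain integer operations; only multiplication and division need the extra shift, which is part of why fixed-point code gets awkward quickly.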
  • 44. Floating point support • There’s a floating point unit • Compiled C/C++/ObjC code uses the VFP unit for any floating point operations.
  • 48. Sample program
        struct Particle
        {
            float x, y, z;
            float vx, vy, vz;
        };

        for (int i=0; i<MaxParticles; ++i)
        {
            Particle& p = s_particles[i];
            p.x += p.vx*dt;
            p.y += p.vy*dt;
            p.z += p.vz*dt;
            p.vx *= drag;
            p.vy *= drag;
            p.vz *= drag;
        }
    • 7.2 seconds on an iPod Touch 2nd gen
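The slides do not show how the loop was timed. One way to reproduce this kind of measurement is sketched below, assuming a C++11 compiler. `UpdateParticles` and `TimeUpdate` are hypothetical helper names, and the particle count and number of passes behind the 7.2 second figure are not given in the deck.

```cpp
#include <chrono>
#include <vector>

struct Particle
{
    float x, y, z;
    float vx, vy, vz;
};

// Same update as the sample program, wrapped in a function.
inline void UpdateParticles(std::vector<Particle>& particles, float dt, float drag)
{
    for (Particle& p : particles)
    {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }
}

// Hypothetical helper: run 'passes' updates and return the elapsed seconds.
inline double TimeUpdate(std::vector<Particle>& particles, int passes, float dt, float drag)
{
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i)
        UpdateParticles(particles, dt, drag);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Timings taken this way only mean something in an optimized build on the device itself; the simulator runs on a different CPU.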
  • 51. Floating point support • Trust no one! • When in doubt, check the generated assembly
  • 58. Thumb Mode • CPU has a special thumb mode. • Less memory, maybe better performance. • No floating point support. • Every time there’s an fp operation, it switches out of Thumb, does the fp operation, and switches back on.
  • 62. Thumb Mode • It’s on by default! • Potentially HUGE wins turning it off.
  • 66. Thumb Mode • Turning off Thumb mode increased performance in Flower Garden by over 2x • Heavy usage of floating point operations though • Most games will probably benefit from turning it off (especially 3D games)
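Since Thumb is on by default, it can be worth confirming how a given file was actually built. The sketch below relies on `__thumb__`, a macro that GCC and Clang predefine when generating Thumb code; `CompiledForThumb` is a hypothetical helper name. With GCC the mode is selected by `-mthumb` or `-marm`; the name of the matching Xcode build setting is left out here because it may differ between Xcode versions.

```cpp
// Returns true when this translation unit was compiled to Thumb instructions.
// __thumb__ is predefined by GCC and Clang on ARM targets in Thumb mode.
inline bool CompiledForThumb()
{
#if defined(__thumb__)
    return true;
#else
    return false;
#endif
}
```

Logging this once at startup is a cheap sanity check before comparing timings between builds.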
  • 73. ARM assembly DISCLAIMER: I’m not an ARM assembly expert!!! Z80!!!
  • 78. ARM assembly • Hit the docs • References included in your USB card • Or download them from the ARM site • http://bit.ly/arminfo
  • 81. ARM assembly • Reading assembly is a very important skill for high-performance programming • Writing is more specialized. Most people don’t need to.
  • 90. VFP unit • A0 + B0 = C0 • A1 + B1 = C1 • A2 + B2 = C2 • A3 + B3 = C3
  • 97. VFP unit • (A0 A1 A2 A3) + (B0 B1 B2 B3) = (C0 C1 C2 C3) • Sweet! How do we use the vfp?
  • 98. Like this!
        "fldmias %2, {s8-s23} \n\t"
        "fldmias %1!, {s0-s3} \n\t"
        "fmuls s24, s8, s0 \n\t"
        "fmacs s24, s12, s1 \n\t"
        "fldmias %1!, {s4-s7} \n\t"
        "fmacs s24, s16, s2 \n\t"
        "fmacs s24, s20, s3 \n\t"
        "fstmias %0!, {s24-s27} \n\t"
  • 102. Writing vfp assembly • There are two parts to it • How to write any assembly in gcc • Learning ARM and VFP assembly
  • 107. vfpmath library • Already done a lot of work for you • http://code.google.com/p/vfpmathlibrary • Vector/matrix math • Might not be exactly what you need, but it’s a great starting point
  • 109. Assembly in gcc • Only use it when targeting the device
        #include <TargetConditionals.h>
        #if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1)
        #define USE_VFP
        #endif
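The guard is typically paired with a plain C++ fallback so the same call works in the simulator. The sketch below shows that pattern. `ScaleVector3` is a hypothetical function for this example, and the extra `__APPLE__` check is an addition so the snippet also compiles on non-Apple toolchains.

```cpp
#if defined(__APPLE__)
#include <TargetConditionals.h>
#if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1)
#define USE_VFP
#endif
#endif

// Hypothetical function showing the pattern: one entry point, two bodies.
inline void ScaleVector3(float* v, float s)
{
#if defined(USE_VFP)
    // A device-only VFP assembly version would replace the code below.
    // This sketch uses the same C++ body on every target to stay runnable.
#endif
    v[0] *= s;
    v[1] *= s;
    v[2] *= s;
}
```

Keeping the plain version around also gives a reference to compare the assembly path against.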
  • 111. Assembly in gcc • The basics
        asm ("cmp r2, r1");
    http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
  • 112. Assembly in gcc • Multiple lines
        asm (
            "mov r0, #1000 \n\t"
            "cmp r2, r1 \n\t"
        );
  • 115. Assembly in gcc • Accessing C variables
        asm (//assembly code
            : // output operands
            : // input operands
            : // clobbered registers
        );

        int src = 19;
        int dest = 0;
        asm volatile (
            "add %0, %1, #42"
            : "=r" (dest)
            : "r" (src)
            :
        );
    • %0, %1, etc are the variables in order
  • 119. Assembly in gcc
        int src = 19;
        int dest = 0;
        asm volatile (
            "add r10, %1, #42 \n\t"
            "add %0, r10, #33 \n\t"
            : "=r" (dest)
            : "r" (src)
            : "r10"
        );
    • volatile prevents “optimizations”
    • The clobber list names the registers the asm block overwrites
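Putting the pieces together, the sketch below wraps the two-instruction block in a function with a plain C++ path for everything that is not 32-bit ARM in ARM mode. `AddConstants` is a hypothetical name, and the `__arm__` / `__thumb__` guard is an assumption about how to pick the path, not something shown in the slides.

```cpp
// Computes (src + 42) + 33, the same arithmetic as the two-instruction block.
inline int AddConstants(int src)
{
    int dest = 0;
#if defined(__arm__) && !defined(__thumb__)
    asm volatile (
        "add r10, %1, #42 \n\t"   // r10 = src + 42 (scratch register)
        "add %0, r10, #33 \n\t"   // dest = r10 + 33
        : "=r" (dest)             // %0: output
        : "r" (src)               // %1: input
        : "r10"                   // scratch register the block overwrites
    );
#else
    // Plain C++ path for the simulator and other architectures.
    dest = (src + 42) + 33;
#endif
    return dest;
}
```

With `src = 19` as in the slide, both paths produce 94.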
  • 121. VFP asm • Four banks of 8 32-bit registers each
        #define VFP_VECTOR_LENGTH(VEC_LENGTH) \
            "fmrx r0, fpscr \n\t" \
            "bic r0, r0, #0x00370000 \n\t" \
            "orr r0, r0, #0x000" #VEC_LENGTH "0000 \n\t" \
            "fmxr fpscr, r0 \n\t"
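The macro is easier to follow as plain bit arithmetic on the FPSCR value. The sketch below mirrors its `bic`/`orr` pair: the mask `0x00370000` clears the LEN field (bits 16-18) and the STRIDE field (bits 20-21), then the new LEN value is written at bit 16. A LEN field of N selects vectors of N+1 floats. `WithVectorLengthField` is a hypothetical name used only for this illustration.

```cpp
#include <cstdint>

// Mirrors the macro: clear LEN (bits 16-18) and STRIDE (bits 20-21),
// then write the new LEN field. A field value of N selects vectors of N+1 floats.
inline std::uint32_t WithVectorLengthField(std::uint32_t fpscr, std::uint32_t lenField)
{
    const std::uint32_t kLenAndStrideMask = 0x00370000u;
    return (fpscr & ~kLenAndStrideMask) | ((lenField & 0x7u) << 16);
}
```

So `VFP_VECTOR_LENGTH(3)` corresponds to a field value of 3, which means four floats per vector operation.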
  • 127. VFP asm
        for (int i=0; i<MaxParticles; ++i)
        {
            Particle& p = s_particles[i];
            p.x += p.vx*dt;
            p.y += p.vy*dt;
            p.z += p.vz*dt;
            p.vx *= drag;
            p.vy *= drag;
            p.vz *= drag;
        }
    • Was: 2.6 seconds

        for (int i=0; i<MaxParticles; ++i)
        {
            Particle* p = &s_particles[i];
            asm volatile (
                "fldmias %0, {s0-s5} \n\t"
                "fldmias %1, {s6-s8} \n\t"
                "fldmias %2, {s9-s11} \n\t"
                "fmacs s0, s3, s6 \n\t"
                "fmuls s3, s3, s9 \n\t"
                "fstmias %0, {s0-s5} \n\t"
                : "=r" (p)
                : "r" (dtArray), "r" (dragArray), "0" (p)
                :
            );
        }
    • Now: 1.4 seconds!!
  • 128. VFP asm • Let’s do 6 operations at once!
        struct Particle2
        {
            float x0, y0, z0;
            float x1, y1, z1;
            float vx0, vy0, vz0;
            float vx1, vy1, vz1;
        };
  • 131. VFP asm
        for (int i=0; i<iterations; ++i)
        {
            Particle2* p = &s_particles2[i];
            asm volatile (
                "fldmias %0, {s0-s11} \n\t"
                "fldmias %1, {s12-s17} \n\t"
                "fldmias %2, {s18-s23} \n\t"
                "fmacs s0, s6, s12 \n\t"
                "fmuls s6, s6, s18 \n\t"
                "fstmias %0, {s0-s11} \n\t"
                : "=r" (p)
                : "r" (dtArray), "r" (dragArray), "0" (p)
                :
            );
        }
    • Was: 1.4 seconds • Now: 1.2 seconds
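To see what the 6-wide version is meant to compute, here is a portable reference, assuming the intent is one multiply-accumulate and one multiply across six lanes. `ParticlePair` is a hypothetical struct with the same memory order as `Particle2` (six position floats followed by six velocity floats), written with arrays so the lanes can be indexed; `UpdatePair` is likewise a made-up name.

```cpp
// Same memory order as Particle2, expressed as two arrays of six lanes.
struct ParticlePair
{
    float pos[6];  // x0, y0, z0, x1, y1, z1
    float vel[6];  // vx0, vy0, vz0, vx1, vy1, vz1
};

// Portable reference for the 6-wide update.
inline void UpdatePair(ParticlePair& p, float dt, float drag)
{
    for (int lane = 0; lane < 6; ++lane)
    {
        p.pos[lane] += p.vel[lane] * dt;  // the multiply-accumulate step
        p.vel[lane] *= drag;              // the multiply step
    }
}
```

Grouping positions and velocities this way is what lets a single vector instruction cover both particles at once.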
  • 134. VFP asm • What’s the loop/cache overhead?
        for (int i=0; i<MaxParticles; ++i)
        {
            Particle* p = &s_particles[i];
            p->x = p->vx;
            p->y = p->vy;
            p->z = p->vz;
        }
    • Was: 1.2 seconds • Now: 1.2 seconds!!!!
  • 141. Matrix multiply • Straight from vfpmathlib • Touch: 0.037919 s • Normal: 0.096855 s • VFP: 0.042216 s • About 2x faster!
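For reference, the kind of routine a "Normal" timing would measure is a plain 4x4 multiply like the sketch below. It assumes column-major storage (the OpenGL convention) and is not the code from the talk or from vfpmathlib; `Matrix4Mul` is a hypothetical name.

```cpp
// dst = a * b for 4x4 column-major matrices stored as 16 floats.
// Element (row, col) lives at index col * 4 + row. dst must not alias a or b.
inline void Matrix4Mul(const float* a, const float* b, float* dst)
{
    for (int col = 0; col < 4; ++col)
    {
        for (int row = 0; row < 4; ++row)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            dst[col * 4 + row] = sum;
        }
    }
}
```

That is 64 multiplies and 48 adds per call, which is why matrix code is such a natural fit for a unit that works on several floats per instruction.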
  • 148. Good use of vfp • Matrix operations • Particle systems • Skinning • Physics • Procedural content generation • ....
  • 156. What about the 3GS? (times in seconds)
                    3G      3GS
        Thumb       7.2     8.0
        Normal      2.6     2.6
        VFP1        1.4     1.30
        VFP2        1.2     0.64
        Touch       1.2     0.18
  • 161. More 3GS: NEON • SIMD coprocessor • Floating point and integer • Huge potential • Very little documentation right now :-(
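As a taste of what NEON code can look like, here is a small sketch using the compiler intrinsics from `<arm_neon.h>` rather than hand-written assembly. The talk does not show any NEON code, so this is only an illustration: `IntegrateFour` is a hypothetical name, and the scalar branch keeps the snippet runnable on targets without NEON.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// pos[i] += vel[i] * dt for four floats at a time.
inline void IntegrateFour(float* pos, const float* vel, float dt)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t p = vld1q_f32(pos);   // load four positions
    float32x4_t v = vld1q_f32(vel);   // load four velocities
    p = vmlaq_n_f32(p, v, dt);        // p + v * dt in all four lanes
    vst1q_f32(pos, p);                // store the result
#else
    // Scalar path for targets without NEON.
    for (int i = 0; i < 4; ++i)
        pos[i] += vel[i] * dt;
#endif
}
```

Intrinsics leave register allocation to the compiler, which avoids the clobber-list bookkeeping needed for inline assembly.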
  • 162. Thank you! Noel Llopis Snappy Touch http://twitter.com/snappytouch noel@snappytouch.com http://gamesfromwithin.com