4. BENEFITS OF 64-BIT
• Larger virtual address space (AARCH64 -> 49 bits)
(Eases the overlay problems of larger programs)
• Wider memory access bus
(Reduce memory latency)
• Wider registers
(Better for 64-bit and longer arithmetic)
• More registers (Reduce register spilling)
• New instruction set & new wider I/O peripherals
(New feature)
• New marketing momentum
• Opportunities to reduce power consumption
6. DRAWBACKS OF 64-BIT
• Larger size of program files
(On disk format)
• Larger size of pointers
(In memory format)
(32-bit -> 64-bit, 4bytes -> 8bytes in size)
• Mode switching overhead
(T32, A32, and A64 execution environment)
• Ecosystem migration & backward compatibility
(Upgrade to 64-bit, compiler, software vendors,
validations, time to market, SoC vendors, …etc.)
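The pointer-size drawback listed above (4 bytes -> 8 bytes) can be made concrete with a small sketch. The `node` struct is hypothetical and used only for illustration, and the two layouts are written out by hand assuming little-endian data with natural alignment rather than being queried from a real compiler.

```python
import struct

# Hypothetical C struct used only for illustration:
#   struct node { int32_t value; struct node *next; };
# Layouts are spelled out by hand (little-endian, natural alignment assumed).
ILP32_NODE = "<iI"     # 4-byte value + 4-byte pointer
LP64_NODE = "<i4xQ"    # 4-byte value + 4 bytes of padding + 8-byte pointer


def node_bytes(layout: str, count: int) -> int:
    """Total bytes needed for `count` list nodes with the given layout."""
    return struct.calcsize(layout) * count


if __name__ == "__main__":
    for name, layout in (("ILP32", ILP32_NODE), ("LP64", LP64_NODE)):
        print(name, struct.calcsize(layout), "bytes per node,",
              node_bytes(layout, 1_000_000), "bytes per million nodes")
```

In this sketch the node doubles in size even though only the pointer field grew, because the wider pointer also forces alignment padding.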
10. 64-BIT ELF FILE FORMAT
• ELF for the ARM 64-bit Architecture (AArch64)
(http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056b/IHI0056B_aaelf64.pdf)
• 64-bit header format (EM_AARCH64,
SHT_AARCH64_ATTRIBUTES, …etc.)
• Larger GOT/PLT entries & addresses
• New relocation types for 64-bit
• 64-bit DWARF format of debugging info.
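As a rough illustration of the 64-bit header format mentioned above, the sketch below builds a synthetic ELF64 header in memory and decodes the fields that identify an AArch64 image. It does not read a real file. The helper names `make_header` and `describe` are made up for this example, and the entry address is arbitrary.

```python
import struct

# Constants from the ELF and AArch64 ELF specifications.
ELFCLASS32, ELFCLASS64 = 1, 2
ELFDATA2LSB = 1
ET_EXEC = 2
EM_AARCH64 = 183

# Elf64_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx (little-endian).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def make_header(entry: int) -> bytes:
    """Build a synthetic little-endian AArch64 ELF64 header (no real file)."""
    ident = b"\x7fELF" + bytes([ELFCLASS64, ELFDATA2LSB, 1]) + bytes(9)
    return ELF64_EHDR.pack(ident, ET_EXEC, EM_AARCH64, 1, entry,
                           ELF64_EHDR.size, 0, 0, ELF64_EHDR.size,
                           56, 0, 64, 0, 0)


def describe(header: bytes) -> dict:
    """Decode the fields that distinguish a 64-bit AArch64 image."""
    fields = ELF64_EHDR.unpack(header[:ELF64_EHDR.size])
    ident, _e_type, e_machine, _version, e_entry = fields[:5]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    return {"is_64bit": ident[4] == ELFCLASS64,
            "is_aarch64": e_machine == EM_AARCH64,
            "entry": e_entry}


if __name__ == "__main__":
    print(describe(make_header(0x400000)))
```

The ELF64 header is 64 bytes, against 52 bytes for ELF32, because the entry point and the two table offsets widen from 4 to 8 bytes each.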
17. AARCH64 REGISTER FILES
• 64-BIT ARM INTRODUCTION TO PORTING
(http://people.linaro.org/~rikuvoipio/aarch64-talk/)
• SCALAR/SIMD REGISTERS
32 bit float registers: S0 ... S31
64 bit double registers: D0 ... D31
128 bit SIMD registers: V0 ... V31
SIMD and Scalar share register bank
S0 is bottom 32 bits of D0 which is the bottom 64 bits of
V0.
• There are 32 S registers and 32 D registers. The S registers
are not packed into D registers, but occupy the low 32
bits of the corresponding D register.
For example S31=D31<31:0>, not D15<63:32>
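The register overlap described above can be sketched with a toy model. Registers are plain Python integers, and the names `A64VectorBank` and `a32_s_view` are invented for this example. The second helper models the older packed AArch32 VFP layout for contrast.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


class A64VectorBank:
    """Toy model: 32 x 128-bit V registers with D and S views."""

    def __init__(self):
        self.v = [0] * 32

    def d(self, n: int) -> int:
        """Dn is the bottom 64 bits of Vn."""
        return self.v[n] & MASK64

    def s(self, n: int) -> int:
        """Sn is the bottom 32 bits of Vn (and therefore of Dn)."""
        return self.v[n] & MASK32


def a32_s_view(d_regs, n: int) -> int:
    """AArch32 VFP packing: S(2k) and S(2k+1) are the two halves of D(k)."""
    return (d_regs[n // 2] >> (32 * (n % 2))) & MASK32


if __name__ == "__main__":
    bank = A64VectorBank()
    bank.v[31] = 0x0123456789ABCDEF_1111111122222222
    print(hex(bank.d(31)), hex(bank.s(31)))
```

In the AArch64 model S31 comes from register 31, whereas in the packed AArch32 model S31 is the upper half of D15.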
19. AARCH64 REGISTER FILES
(HIGHLIGHT)
• Register number 31 is the zero register (XZR/WZR) or
the stack pointer (SP), depending on the instruction
• PC (Program Counter) is never accessible
• Zero extended to 64-bits in A32 mode
• General purpose registers extended from 15 to 31
• FP registers still number 32, but are now non-packed
21. AARCH64 CALLING CONVENTION
• Procedure Call Standard for the ARM 64-bit
Architecture (AArch64)
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
22. AARCH64 CALLING CONVENTION
• Procedure Call Standard for the ARM® Architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
25. AARCH64 RUNTIME ABI (AEABI)
• Run-time ABI for the ARM® Architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf
• C++ Application Binary Interface Standard for the
ARM 64-bit Architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0059b/IHI0059B_cppabi64.pdf
• Floating-point library (Removed, built-in VFP)
• Long long helper functions (Removed, native 64-bit)
• Other C and assembly lang. helper functions (Kept)
• C++ helper function (Kept, Reinforced C++ ABI)
30. AARCH64 INSTRUCTIONS
(CONDITIONAL INSTRUCTIONS)
• Modern branch predictors work well enough
• In order to justify OPCODE space and impl. COST
• Only a very small set of “conditional data
processing” instr. are provided
1. Conditional branch
2. Add/subtract
3. Conditional select with increment, negate or
invert (Select (move) or Set)
4. Conditional compare
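The conditional select family listed above can be modelled in a few lines. This is a behavioural sketch on 64-bit integers, not an emulator: the condition is passed in as a Python boolean instead of being read from the NZCV flags, and `branch_free_abs` is a made-up helper showing one typical use.

```python
MASK64 = (1 << 64) - 1


def csel(cond: bool, xn: int, xm: int) -> int:
    """CSEL: select Xn if the condition holds, else Xm."""
    return xn if cond else xm


def csinc(cond: bool, xn: int, xm: int) -> int:
    """CSINC: select Xn if the condition holds, else Xm + 1."""
    return xn if cond else (xm + 1) & MASK64


def csinv(cond: bool, xn: int, xm: int) -> int:
    """CSINV: select Xn if the condition holds, else bitwise NOT of Xm."""
    return xn if cond else (~xm) & MASK64


def csneg(cond: bool, xn: int, xm: int) -> int:
    """CSNEG: select Xn if the condition holds, else the negation of Xm."""
    return xn if cond else (-xm) & MASK64


def cset(cond: bool) -> int:
    """CSET Xd, cond is an alias of CSINC Xd, XZR, XZR, invert(cond)."""
    return csinc(not cond, 0, 0)


def branch_free_abs(x: int) -> int:
    """abs() of a 64-bit two's-complement value without a branch."""
    is_non_negative = x < (1 << 63)        # stands in for the PL condition
    return csneg(is_non_negative, x, x)


if __name__ == "__main__":
    print(cset(True), cset(False), branch_free_abs((-5) & MASK64))
```

One select instruction with an optional increment, invert or negate covers the "select (move) or set" cases, which is how a small set of conditional instructions replaces general predication.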
31. AARCH64 INSTRUCTIONS
(ADDRESSING FEATURES)
• Register indexed addressing
Extended T32 addressing modes, allowing 64-bit
index and base registers to obtain addresses
• PC-relative addressing
PC-relative literal loads (±1MB)
Most conditional branches (±1MB)
Unconditional branches (±128MB)
PC-relative load/store with only 2 instructions (±4GB)
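The ranges quoted above follow from the width of each signed immediate field and its scale factor. The sketch below derives them and models the two-instruction ADRP + ADD sequence. The helper names and the example addresses are illustrative only.

```python
MB, GB = 1 << 20, 1 << 30


def signed_range(imm_bits: int, scale: int) -> int:
    """Reach in bytes (one direction) of a signed immediate scaled by `scale`."""
    return (1 << (imm_bits - 1)) * scale


def adrp_add(pc: int, target: int):
    """Split `target` into the ADRP page delta and the 12-bit ADD offset."""
    page_delta = (target >> 12) - (pc >> 12)        # counted in 4 KB pages
    if not -(1 << 20) <= page_delta < (1 << 20):    # ADRP has a 21-bit immediate
        raise ValueError("target outside the ADRP range")
    return page_delta, target & 0xFFF


def resolve(pc: int, page_delta: int, lo12: int) -> int:
    """Recompute the address the way ADRP followed by ADD would."""
    return (((pc >> 12) + page_delta) << 12) + lo12


if __name__ == "__main__":
    print(signed_range(19, 4) // MB, "MB literal load / conditional branch")
    print(signed_range(26, 4) // MB, "MB unconditional branch")
    print(signed_range(21, 4096) // GB, "GB ADRP + 12-bit offset")
```

Literal loads and conditional branches use a 19-bit immediate scaled by 4, unconditional branches a 26-bit immediate scaled by 4, and ADRP a 21-bit immediate scaled by 4096.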
32. AARCH64 INSTRUCTIONS
(THE PROGRAM COUNTER)
• In AARCH32, R15 = PC, so writing to R15 changes
the program counter
• In AARCH64, R15 != PC; the PC is not a general purpose
register and cannot be changed by writing to R15
• In AARCH64, PC can only be read by computing a
PC-relative address (ADR, ADRP, literal load, and
direct branch), and branch-and-link instructions (BL
and BLR)
• In AARCH64, PC can only be written by
conditional/unconditional branches, exception
generation and exception return
33. AARCH64 INSTRUCTIONS
(MEMORY LOAD-STORE)
• Bulk transfers
1. LDM, STM, PUSH, and POP removed
2. LDP and STP added (Paired dest. registers)
3. LDNP and STNP added (streaming and non-temporal)
4. PRFM (prefetch memory) added
• Exclusive accesses (atomic operations)
• Load-acquire, Store-release
(Release-consistency, RCsc), reducing the need for
explicit memory barriers
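The exclusive accesses listed above are usually used in a load-exclusive / store-exclusive retry loop. The sketch below is a toy, single-location model of that pattern and not a description of real hardware. `ExclusiveMemory`, `atomic_add` and the `interfere` hook are invented for illustration, and memory ordering (the acquire/release part) is not modelled at all.

```python
class ExclusiveMemory:
    """Toy single-location model of an exclusive monitor (not real hardware)."""

    def __init__(self, value: int = 0):
        self.value = value
        self._armed = False

    def ldxr(self) -> int:
        """Load-exclusive: read the value and arm the monitor."""
        self._armed = True
        return self.value

    def store(self, value: int) -> None:
        """Ordinary store from another observer: clears the monitor."""
        self.value = value
        self._armed = False

    def stxr(self, value: int) -> int:
        """Store-exclusive: returns 0 on success and 1 on failure."""
        if not self._armed:
            return 1
        self.value = value
        self._armed = False
        return 0


def atomic_add(mem: ExclusiveMemory, delta: int, interfere=None):
    """Retry loop in the LDXR / ADD / STXR / CBNZ style."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.ldxr()
        if interfere is not None and attempts == 1:
            interfere(mem)      # simulate another core writing in between
        if mem.stxr(old + delta) == 0:
            return old, attempts


if __name__ == "__main__":
    m = ExclusiveMemory(10)
    print(atomic_add(m, 5, interfere=lambda mem: mem.store(100)), m.value)
```

When the simulated interference clears the monitor, the first store-exclusive fails and the loop repeats with the fresh value, so the update is never lost.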
36. SUMMARY OF AARCH64
• New instruction set (decoding) & 32-bit fixed length
• More registers (31 GP, 32 FP)
• 64-bit pointer and integral registers
• Interoperability of AARCH32 (T32/A32) & AARCH64
• VFP and Advanced SIMD are mandatory (built-in)
• LDM/STM removed, LDP/STP added
• Conditional instructions are reduced, few left
• PC-relative addressing
• Memory ordering (new LDAR/STLR, Load-Acquire/Store-Release)
38. STATE OF 64-BIT ANDROID
• ARM64 CPUs, SoCs & Reference Designs
1. Samsung Exynos 5433 (Samsung Galaxy Note 4)
2. Qualcomm Snapdragon 8916 (Next upcoming)
3. NVIDIA Tegra K1 (N9)
• X86_64 CPUs, SoCs & Reference Designs
1. Intel Baytrail-T ATOM SoC
2. Intel Moorefield ATOM SoC
• ARM64 Compiler
GCC (Ready, by Linaro and communities)
LLVM (Ready, by Apple for iOS development)
• X86_64 Compiler
GCC and LLVM (Ready)
39. STATE OF 64-BIT ANDROID
• ELF64 Format for ARM64 (On Disk)
ARM64 (Ready)
X86_64 (Ready)
• ELF64 Program Loader/Linker (In Memory)
ARM64 GNU Linker (Ready, by Linaro)
x86_64 GNU Linker (Ready)
• 64-bit Calling Convention
ARM64 (Ready)
X86_64 (Ready)
• 64-bit Linux Kernel & ABI
ARM64 (Ready, by Linaro)
X86_64 (Ready)