Linaro Connect, Hong Kong, March 2013
LCE13, Dublin, 8-12 July 2013
Porting - ARMv8 Neon
connect.linaro.org

Contents
- ARMv8 Neon
- Vector operations on ARMv8
- Scalar operations on ARMv8
- Neon load/store instructions
- Load/store illustration
- Widening/narrowing arithmetic
- Vector reduction
- ARMv7 vs ARMv8 Neon instructions
- ARMv8 porting (Jpeg Turbo IDCT code)

ARMv8 Neon
- Neon is an optional extension to ARMv8.
- Neon performs packed SIMD operations on either integer or floating-point (single/double precision) data.
- The Neon extension registers are distinct from the ARM core register set and comprise:
  - 32 x 128-bit wide vector registers
  - 32 x 64-bit wide vector registers, which are held in the lower 64 bits of each 128-bit register

Vector Operations on ARMv8
Vector operations (data type: byte)
  char array1[8] = {1,2,3,4,5,6,7,8};
  char array2[8] = {1,2,3,4,5,6,7,8};
  /* Load array1 into V1.8B (X0 is assumed to hold the address of array1) */
  LD1 {V1.8B}, [X0]
  /* Load array2 into V2.8B (X1 is assumed to hold the address of array2) */
  LD1 {V2.8B}, [X1]
  /* Multiply lane by lane */
  MUL V1.8B, V1.8B, V2.8B

  Lane             7  6  5  4  3  2  1  0
  V1.8B (before)   8  7  6  5  4  3  2  1
  V2.8B            8  7  6  5  4  3  2  1
  V1.8B (after)   64 49 36 25 16  9  4  1
  Viewed as V1.16B, lanes 15-8 are 0 and lanes 7-0 hold the products above.
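
The lane-wise behaviour on this slide can be modelled in portable C. This is only a sketch: the type vreg_t and the helpers ld1_8b and mul_8b are hypothetical names made up for illustration, not Neon intrinsics, and the code runs one lane at a time rather than in parallel. It shows two details the slide implies: each product is truncated to 8 bits, and a 64-bit (8B) write leaves the upper 64 bits of the register at zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit Neon register as 16 byte lanes. */
typedef struct { uint8_t b[16]; } vreg_t;

/* Models LD1 {Vt.8B}, [addr]: fills lanes 0-7 and zeroes lanes 8-15. */
static vreg_t ld1_8b(const uint8_t *addr) {
    vreg_t v;
    memset(v.b, 0, sizeof v.b);
    memcpy(v.b, addr, 8);
    return v;
}

/* Models MUL Vd.8B, Vn.8B, Vm.8B: each product is truncated to 8 bits
   and the upper 64 bits of the destination are zero. */
static vreg_t mul_8b(vreg_t n, vreg_t m) {
    vreg_t d;
    memset(d.b, 0, sizeof d.b);
    for (int i = 0; i < 8; i++)
        d.b[i] = (uint8_t)(n.b[i] * m.b[i]);
    return d;
}
```

Calling mul_8b(ld1_8b(array1), ld1_8b(array2)) with the two arrays from the slide gives the squares 1 to 64 in lanes 0 to 7.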

Scalar Operations on Neon
- Some SIMD instructions operate on scalar data; the registers holding this data behave like the main general-purpose registers.
- Scalar registers are written Bn, Hn, Sn, Dn or Qn (8, 16, 32, 64 or 128 bits), where the suffix n is the register number from 0 to 31.
- Unused higher bits are ignored on a read and set to zero on a write.
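
The read/write rule for scalar views can be sketched in C as follows. The names vreg_t, write_s and read_h are hypothetical, chosen for this illustration only; the model stores the register as little-endian bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit Neon register as 16 bytes. */
typedef struct { uint8_t b[16]; } vreg_t;

/* Models a write to Sn: the low 32 bits take the value,
   bits 32-127 are set to zero. */
static void write_s(vreg_t *v, uint32_t value) {
    memset(v->b, 0, sizeof v->b);
    for (int i = 0; i < 4; i++)
        v->b[i] = (uint8_t)(value >> (8 * i));   /* little-endian byte order */
}

/* Models a read of Hn: only the low 16 bits are accessed. */
static uint16_t read_h(const vreg_t *v) {
    return (uint16_t)(v->b[0] | (v->b[1] << 8));
}
```

After write_s, whatever the register held above bit 31 is gone, which is why scalar results never carry stale data from an earlier vector operation.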

Load/Store Instructions in Neon
- LD1, LD2, LD3, LD4 and ST1, ST2, ST3, ST4 are some of the vector load and store instructions.
- Each instruction takes a list of 1 to 4 consecutive registers to/from which consecutive 8-, 16-, 32- or 64-bit elements are transferred.
- LD1 transfers data from memory to consecutive registers without any interleaving.

Load/Store Instructions in Neon: LD2/ST2
- LD2 loads elements from memory into the specified registers, de-interleaving them 2 ways; ST2 stores them back to memory interleaved.
- Format of the instruction:
  ld2 {vn.<t>, vm.<t>}, [x1]    <t> is 8b, 16b, 4h, 8h, 2s, 4s or 2d
- Rules: the registers must be consecutive and there must be exactly 2 of them.

Example: ld2 {v0.2s, v1.2s}, [x0]
  Memory at [x0] (32-bit elements): 0x10 0x11 0x10 0x11
  v0.2s: 0x10 0x10
  v1.2s: 0x11 0x11
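
A scalar C model of the LD2 de-interleave may make the mapping easier to see. The function name ld2_s and its parameters are hypothetical; the sketch assumes 32-bit elements, as in the ld2 {v0.2s, v1.2s} example.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of LD2 {Va.<n>S, Vb.<n>S}, [mem]: even-indexed elements
   go to the first register, odd-indexed elements to the second.
   lanes is 2 for the .2S arrangement and 4 for .4S. */
static void ld2_s(const uint32_t *mem, size_t lanes,
                  uint32_t *va, uint32_t *vb) {
    for (size_t i = 0; i < lanes; i++) {
        va[i] = mem[2 * i];
        vb[i] = mem[2 * i + 1];
    }
}
```

With the memory contents 0x10 0x11 0x10 0x11 and lanes set to 2, the first register receives both 0x10 values and the second both 0x11 values.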

Load/Store Instructions in Neon: LD3/ST3
- LD3 loads elements from memory into the specified registers, de-interleaving them 3 ways; ST3 stores them back to memory interleaved.
- Format of the instruction:
  ld3 {vi.<t>, vj.<t>, vk.<t>}, [x1]    <t> is 8b, 16b, 4h, 8h, 2s, 4s or 2d
- Rules: the registers must be consecutive and there must be exactly 3 of them.
- The arrangement <t> may select either the 64-bit or the 128-bit form of the vector registers.

Load/Store Instructions in Neon: LD4/ST4
- LD4 loads elements from memory into the specified registers, de-interleaving them 4 ways; ST4 stores them back to memory interleaved.
- Format of the instruction:
  ld4 {vi.<t>, vj.<t>, vk.<t>, vl.<t>}, [x1]    <t> is 8b, 16b, 4h, 8h, 2s, 4s or 2d
Instruction rules:
1. The registers must be consecutive and there must be exactly 4 of them.
2. The arrangement <t> may select either the 64-bit or the 128-bit form of the vector registers.

Example: ld4 {v0.2s, v1.2s, v2.2s, v3.2s}, [x0]
  Memory at [x0] (32-bit elements): 0x10 0x11 0x12 0x13 0x10 0x11 0x12 0x13
  v0.2s: 0x10 0x10
  v1.2s: 0x11 0x11
  v2.2s: 0x12 0x12
  v3.2s: 0x13 0x13
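
LD4 and ST4 are inverse mappings, which the following C sketch illustrates for 32-bit elements. The names ld4_s, st4_s and NREGS are hypothetical, and each modelled register is given room for four lanes so that both the .2S and the .4S arrangement fit.

```c
#include <stdint.h>
#include <stddef.h>

#define NREGS 4   /* LD4/ST4 always name four registers */

/* Model of LD4 with 32-bit elements: memory element i goes to
   register i % 4, lane i / 4. lanes is 2 for .2S and 4 for .4S. */
static void ld4_s(const uint32_t *mem, size_t lanes,
                  uint32_t regs[NREGS][4]) {
    for (size_t lane = 0; lane < lanes; lane++)
        for (size_t r = 0; r < NREGS; r++)
            regs[r][lane] = mem[lane * NREGS + r];
}

/* Model of ST4: the inverse mapping, writing the lanes back interleaved. */
static void st4_s(uint32_t *mem, size_t lanes,
                  uint32_t regs[NREGS][4]) {
    for (size_t lane = 0; lane < lanes; lane++)
        for (size_t r = 0; r < NREGS; r++)
            mem[lane * NREGS + r] = regs[r][lane];
}
```

Loading with ld4_s and storing with st4_s reproduces the original memory layout, so the pair can bracket per-channel processing of 4-component data.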

Load/Store Instructions in Neon: RGB channel example
Load 12 16-bit values from the address held in X0 and split them over the 64-bit registers V0.4H, V1.4H and V2.4H, one channel per register:
  LD3 {V0.4H, V1.4H, V2.4H}, [X0]
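
The channel split can be modelled in C as below. This sketch assumes the 4H arrangement (four 16-bit lanes per register, matching the 12 16-bit values in the example) and assumes the samples are stored in R, G, B order; the function name ld3_4h is hypothetical.

```c
#include <stdint.h>

/* Scalar model of LD3 with three .4H registers: 12 interleaved 16-bit
   samples are split into three registers of four lanes each.
   The R, G, B memory order is an assumption of this sketch. */
static void ld3_4h(const uint16_t mem[12],
                   uint16_t r[4], uint16_t g[4], uint16_t b[4]) {
    for (int i = 0; i < 4; i++) {
        r[i] = mem[3 * i];
        g[i] = mem[3 * i + 1];
        b[i] = mem[3 * i + 2];
    }
}
```

After the load, each register holds one channel for four pixels, so a per-channel operation needs no shuffling.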

Narrow/Wide Neon Instructions
Support for long, wide and narrow instructions:
1. Long instruction: both sources are widened and the result has double-width elements
   SADDL Vd.<Td>, Vn.<Ts>, Vm.<Ts>
   <Td>/<Ts> is 8H/8B, 4S/4H or 2D/2S
2. Wide instruction: only the second source is widened
   SADDW Vd.<Td>, Vn.<Td>, Vm.<Ts>
   <Td>/<Ts> is 8H/8B, 4S/4H or 2D/2S
3. Narrow instruction: add and keep the higher half of each result
   ADDHN Vd.<Td>, Vn.<Ts>, Vm.<Ts>
   <Td>/<Ts> is 8B/8H, 4H/4S or 2S/2D
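
The three forms differ only in which operands change width. The C sketch below models the 8B/8H variants one lane at a time; the function names are hypothetical and the code is an illustration of the semantics, not Neon intrinsics.

```c
#include <stdint.h>

/* SADDL Vd.8H, Vn.8B, Vm.8B: both sources are sign-extended to 16 bits
   before the add, so the sum cannot overflow. */
static void saddl_8h(int16_t d[8], const int8_t n[8], const int8_t m[8]) {
    for (int i = 0; i < 8; i++)
        d[i] = (int16_t)(n[i] + m[i]);
}

/* SADDW Vd.8H, Vn.8H, Vm.8B: only the second source is widened.
   The cast models 16-bit wrap-around on two's complement targets. */
static void saddw_8h(int16_t d[8], const int16_t n[8], const int8_t m[8]) {
    for (int i = 0; i < 8; i++)
        d[i] = (int16_t)(n[i] + m[i]);
}

/* ADDHN Vd.8B, Vn.8H, Vm.8H: add modulo 2^16, then keep the upper
   8 bits of each 16-bit sum. */
static void addhn_8b(uint8_t d[8], const uint16_t n[8], const uint16_t m[8]) {
    for (int i = 0; i < 8; i++) {
        uint16_t sum = (uint16_t)(n[i] + m[i]);
        d[i] = (uint8_t)(sum >> 8);
    }
}
```

For example, adding two int8_t lanes of 100 with saddl_8h gives 200, a value that would not fit in a signed 8-bit lane.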

Vector Reduction
1. addv <V>d, Vn.<T>    (e.g. addv b15, v5.8b)
   <V>d is the destination scalar register (b, h or s)
   Vn.<T> is the source vector register with arrangement <T> (8b/16b, 4h/8h or 4s)
2. saddlv <V>d, Vn.<T>    (e.g. saddlv h15, v5.8b)
   <V>d is the destination scalar register (h, s or d), twice the width of the source elements
   Vn.<T> is the source vector register with arrangement <T> (8b/16b, 4h/8h or 4s)
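
The difference between the two reductions is the width of the accumulated result, which this C sketch models for the 8B arrangement. The function names addv_8b and saddlv_8b are hypothetical.

```c
#include <stdint.h>

/* ADDV Bd, Vn.8B: sum of the eight byte lanes, truncated to 8 bits. */
static uint8_t addv_8b(const uint8_t v[8]) {
    uint8_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum = (uint8_t)(sum + v[i]);
    return sum;
}

/* SADDLV Hd, Vn.8B: the signed lanes are widened and the sum is kept
   in 16 bits, so eight 8-bit values can never overflow it. */
static int16_t saddlv_8b(const int8_t v[8]) {
    int16_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum = (int16_t)(sum + v[i]);
    return sum;
}
```

Eight lanes of 100 sum to 800: addv_8b wraps this to 32, while saddlv_8b returns the full 800.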

ARMv7 vs ARMv8 Neon Instructions
Difference in the mnemonics:
  ARMv7                                ARMv8
  1. vld1.16 {d1,d2,d3,d4}, [r0]       1. ld1 {v1.4h,v2.4h,v3.4h,v4.4h}, [x0]
  2. vsub.s32 q1,q1,q2                 2. sub v1.4s,v1.4s,v2.4s

1. The ARMv7 mnemonic prefix ‘v’ has been removed. Where the data type affects the result, a prefix s/u/f/p indicates signed/unsigned/float/polynomial.
2. The vector organization (element size and number of lanes) is described by the register qualifiers and not by the mnemonic as in ARMv7.
3. Suffixes have been added to the mnemonic: V indicates a reduction across lanes, and 2 indicates a widening/narrowing instruction operating on the second (upper) half of the register.

Armv8 Porting (Jpeg Turbo IDCT code)
1. ARMv7 Neon code for the IDCT (integer and fast algorithms) is available in Jpeg Turbo.
2. Some ARMv7 instructions, such as vswp, vpush and vpop, do not exist in ARMv8; ARMv8 offers load/store pair instructions as a partial replacement.
3. The ARMv7 IDCT code takes advantage of the complete overlap between the 64-bit and 128-bit registers. In ARMv8 only the lower half of each 128-bit register has a 64-bit name, which increases the porting effort.

Other methods of porting if you don't have ARMv7 code:
1. Intrinsics can be a good option but are somewhat risky, as they do not guarantee that the compiler generates the Neon code you intended.
2. Separate out the function of interest, generate ARMv8 Neon code for it with the compiler's vectorization option, and start optimizing that.
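
The register-overlap difference behind point 3 can be written down as two small index mappings. This is a sketch for illustration: the type v7_alias_t and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper type: which ARMv7 Q register a D register
   belongs to, and whether it is the upper half of that Q register. */
typedef struct { int q; int high_half; } v7_alias_t;

/* ARMv7: Qn overlays D(2n) (low half) and D(2n+1) (high half),
   so every one of D0-D31 is half of some Q register. */
static v7_alias_t armv7_d_alias(int d) {
    v7_alias_t a = { d / 2, d % 2 };
    return a;
}

/* ARMv8: Dn is the low 64 bits of Vn. The upper 64 bits of Vn
   have no D-register name of their own. */
static int armv8_d_alias(int d) {
    return d;
}
```

ARMv7 code that fills q0 by writing d0 and d1 separately therefore has no direct ARMv8 equivalent; one option is an element insert into the upper half of v0, at the cost of an extra instruction.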
Questions?
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
How to join: http://www.linaro.org/about/how-to-join
Linaro members: www.linaro.org/members

LCE13: Porting NEON from armv7 to armv8, an experience report

  • 1. Linaro Connect, Hong Kong March 2013 LCE13, Dublin. 8-12 July 2013 Porting - ARMV8 Neon
  • 2. connect.linaro.org  ARMV8 Neon  Vector operations on ARMV8  Scalar operation on ARMV8  Neon load/store instructions  Load/Store illustration  Widening/Narrow Arithemetic  Vector Reduction  ARMV7 vs ARMV8 Neon Instructions  Armv8 Porting (Jpeg Turbo IDCT code) Contents
  • 3. connect.linaro.org ARMV8 Neon  Neon is an optional extension to ARMV8  Neon performs packed SIMD operations on either integer or floating point (Single/double precision) data.  Neon extension registers are distinct from ARM core register set and contains l 32 X 128 bit wide vector registers l 32 X 64 bit wide vector registers which are held in lower 64 bit of each 128 bit registers
  • 4. Vector Operations on ARMV8 (data type: byte)
        char array1[8] = {1,2,3,4,5,6,7,8};
        char array2[8] = {1,2,3,4,5,6,7,8};
        /* Load array1 into V1.8B (assuming X0 holds the address of array1) */
        LD1 {V1.8B}, [X0]
        /* Load array2 into V2.8B (assuming X1 holds the address of array2) */
        LD1 {V2.8B}, [X1]
        /* operations */
        MUL V1.8B, V1.8B, V2.8B
      Register contents, highest lane on the left:
        V1.8B  (before)    8  7  6  5  4  3  2  1
        V2.8B              8  7  6  5  4  3  2  1
        V1.8B  (after)    64 49 36 25 16  9  4  1
        V1.16B (after)     0  0  0  0  0  0  0  0 64 49 36 25 16  9  4  1
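The lane-wise multiply on this slide can be modelled in portable C. This is only a sketch of the semantics, not Neon code: `mul_8b` is a hypothetical helper name, and it assumes that each product is truncated to the low 8 bits of its lane.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of MUL Vd.8B, Vn.8B, Vm.8B.
   Each of the 8 byte lanes is multiplied independently and only the
   low 8 bits of every product are kept. dst may alias a or b, as in
   MUL V1.8B, V1.8B, V2.8B. */
static void mul_8b(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (size_t lane = 0; lane < 8; ++lane) {
        dst[lane] = (uint8_t)(a[lane] * b[lane]);
    }
}
```

With the slide's inputs {1,...,8} in both operands, every product fits in a byte, so no truncation is visible in that example.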
  • 5. Scalar Operation on Neon
      - Some SIMD instructions operate on scalar data, and the registers holding this data behave like the main general-purpose registers.
      - Suffix n -> register number, from 0 to 31.
      - Unused higher bits are not accessed during a read and are set to zero during a write.
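The read/write rule for scalar views can be illustrated with a small C model. The type `vreg_t` and the helpers `write_s` and `read_s` are hypothetical names, and the 32-bit view is just one chosen width; the sketch only mirrors the rule stated on the slide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit Neon register as 16 bytes,
   byte 0 being the least significant. */
typedef struct {
    uint8_t bytes[16];
} vreg_t;

/* Write through a 32-bit scalar view: the low 4 bytes take the value
   and every higher byte is set to zero. */
static void write_s(vreg_t *v, uint32_t value)
{
    memset(v->bytes, 0, sizeof v->bytes);
    for (int i = 0; i < 4; ++i) {
        v->bytes[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Read through the 32-bit scalar view: only the low 4 bytes are
   accessed, the higher bytes are ignored. */
static uint32_t read_s(const vreg_t *v)
{
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= (uint32_t)v->bytes[i] << (8 * i);
    }
    return value;
}
```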
  • 6. Load/Store Instruction Neon
      - LD1, LD2, LD3, LD4 and ST1, ST2, ST3, ST4 are some of the vector load and store instructions.
      - Format: the register list can contain 1 to 4 consecutive registers, to/from which consecutive 8, 16, 32 or 64-bit elements are transferred.
      - LD1 transfers data from memory to consecutive registers without any interleave.
  • 7. Load/Store Instruction Neon
      - LD2/ST2 load/store elements to the specified registers/memory in an interleaved fashion.
      - Format: ld2 {vn.<t>, vm.<t>}, [x1]   where t -> 8b, 16b, 4h, 8h, 2s, 4s, d, 2d
      - Rules for this instruction: the registers should be consecutive and the number of registers has to be 2.
      - Illustration: ld2 {v0.2s,v1.2s}, [x0] with memory holding 0x10, 0x11, 0x10, 0x11, ... gives v0.2s = {0x10, 0x10} and v1.2s = {0x11, 0x11}.
  • 8. Load/Store Instruction Neon
      - LD3/ST3 load/store elements to the specified registers/memory with 2 data elements interleaved, so each register receives every third element.
      - Format: ld3 {vi.<t>, vj.<t>, vk.<t>}, [x1]   where t -> 8b, 16b, 4h, 8h, 2s, 4s, d, 2d
      - Rules for this instruction: the registers should be consecutive and the number of registers has to be 3.
      - Optional space is allowed to support 128-bit vectors.
  • 9. Load/Store Instruction Neon
      - LD4/ST4 load/store elements to the specified registers/memory with 3 data elements interleaved, so each register receives every fourth element.
      - Format: ld4 {vi.<t>, vj.<t>, vk.<t>, vl.<t>}, [x1]   where t -> 8b, 16b, 4h, 8h, 2s, 4s, d, 2d
      - Instruction rules:
          1. The registers should be consecutive and the number of registers has to be 4.
          2. Optional space is allowed to support 128-bit vectors.
      - Illustration: ld4 {v0.2s,v1.2s,v2.2s,v3.2s}, [x0] with memory holding 0x10, 0x11, 0x12, 0x13, 0x10, 0x11, 0x12, 0x13 gives v0.2s = {0x10, 0x10}, v1.2s = {0x11, 0x11}, v2.2s = {0x12, 0x12} and v3.2s = {0x13, 0x13}.
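The de-interleaving performed by LD2, LD3 and LD4 can be sketched as one portable C routine. This is a scalar model for 32-bit elements only, not Neon code; `load_deinterleaved` and the flat `regs[reg * lanes + lane]` layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of LDn {v0.<lanes>s, ...}, [x0] for
   32-bit elements. Memory element i goes to register (i % nregs),
   lane (i / nregs). The registers are stored back to back in regs,
   so register r, lane l lives at regs[r * lanes + l]. */
static void load_deinterleaved(uint32_t *regs, const uint32_t *mem,
                               size_t nregs, size_t lanes)
{
    assert(nregs >= 1 && nregs <= 4); /* LD1 to LD4 */
    for (size_t i = 0; i < nregs * lanes; ++i) {
        regs[(i % nregs) * lanes + (i / nregs)] = mem[i];
    }
}
```

Calling it with `nregs = 4` and `lanes = 2` on the memory pattern 0x10, 0x11, 0x12, 0x13 repeated twice models the ld4 case, where each register ends up holding two copies of one value. With `nregs = 1` it degenerates to a plain copy, matching LD1's "no interleave" behaviour.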
  • 10. Load/Store Instruction Neon
      - Example (RGB channels): load 12 16-bit values from the address stored in X0 and split them over the 64-bit registers V0.4H, V1.4H and V2.4H.
        LD3 {V0.4H, V1.4H, V2.4H}, [X0]
  • 11. Narrow/Wide Neon Instructions
      Narrow, Long and Wide instruction support:
      1. Long instruction: SADDL Vd.<Td>, Vn.<Ts>, Vm.<Ts>   where <Td>/<Ts> is 8H/8B, 4S/4H or 2D/2S
      2. Wide instruction: SADDW Vd.<Td>, Vn.<Td>, Vm.<Ts>   where <Td>/<Ts> is 8H/8B, 4S/4H or 2D/2S
      3. Narrow instruction (add and narrow, higher half): ADDHN Vd.<Td>, Vn.<Ts>, Vm.<Ts>
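What the three forms do to a single lane can be sketched in plain C for the 8-bit/16-bit pairing. The helper names are hypothetical and the code models only one lane, not a whole vector; the narrowing and wrapping casts assume the usual two's-complement behaviour.

```c
#include <stdint.h>

/* Long (SADDL-style): both 8-bit inputs are sign-extended and added
   in 16 bits, so the sum cannot overflow. */
static int16_t saddl_lane(int8_t n, int8_t m)
{
    return (int16_t)((int16_t)n + (int16_t)m);
}

/* Wide (SADDW-style): the first operand is already 16 bits wide,
   only the second is sign-extended. The sum wraps modulo 2^16. */
static int16_t saddw_lane(int16_t n, int8_t m)
{
    return (int16_t)(uint16_t)((uint16_t)n + (uint16_t)(int16_t)m);
}

/* Narrow (ADDHN-style): add two 16-bit elements and keep only the
   higher 8 bits of the 16-bit sum. */
static uint8_t addhn_lane(uint16_t n, uint16_t m)
{
    uint16_t sum = (uint16_t)(n + m); /* wraps modulo 2^16 */
    return (uint8_t)(sum >> 8);
}
```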
  • 12. Vector Reduction
      1. addv <v>d, vn<T>   (e.g. addv v15.[5]b, v5.8b)
           <v>d is the destination scalar register (b, h, s, d)
           vn<T> is a vector register of size T (8b/16b, 4h/8h, 2s/4s, d/2d)
      2. saddlv <v>d, vn<T>   (e.g. saddlv v15.[5]h, v5.8b)
           <v>d is the destination scalar register (h, s, d)
           vn<T> is a vector register of size T (8b/16b, 4h/8h, 2s/4s, d/2d)
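The difference between the two reductions can be modelled in portable C for an 8-lane byte vector. This is a sketch with hypothetical helper names: it assumes the addv result keeps the element width (so it wraps modulo 2^8), while saddlv sign-extends each element and accumulates into a scalar twice as wide.

```c
#include <stddef.h>
#include <stdint.h>

/* ADDV-style reduction over 8 byte lanes: the result has the same
   width as one element, so the sum wraps modulo 2^8. */
static uint8_t addv_8b(const uint8_t v[8])
{
    uint8_t sum = 0;
    for (size_t lane = 0; lane < 8; ++lane) {
        sum = (uint8_t)(sum + v[lane]);
    }
    return sum;
}

/* SADDLV-style reduction over 8 signed byte lanes: every element is
   sign-extended and summed into a 16-bit scalar, which is wide
   enough for any 8-lane input. */
static int16_t saddlv_8b(const int8_t v[8])
{
    int16_t sum = 0;
    for (size_t lane = 0; lane < 8; ++lane) {
        sum = (int16_t)(sum + v[lane]);
    }
    return sum;
}
```

The widening form is the safer choice whenever the lane sum may exceed the element range, for example when summing pixel values.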
  • 13. ARMV7 vs ARMV8 Neon Instructions: difference in the mnemonics
        ARMV7                              ARMV8
        1. vld1.16 {d1,d2,d3,d4}, [r0]     1. ld1 {v2.4h,v3.4h,v4.4h}, [x0]
        2. vsub.s32 q1,q1,q2               2. ssub v1.4s,v1.4s,v1.4s
      1. The ARMV7 mnemonic prefix 'v' has been removed and replaced by s/u/f/p, which indicates the data type as signed/unsigned/float/polynomial.
      2. Vector organization (element size / number of lanes) is described by the register qualifiers and not by the mnemonics as in ARMV7.
      3. A suffix V has been added to the mnemonic to indicate a reduction-type operation; a suffix 2 indicates that the instruction is a widening/narrowing type operating on the second half of the register.
  • 14. Armv8 Porting (Jpeg Turbo IDCT code)
      1. IDCT (integer and fast algorithms): ARMV7 Neon code is available in Jpeg Turbo.
      2. Some ARMV7 instructions, such as vswap, push and pull, do not exist in ARMV8, and ARMV8 has partial load pair instructions.
      3. The IDCT ARMV7 code takes advantage of the complete register overlap of 64-bit and 128-bit registers, and this can increase the effort of porting.
      Other methods of porting if you don't have ARMV7 code:
      1. Using intrinsics can be a good option, but it is a bit risky as it does not guarantee that effective Neon code is generated as intended.
      2. You can separate the function of interest, generate ARMV8 Neon code through the compiler's vectorization option, and start optimizing that.
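As a rough illustration of the intrinsics route, the sketch below adds two vectors of eight 16-bit values, the element width typically used for IDCT coefficients. It is not taken from the Jpeg Turbo sources: `add_s16x8` is a hypothetical helper, and the scalar fallback is an assumption added so the same file also builds on targets without Neon.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for 8 signed 16-bit lanes. */
static void add_s16x8(int16_t dst[8], const int16_t a[8], const int16_t b[8])
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Intrinsics path: one 128-bit load per operand, one vector add,
       one 128-bit store. The compiler picks registers and scheduling. */
    int16x8_t va = vld1q_s16(a);
    int16x8_t vb = vld1q_s16(b);
    vst1q_s16(dst, vaddq_s16(va, vb));
#else
    /* Portable fallback with the same lane-wise result. */
    for (int lane = 0; lane < 8; ++lane) {
        dst[lane] = (int16_t)(a[lane] + b[lane]);
    }
#endif
}
```

Because the compiler decides the final instruction sequence, it is worth inspecting the generated assembly before relying on the intrinsics version in a hot loop.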
  • 16. More about Linaro: http://www.linaro.org/about/
        More about Linaro engineering: http://www.linaro.org/engineering/
        How to join: http://www.linaro.org/about/how-to-join
        Linaro members: www.linaro.org/members