SlideShare une entreprise Scribd logo
1  sur  15
Télécharger pour lire hors ligne
1
Using GCC
Auto-Vectorizer
Ira Rosen <ira.rosen@linaro.org>
Michael Hope <michael.hope@linaro.org>
r1
bzr branch lp:~michaelh1/+junk/using-the-vectorizer
2
Using GCC Vectorizer
● Vectorization is enabled by the flag -ftree-vectorize and by
default at -O3:
● gcc –O2 –ftree-vectorize myloop.c
● or gcc –O3 myloop.c
● To enable NEON:
-mfpu=neon -mfloat-abi=softfp or -mfloat-abi=hard
● Information on which loops got vectorized, and which
didn’t and why:
● -fdump-tree-vect(-details)
– dumps information into myloop.c.##t.vect
● -ftree-vectorizer-verbose=[X]
– dumps to stderr
● More information:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
3
Other useful flags
● -ffast-math - if operating on floats in a reduction
computation (to allow the vectorizer to change the
order of the computation)
● -funsafe-loop-optimizations - if using "unsigned int"
loop counters (can be assumed not to overflow)
● -ftree-loop-if-convert-stores - more aggressive
if-conversion
● --param min-vect-loop-bound=[X] - if have loops with a
short trip-count
● -fno-vect-loop-version- if worried about code size
4
What's vectorizable
● Innermost loops
● countable
● no control flow
● independent data accesses
● continuous data accesses
for (k = 0; k < m; k ++)
for (j = 0; j < m; j ++)
for (i = 0; i < n; i ++)
a[k][j][i] = b[k][j][i] * c[k][j][i];
Example of not vectorizable loop:
while (a[i] != 8)
{
if (a[i] != 0)
a[i] = a[i-1];
b[i+stride] = 0;
}
uncountable
control flow
loop carried dependence
access with unknown stride
5
Special features
● vectorization of outer loops
● vectorization of straight-line code
● if-conversion
● multiple data-types and type conversions
● recognition of special idioms (e.g. dot-product,
widening operations)
● strided memory accesses
● cost model
● runtime aliasing and alignment tests
● auto-detection of vector size
Examples:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
6
GCC Versions
● Current Linaro GCC is based on FSF GCC 4.6
● Once FSF GCC 4.7 is released (in about six
months) Linaro GCC will switch to GCC 4.7
● Some of GCC 4.7 vectorizer related features:
● __builtin_assume_aligned – alignment hints
● vectorization of conditions with mixed types
● vectorization of bool
7
Special features
● vectorization of outer loops
● vectorization of straight-line code
● if-conversion
● multiple data-types and type conversions
● recognition of special idioms (e.g. dot-product,
widening operations)
● strided memory accesses
● cost model
● runtime aliasing and alignment tests
● auto-detection of vector size
Examples:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
8
Vectorizing for NEON
libswscale/rgb2rgb_template.c:
static inline void rgb24tobgr16_c(
const uint8_t *src, uint8_t *dst,
int src_size) {
const uint8_t *s = src;
const uint8_t *end;
uint16_t *d = (uint16_t *)dst;
end = s + src_size;
while (s < end) {
const int b = *s++;
const int g = *s++;
const int r = *s++;
*d++ = (b>>3) | ((g&0xFC)<<3)
| ((r&0xF8)<<8);
} }
.L12:
mov r1, r4
add r0, r0, #1
vld3.8 {d16, d18, d20}, [r1]!
cmp r0, r6
mov r2, ip
add r4, r4, #48
add ip, ip, #32
vld3.8 {d17, d19, d21}, [r1]
vand q12, q9, q14
vshr.u8 q11, q8, #3
vand q8, q10, q13
vshll.u8 q9, d25, #3
vmovl.u8 q10, d22
vshll.u8 q15, d24, #3
vmovl.u8 q11, d23
vorr q12, q15, q10
vorr q11, q9, q11
vshll.u8 q10, d16, #8
vshll.u8 q9, d17, #8
vorr q8, q12, q10
vorr q11, q11, q9
vst1.16 {q8}, [r2]!
vst1.16 {q11}, [r2]
bcc .L12
scalar: 75000 runs take 216.583ms
vector: 75000 runs take 48.8586ms
speedup: 4.433x
strided access
& and shift right
performed on u8
widening shift
no over-promotion to s32
9
Writing vectorizer-friendly code
● Avoid aliasing problems
– Use __restrict__ qualified pointers
void foo (int *__restrict__ pInput, int *__restrict__ pOutput)
● Don’t unroll loops
– Loop vectorization is more powerful than SLP
for (i=0; i<n; i+=4) {
sum += a[0]; for (i=0; i<n; i++)
sum += a[1]; sum += a[i];
sum += a[2];
sum += a[3];
a += 4;}
10
Writing vectorizer-friendly code (cont.)
● Use countable loops, with no side-effects
– No function-calls in the loop (distribute into a separate
loop)
for (i=0; i<n; i++)
for (i=0; i<n; i++) { if (a[i] == 0) foo();
if (a[i] == 0) foo(); for (i=0; i<n; i++)
b[i] = c[i]; } b[i] = c[i];
– No ‘break’/’continue’
for (i=0; i<n; i++)
for (i=0; i<n; i++) { if (a[i] == 8) {m = i; break;}
if (a[i] == 8) break; for (i=0; i<m; i++)
b[i] = c[i]; } b[i] = c[i];
11
Writing vectorizer-friendly code (cont.)
● Keep the memory access-pattern simple
– Don't use indirect accesses, e.g.:
for (i=0; i<n; i++)
a[b[i]] = x;
– Don't use unknown stride, e.g.:
for (i=0; i<n; i++)
a[i+stride] = x;
● Use "int" iterators rather than "unsigned int" iterators
– The C standard says that the former cannot overflow,
which helps the compiler to determine the trip count.
12
Some of our recent contributions
● Support of vldN/vstN
● NEON specific patterns: e.g. widening shift
● SLP (straight-line code vectorization)
improvements
● RTL improvements:
● reducing the number of moves and amount of
spilling (both for auto- and hand-vectorised code)
● improving modulo scheduling of NEON code
13
People
● Linaro Toolchain WG
● Ira Rosen (IRC: irar)
ira.rosen@linaro.org
– auto-vectorizer
● Richard Sandiford (IRC: rsandifo)
richard.sandiford@linaro.org
– NEON back-end/RTL optimizations
14
Helping us
Send us examples of code that are important to
you to vectorize.
15
Output Example
ex.c:
1 #define N 128
2 int a[N], b[N];
3 void foo (void)
4 {
5 int i;
6
7 for (i = 0; i < N; i++)
8 a[i] = i;
9
10 for (i = 0; i < N; i+=5)
11 b[i] = i;
12 }
● What's got vectorized:
gcc -c -O3 -ftree-vectorizer-verbose=1 ex.c
ex.c:7: note: LOOP VECTORIZED.
ex.c:3: note: vectorized 1 loops in function.
● What's got vectorized and what didn't:
gcc -c -O3 -ftree-vectorizer-verbose=2 ex.c
ex.c:10: note: not vectorized: complicated access
pattern.
ex.c:10: note: not vectorized: complicated access
pattern.
ex.c:7: note: LOOP VECTORIZED.
ex.c:3: note: vectorized 1 loops in function.
...
● All the details:
gcc -c -O3 -ftree-vectorizer-verbose=9 ex.c
or
gcc -c -O3 -fdump-tree-vect-details ex.c

Contenu connexe

Tendances

AVX-512(フォーマット)詳解
AVX-512(フォーマット)詳解AVX-512(フォーマット)詳解
AVX-512(フォーマット)詳解MITSUNARI Shigeo
 
Bert(transformer,attention)
Bert(transformer,attention)Bert(transformer,attention)
Bert(transformer,attention)norimatsu5
 
How Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichHow Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichDevOpsDays Tel Aviv
 
PCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingPCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingDVClub
 
Tensorflow lite for microcontroller
Tensorflow lite for microcontrollerTensorflow lite for microcontroller
Tensorflow lite for microcontrollerRouyun Pan
 
C/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールC/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールMITSUNARI Shigeo
 
Bringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big DataBringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big Databcantrill
 
USB3.0ドライバ開発の道
USB3.0ドライバ開発の道USB3.0ドライバ開発の道
USB3.0ドライバ開発の道uchan_nos
 
Gor Nishanov, C++ Coroutines – a negative overhead abstraction
Gor Nishanov,  C++ Coroutines – a negative overhead abstractionGor Nishanov,  C++ Coroutines – a negative overhead abstraction
Gor Nishanov, C++ Coroutines – a negative overhead abstractionSergey Platonov
 
CXL_説明_公開用.pdf
CXL_説明_公開用.pdfCXL_説明_公開用.pdf
CXL_説明_公開用.pdfYasunori Goto
 
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_FinalGopi Krishnamurthy
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
プログラムを高速化する話Ⅱ 〜GPGPU編〜
プログラムを高速化する話Ⅱ 〜GPGPU編〜プログラムを高速化する話Ⅱ 〜GPGPU編〜
プログラムを高速化する話Ⅱ 〜GPGPU編〜京大 マイコンクラブ
 
冬のLock free祭り safe
冬のLock free祭り safe冬のLock free祭り safe
冬のLock free祭り safeKumazaki Hiroki
 
組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門Norishige Fukushima
 
COSCUP2016 - LLVM框架、由淺入淺
COSCUP2016 - LLVM框架、由淺入淺COSCUP2016 - LLVM框架、由淺入淺
COSCUP2016 - LLVM框架、由淺入淺hydai
 
CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説
CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説
CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説Takateru Yamagishi
 
ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!Mr. Vengineer
 

Tendances (20)

Pcie drivers basics
Pcie drivers basicsPcie drivers basics
Pcie drivers basics
 
AVX-512(フォーマット)詳解
AVX-512(フォーマット)詳解AVX-512(フォーマット)詳解
AVX-512(フォーマット)詳解
 
Bert(transformer,attention)
Bert(transformer,attention)Bert(transformer,attention)
Bert(transformer,attention)
 
How Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichHow Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar Leibovich
 
PCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingPCI Express Verification using Reference Modeling
PCI Express Verification using Reference Modeling
 
Tensorflow lite for microcontroller
Tensorflow lite for microcontrollerTensorflow lite for microcontroller
Tensorflow lite for microcontroller
 
C/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールC/C++プログラマのための開発ツール
C/C++プログラマのための開発ツール
 
Bringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big DataBringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big Data
 
USB3.0ドライバ開発の道
USB3.0ドライバ開発の道USB3.0ドライバ開発の道
USB3.0ドライバ開発の道
 
Gor Nishanov, C++ Coroutines – a negative overhead abstraction
Gor Nishanov,  C++ Coroutines – a negative overhead abstractionGor Nishanov,  C++ Coroutines – a negative overhead abstraction
Gor Nishanov, C++ Coroutines – a negative overhead abstraction
 
Linux : PSCI
Linux : PSCILinux : PSCI
Linux : PSCI
 
CXL_説明_公開用.pdf
CXL_説明_公開用.pdfCXL_説明_公開用.pdf
CXL_説明_公開用.pdf
 
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
プログラムを高速化する話Ⅱ 〜GPGPU編〜
プログラムを高速化する話Ⅱ 〜GPGPU編〜プログラムを高速化する話Ⅱ 〜GPGPU編〜
プログラムを高速化する話Ⅱ 〜GPGPU編〜
 
冬のLock free祭り safe
冬のLock free祭り safe冬のLock free祭り safe
冬のLock free祭り safe
 
組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門
 
COSCUP2016 - LLVM框架、由淺入淺
COSCUP2016 - LLVM框架、由淺入淺COSCUP2016 - LLVM框架、由淺入淺
COSCUP2016 - LLVM框架、由淺入淺
 
CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説
CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説
CUDAのアセンブリ言語基礎のまとめ PTXとSASSの概説
 
ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!
 

En vedette

Q4.11: NEON Intrinsics
Q4.11: NEON IntrinsicsQ4.11: NEON Intrinsics
Q4.11: NEON IntrinsicsLinaro
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bitsChiou-Nan Chen
 
GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64Yi-Hsiu Hsu
 
COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1NOWAY
 
LAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEELAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEELinaro
 
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)Leon Anavi
 
LAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELinaro
 
SFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEESFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEELinaro
 
Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)Yannick Gicquel
 
Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64Yi-Hsiu Hsu
 
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEEBKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEELinaro
 
LCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEELCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEELinaro
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewLinaro
 
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3Linaro
 
Arm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingArm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingMerck Hung
 
BUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 KernelBUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 KernelLinaro
 
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARMXPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARMThe Linux Foundation
 

En vedette (18)

Q4.11: NEON Intrinsics
Q4.11: NEON IntrinsicsQ4.11: NEON Intrinsics
Q4.11: NEON Intrinsics
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
 
GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64
 
COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1
 
64-bit Android
64-bit Android64-bit Android
64-bit Android
 
LAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEELAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEE
 
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
 
LAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEE
 
SFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEESFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEE
 
Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)
 
Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64
 
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEEBKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
 
LCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEELCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEE
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting Review
 
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
 
Arm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingArm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefing
 
BUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 KernelBUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
 
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARMXPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
 

Similaire à Q4.11: Using GCC Auto-Vectorizer

深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法MITSUNARI Shigeo
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)RCCSRENKEI
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfYodalee
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Intel® Software
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerPlatonov Sergey
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone? DVClub
 
Cryptography and secure systems
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systemsVsevolod Stakhov
 
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...corehard_by
 
Lab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsLab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsRup Chowdhury
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I💻 Anton Gerdelan
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudAndrea Righi
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source codePVS-Studio
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source codeAndrey Karpov
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Windows Developer
 
리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3Sangho Park
 

Similaire à Q4.11: Using GCC Auto-Vectorizer (20)

深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
 
Boosting Developer Productivity with Clang
Boosting Developer Productivity with ClangBoosting Developer Productivity with Clang
Boosting Developer Productivity with Clang
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdf
 
Fedor Polyakov - Optimizing computer vision problems on mobile platforms
Fedor Polyakov - Optimizing computer vision problems on mobile platforms Fedor Polyakov - Optimizing computer vision problems on mobile platforms
Fedor Polyakov - Optimizing computer vision problems on mobile platforms
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory Sanitizer
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone?
 
Cryptography and secure systems
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systems
 
Meltdown & spectre
Meltdown & spectreMeltdown & spectre
Meltdown & spectre
 
Meltdown & Spectre
Meltdown & Spectre Meltdown & Spectre
Meltdown & Spectre
 
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
 
Lab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsLab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer Graphics
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
 
리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3
 
Lecture 04
Lecture 04Lecture 04
Lecture 04
 

Plus de Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 

Plus de Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Dernier

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 

Dernier (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 

Q4.11: Using GCC Auto-Vectorizer

  • 1. 1 Using GCC Auto-Vectorizer Ira Rosen <ira.rosen@linaro.org> Michael Hope <michael.hope@linaro.org> r1 bzr branch lp:~michaelh1/+junk/using-the-vectorizer
  • 2. 2 Using GCC Vectorizer ● Vectorization is enabled by the flag -ftree-vectorize and by default at -O3: ● gcc –O2 –ftree-vectorize myloop.c ● or gcc –O3 myloop.c ● To enable NEON: -mfpu=neon -mfloat-abi=softfp or -mfloat-abi=hard ● Information on which loops got vectorized, and which didn’t and why: ● -fdump-tree-vect(-details) – dumps information into myloop.c.##t.vect ● -ftree-vectorizer-verbose=[X] – dumps to stderr ● More information: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
  • 3. 3 Other useful flags ● -ffast-math - if operating on floats in a reduction computation (to allow the vectorizer to change the order of the computation) ● -funsafe-loop-optimizations - if using "unsigned int" loop counters (can be assumed not to overflow) ● -ftree-loop-if-convert-stores - more aggressive if-conversion ● --param min-vect-loop-bound=[X] - if have loops with a short trip-count ● -fno-vect-loop-version- if worried about code size
  • 4. 4 What's vectorizable ● Innermost loops ● countable ● no control flow ● independent data accesses ● continuous data accesses for (k = 0; k < m; k ++) for (j = 0; j < m; j ++) for (i = 0; i < n; i ++) a[k][j][i] = b[k][j][i] * c[k][j][i]; Example of not vectorizable loop: while (a[i] != 8) { if (a[i] != 0) a[i] = a[i-1]; b[i+stride] = 0; } uncountable control flow loop carried dependence access with unknown stride
  • 5. 5 Special features ● vectorization of outer loops ● vectorization of straight-line code ● if-conversion ● multiple data-types and type conversions ● recognition of special idioms (e.g. dot-product, widening operations) ● strided memory accesses ● cost model ● runtime aliasing and alignment tests ● auto-detection of vector size Examples: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
  • 6. 6 GCC Versions ● Current Linaro GCC is based on FSF GCC 4.6 ● Once FSF GCC 4.7 is released (in about six months) Linaro GCC will switch to GCC 4.7 ● Some of GCC 4.7 vectorizer related features: ● __builtin_assume_aligned – alignment hints ● vectorization of conditions with mixed types ● vectorization of bool
  • 7. 7 Special features ● vectorization of outer loops ● vectorization of straight-line code ● if-conversion ● multiple data-types and type conversions ● recognition of special idioms (e.g. dot-product, widening operations) ● strided memory accesses ● cost model ● runtime aliasing and alignment tests ● auto-detection of vector size Examples: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
  • 8. 8 Vectorizing for NEON libswscale/rgb2rgb_template.c: static inline void rgb24tobgr16_c( const uint8_t *src, uint8_t *dst, int src_size) { const uint8_t *s = src; const uint8_t *end; uint16_t *d = (uint16_t *)dst; end = s + src_size; while (s < end) { const int b = *s++; const int g = *s++; const int r = *s++; *d++ = (b>>3) | ((g&0xFC)<<3) | ((r&0xF8)<<8); } } .L12: mov r1, r4 add r0, r0, #1 vld3.8 {d16, d18, d20}, [r1]! cmp r0, r6 mov r2, ip add r4, r4, #48 add ip, ip, #32 vld3.8 {d17, d19, d21}, [r1] vand q12, q9, q14 vshr.u8 q11, q8, #3 vand q8, q10, q13 vshll.u8 q9, d25, #3 vmovl.u8 q10, d22 vshll.u8 q15, d24, #3 vmovl.u8 q11, d23 vorr q12, q15, q10 vorr q11, q9, q11 vshll.u8 q10, d16, #8 vshll.u8 q9, d17, #8 vorr q8, q12, q10 vorr q11, q11, q9 vst1.16 {q8}, [r2]! vst1.16 {q11}, [r2] bcc .L12 scalar: 75000 runs take 216.583ms vector: 75000 runs take 48.8586ms speedup: 4.433x strided access & and shift right performed on u8 widening shift no over-promotion to s32
  • 9. 9 Writing vectorizer-friendly code ● Avoid aliasing problems – Use __restrict__ qualified pointers void foo (int *__restrict__ pInput, int *__restrict__ pOutput) ● Don’t unroll loops – Loop vectorization is more powerful than SLP for (i=0; i<n; i+=4) { sum += a[0]; for (i=0; i<n; i++) sum += a[1]; sum += a[i]; sum += a[2]; sum += a[3]; a += 4;}
  • 10. 10 Writing vectorizer-friendly code (cont.) ● Use countable loops, with no side-effects – No function-calls in the loop (distribute into a separate loop) for (i=0; i<n; i++) for (i=0; i<n; i++) { if (a[i] == 0) foo(); if (a[i] == 0) foo(); for (i=0; i<n; i++) b[i] = c[i]; } b[i] = c[i]; – No ‘break’/’continue’ for (i=0; i<n; i++) for (i=0; i<n; i++) { if (a[i] == 8) {m = i; break;} if (a[i] == 8) break; for (i=0; i<m; i++) b[i] = c[i]; } b[i] = c[i];
  • 11. 11 Writing vectorizer-friendly code (cont.) ● Keep the memory access-pattern simple – Don't use indirect accesses, e.g.: for (i=0; i<n; i++) a[b[i]] = x; – Don't use unknown stride, e.g.: for (i=0; i<n; i++) a[i+stride] = x; ● Use "int" iterators rather than "unsigned int" iterators – The C standard says that the former cannot overflow, which helps the compiler to determine the trip count.
  • 12. 12 Some of our recent contributions ● Support of vldN/vstN ● NEON specific patterns: e.g. widening shift ● SLP (straight-line code vectorization) improvements ● RTL improvements: ● reducing the number of moves and amount of spilling (both for auto- and hand-vectorised code) ● improving modulo scheduling of NEON code
  • 13. 13 People ● Linaro Toolchain WG ● Ira Rosen (IRC: irar) ira.rosen@linaro.org – auto-vectorizer ● Richard Sandiford (IRC: rsandifo) richard.sandiford@linaro.org – NEON back-end/RTL optimizations
  • 14. 14 Helping us Send us examples of code that are important to you to vectorize.
  • 15. 15 Output Example ex.c: 1 #define N 128 2 int a[N], b[N]; 3 void foo (void) 4 { 5 int i; 6 7 for (i = 0; i < N; i++) 8 a[i] = i; 9 10 for (i = 0; i < N; i+=5) 11 b[i] = i; 12 } ● What's got vectorized: gcc -c -O3 -ftree-vectorizer-verbose=1 ex.c ex.c:7: note: LOOP VECTORIZED. ex.c:3: note: vectorized 1 loops in function. ● What's got vectorized and what didn't: gcc -c -O3 -ftree-vectorizer-verbose=2 ex.c ex.c:10: note: not vectorized: complicated access pattern. ex.c:10: note: not vectorized: complicated access pattern. ex.c:7: note: LOOP VECTORIZED. ex.c:3: note: vectorized 1 loops in function. ... ● All the details: gcc -c -O3 -ftree-vectorizer-verbose=9 ex.c or gcc -c -O3 -fdump-tree-vect-details ex.c