PLDI 2017 Tutorial Session
Vectorization with LMS: SIMD Intrinsics
Alen Stojanov
Department of Computer Science, ETH Zurich, Switzerland
What is SIMD?
Single Instruction, Multiple Data (as opposed to SISD: Single Instruction, Single Data)
[Figure: SISD vs. SIMD processing of the same vector elements]
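SISD vs. SIMD in code: adding two arrays of doubles
[Figure: the scalar (SISD) loop handles one element per instruction; the AVX (SIMD) loop handles four at once (x4)]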
Scalar:
#define T double
void add(T* x, T* y, T* z, int N) {
    for (int i = 0; i < N; ++i) {
        T x1, y1, z1;
        x1 = x[i];
        y1 = y[i];
        z1 = x1 + y1;
        z[i] = z1;
    }
}
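AVX (x4):
#include <immintrin.h>  // AVX intrinsics; compile with -mavx
#define T double
void add(T* x, T* y, T* z, int N) {
    int i = 0;
    // Main loop: 4 doubles per iteration
    for (; i + 4 <= N; i += 4) {
        __m256d x1, y1, z1;
        x1 = _mm256_loadu_pd(x + i);
        y1 = _mm256_loadu_pd(y + i);
        z1 = _mm256_add_pd(x1, y1);
        _mm256_storeu_pd(z + i, z1);
    }
    // Remainder: the last N % 4 elements, one at a time
    for (; i < N; ++i)
        z[i] = x[i] + y[i];
}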
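The following self-contained sketch lets you try both versions side by side. It assumes GCC or Clang on x86-64. The `target("avx")` attribute and `__builtin_cpu_supports("avx")` are compiler extensions and do not appear in the slides. Together they let the file build without `-mavx` and fall back to the scalar loop on CPUs without AVX. The names `add_scalar`, `add_avx` and `add` are illustrative.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar reference: one double per iteration. */
static void add_scalar(const double *x, const double *y, double *z, size_t n) {
    for (size_t i = 0; i < n; ++i)
        z[i] = x[i] + y[i];
}

/* AVX version: four doubles per iteration, then a scalar loop for the
   leftover n % 4 elements. target("avx") enables AVX for this function
   only, so the rest of the file does not need -mavx. */
__attribute__((target("avx")))
static void add_avx(const double *x, const double *y, double *z, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d a = _mm256_loadu_pd(x + i);
        __m256d b = _mm256_loadu_pd(y + i);
        _mm256_storeu_pd(z + i, _mm256_add_pd(a, b));
    }
    for (; i < n; ++i)
        z[i] = x[i] + y[i];
}

/* Dispatch: take the AVX path only if the CPU reports AVX support. */
void add(const double *x, const double *y, double *z, size_t n) {
    if (__builtin_cpu_supports("avx"))
        add_avx(x, y, z, n);
    else
        add_scalar(x, y, z, n);
}
```

Calling `add(x, y, z, 10)` exercises both paths. Elements 0 to 7 go through the four-wide loop, and elements 8 and 9 go through the scalar remainder.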
Scalar loop, compiled (x86-64 assembly, one double per iteration):
LBB0_3:
    movsd   (%rdi,%rax,8), %xmm0
    addsd   (%rsi,%rax,8), %xmm0
    movsd   %xmm0, (%rdx,%rax,8)
    incq    %rax
    cmpl    %eax, %r9d
    jne     LBB0_3
AVX loop, compiled (x86-64 assembly, four doubles per iteration):
LBB0_3:
    vmovupd (%rdi,%r10,8), %ymm0
    vaddpd  (%rsi,%r10,8), %ymm0, %ymm0
    vmovupd %ymm0, (%rax)
    addq    $4, %r10
    addq    $32, %rax
    addq    $1, %rcx
    jne     LBB0_3
• MMX
• SSE / SSE2 / SSE3 / SSSE3 / SSE4.1 / SSE4.2
• AVX / AVX2 / AVX-512
• FMA / KNC / SVML
Operands per register, by element type:
               float  double  8-bit  16-bit  32-bit  64-bit
AVX (256-bit)      8       4     32      16       8       4
SSE (128-bit)      4       2     16       8       4       2
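The table can be checked at compile time. The number of lanes is the register size divided by the element size. Below is a minimal C11 sketch. It assumes an x86 compiler that provides `<immintrin.h>`, and `LANES` is a hypothetical helper macro.

```c
#include <immintrin.h>  /* __m128*, __m256* register types */
#include <stdint.h>

/* Number of elements of type T that fit in a register of type V. */
#define LANES(V, T) (sizeof(V) / sizeof(T))

/* SSE: 128-bit registers */
_Static_assert(LANES(__m128,  float)   == 4,  "4x float");
_Static_assert(LANES(__m128d, double)  == 2,  "2x double");
_Static_assert(LANES(__m128i, int8_t)  == 16, "16x 8-bit");
_Static_assert(LANES(__m128i, int16_t) == 8,  "8x 16-bit");
_Static_assert(LANES(__m128i, int32_t) == 4,  "4x 32-bit");
_Static_assert(LANES(__m128i, int64_t) == 2,  "2x 64-bit");

/* AVX: 256-bit registers */
_Static_assert(LANES(__m256,  float)   == 8,  "8x float");
_Static_assert(LANES(__m256d, double)  == 4,  "4x double");
_Static_assert(LANES(__m256i, int8_t)  == 32, "32x 8-bit");
_Static_assert(LANES(__m256i, int16_t) == 16, "16x 16-bit");
_Static_assert(LANES(__m256i, int32_t) == 8,  "8x 32-bit");
_Static_assert(LANES(__m256i, int64_t) == 4,  "4x 64-bit");
```

If any count were wrong, compilation would fail with the corresponding message, so nothing needs to run.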
That’s not all
Shuffles:
• _mm256_permutevar_pd
• _mm256_shufflehi_epi16
• …
Strings:
• _mm_cmpestrm
• _mm_cmpistrm
• …
Bitwise operators:
• _mm256_bslli_epi128
• _mm512_rol_epi32
• …
Statistics:
• _mm_avg_epu8
• _mm256_cdfnorm_pd
• …
Logical:
• _mm256_or_pd
• _mm256_andnot_pd
• …
Crypto:
• _mm_aesdec_si128
• _mm_sha1msg1_epu32
• …
Loads:
• _mm_i32gather_epi32
• _mm256_broadcast_ps
• …
Stores:
• _mm512_storenrngo_pd
• _mm_store_pd1
• …
Casts and conversions:
• _mm256_castps_pd
• _mm256_cvtps_epi32
• …
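Take one intrinsic from the list above: `_mm_avg_epu8` (SSE2) averages 16 pairs of unsigned bytes in a single instruction, rounding up. Below is a minimal sketch. It assumes x86-64, where SSE2 is always available, and `avg16` is a hypothetical helper name.

```c
#include <emmintrin.h>  /* SSE2: baseline on x86-64 */
#include <stdint.h>

/* Rounding average of 16 unsigned bytes at once:
   out[i] = (a[i] + b[i] + 1) >> 1, computed without 8-bit overflow. */
void avg16(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_avg_epu8(va, vb));
}
```

For example, averaging 0 and 255 gives 128, because (0 + 255 + 1) >> 1 = 128.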
There are a lot of SIMD instructions: AVX-512 alone has 3519 intrinsics
How do you port all intrinsics into LMS?
Idea #1: Get a Master's student to do it
Idea #2: Generate them automatically
Ivaylo Toskov, ETH Zurich
Source data: data-3.3.16.xml (the XML data file behind the Intel Intrinsics Guide)
Challenge #1
Scala chokes on big classes: the JVM limits a single method to 64 KB of bytecode
• Split the implementation into multiple classes
• Make one trait inherit all the split classes
Challenge #2
LMS has read / write effects
• Produce the effects automatically, using the category data in the Intel Intrinsics Guide
<intrinsic tech='AVX' rettype='__m256d' name='_mm256_loadu_pd'>
    <type>Floating Point</type>
    <CPUID>AVX</CPUID>
    <category>Load</category>
    <parameter varname='mem_addr' type='double const *' />
    <description>
        Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.
    </description>
    <operation>
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
    </operation>
    <instruction name='vmovupd' form='ymm, m256' />
    <header>immintrin.h</header>
</intrinsic>
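Here is a minimal sketch of the idea behind Challenge #2. It is written in C for consistency with the rest of this section; the actual generator targets Scala/LMS. The enum, the function name and the category-to-effect rule are assumptions for illustration. `Load` entries such as `_mm256_loadu_pd` above are treated as reads, `Store` entries as writes, and all other categories as pure.

```c
#include <string.h>

/* Hypothetical effect kinds for a generated DSL node (illustration only,
   not the lms-intrinsics types). */
typedef enum { EFFECT_PURE, EFFECT_READ, EFFECT_WRITE } effect_t;

/* Assumed rule: map the <category> of an Intel Intrinsics Guide entry
   to an effect. Loads read memory, stores write memory, and every other
   category is treated as a pure computation on register values. */
effect_t effect_from_category(const char *category) {
    if (strcmp(category, "Load") == 0)
        return EFFECT_READ;
    if (strcmp(category, "Store") == 0)
        return EFFECT_WRITE;
    return EFFECT_PURE;
}
```

With the entry above, `effect_from_category("Load")` yields `EFFECT_READ` for `_mm256_loadu_pd`. A production generator would likely need more cases, since some intrinsics outside these two categories also touch memory or have other side effects.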
Challenge #3
Type mappings: unsigned?
• Use Scala Unsigned for unsigned operations
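This example shows why unsignedness matters for the type mapping. SSE2's `_mm_cmpgt_epi8` compares bytes as signed values, so the byte 200 (0xC8) compares as -56. The sketch assumes x86-64, and the helper names are hypothetical. The XOR-with-0x80 bias used for the unsigned case is a common workaround, not something taken from the slides.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Lane 0 of a byte comparison mask: 0xFF if true, 0x00 if false. */
static uint8_t lane0(__m128i mask) {
    return (uint8_t)_mm_cvtsi128_si32(mask);
}

/* Signed compare: the bytes are interpreted as int8_t. */
uint8_t gt_signed(uint8_t a, uint8_t b) {
    __m128i va = _mm_set1_epi8((char)a);
    __m128i vb = _mm_set1_epi8((char)b);
    return lane0(_mm_cmpgt_epi8(va, vb));
}

/* Unsigned compare: flipping the top bit maps 0..255 onto -128..127
   while preserving order, so the signed compare gives the unsigned result. */
uint8_t gt_unsigned(uint8_t a, uint8_t b) {
    const __m128i bias = _mm_set1_epi8((char)0x80);
    __m128i va = _mm_xor_si128(_mm_set1_epi8((char)a), bias);
    __m128i vb = _mm_xor_si128(_mm_set1_epi8((char)b), bias);
    return lane0(_mm_cmpgt_epi8(va, vb));
}
```

With the same bits, `gt_signed(200, 100)` is false while `gt_unsigned(200, 100)` is true. A binding that drops the unsigned type would silently pick the wrong interpretation.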
Challenge #4
Pointers?
• Disallow them and use memory offsets instead
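Here is the (base, offset) style in C terms. Instead of passing an arbitrary pointer such as `x + i`, each call names an array and an element offset. The wrapper names below are hypothetical and are not the lms-intrinsics API. They use SSE2 (2 doubles per register) so the sketch runs on any x86-64 CPU.

```c
#include <emmintrin.h>  /* SSE2: 2 doubles per register */
#include <stddef.h>

/* Load 2 doubles starting at base[offset]. */
static __m128d load2(const double *base, size_t offset) {
    return _mm_loadu_pd(base + offset);
}

/* Store 2 doubles starting at base[offset]. */
static void store2(double *base, size_t offset, __m128d v) {
    _mm_storeu_pd(base + offset, v);
}

/* z[i] = x[i] + y[i] for n elements, naming memory only as
   (array, offset) pairs; if n is odd, the last element is added
   in scalar code. */
void add_offsets(const double *x, const double *y, double *z, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        store2(z, i, _mm_add_pd(load2(x, i), load2(y, i)));
    for (; i < n; ++i)
        z[i] = x[i] + y[i];
}
```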
Challenge #5
Implement arrays only?
• Use abstract containers tailored to the needs of the DSL
Challenge #6, #7, ...
Try to think of everything?
• Checked.
https://github.com/ivtoskov/lms-intrinsics
How do we make use of the intrinsics?
https://github.com/astojanov/lms-tutorial-pldi
