Pythran: Static Compilation of
Parallel Scientific Kernels
a.k.a. Python/Numpy compiler for the
mass
Proudly made in Namek...
/us
Serge « sans paille » Guelton
$ w h o a m i
s g u e l t o n
R&D engineer at on compilation for security
Associate rese...
Pythran in a snake shell
A Numpy-centric Python-to-C++ translator
A Python code optimizer
A Pythonic C++ library
Core concepts
Focus on high-level constructs
Generate clean high level code
Optimize Python code before generated code
Vec...
Ask StackOverflow
when you're looking for test cases
http://stackoverflow.com/[...]numba-or-cython-acceleration-in-
reacti...
Thread Summary
OP
My code is slow with Cython and Numba
Best Answer
You need to make all loops explicit
Cython Version
c i m p o r t c y t h o n
i m p o r t n u m p y a s n p
c i m p o r t n u m p y a s n p
c p d e f c y t h o...
Pythran version
Add this line to the original kernel:
# p y t h r a n e x p o r t G r a y S c o t t ( i n t , f l o a t , ...
Lessons learnt
Explicit is not always better than implicit
Many ``optimization hints'' can be deduced by the compiler
High...
Compilation Challenges
u = U [ 1 : - 1 , 1 : - 1 ]
U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0
u + = 0 ....
Optimization Opportunities
L u = ( U [ 0 : - 2 , 1 : - 1 ] + U [ 1 : - 1 , 0 : - 2 ]
- 4 * U [ 1 : - 1 , 1 : - 1 ] + U [ 1...
Pythran Usage
$ p y t h r a n - - h e l p
u s a g e : p y t h r a n [ - h ] [ - o O U T P U T _ F I L E ] [ - E ] [ - e ] ...
Sample Usage
$ p y t h r a n i n p u t . p y # g e n e r a t e s i n p u t . s o
$ p y t h r a n i n p u t . p y - E # g e...
Type Annotations
Only for exported functions
# p y t h r a n e x p o r t f o o 0 ( )
# p y t h r a n e x p o r t f o o 1 (...
Pythran Compilation Flow
input.py
AST
IR
outout.py input.cpp
input.so
Optimization loop
import ast
gcc [...]
Front End
100% based on the astmodule
Supports
Several standard module (incl. partial Numpy)
Polymorphic functions
ndarray...
Middle End
Iteratively applies high level, Python-aware optimizations:
Interprocedural Constant Folding
For-Based-Loop Unr...
Back Ends
Python Back End
Useful for debugging!
C++11 Back End
C++11 implementation of __builtin__ numpy
itertools…
Lazy e...
C++11: Typing
the W.T.F. slide
Pythran translates Python implicitly statically typed polymorphic
code into C++ meta-progra...
C++11: Parallelism
Implicit
Array operations and several numpy functions are written using
OpenMP and Boost.simd
Explicit
...
Benchmarks
A collection of high-level benchmarks
https://github.com/serge-sans-paille/numpy-benchmarks
Code gathered from ...
Benchmarks
no parallelism, no vectorisation (, no fat)
(Num)Focus: growcut
From the Numba codebase!
# p y t h r a n e x p o r t g r o w c u t ( f l o a t [ ] [ ] [ ] , f l o a t...
(Num)Focus: growcut
Academic Results
Pythran: Enabling Static Optimization of Scientific Python
Programs, S. Guelton, P. Brunet et al. in CSD,...
Powered by Strong Engineering
Preprequisite for reproductible science
2773 test cases, incl. unit testing, doctest, Contin...
We need more peons
Pythonic needs a serious cleanup
Typing module needs better error reporting
OSX support is partial and ...
THE END
AUTHORS
Serge Guelton, Pierrick Brunet, Mehdi Amini, Adrien
Merlini, Alan Raynaud, Eliott Coyac…
INDUSTRIAL SUPPOR...
PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
Prochain SlideShare
Chargement dans…5
×

PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

1 050 vues

Publié le

Serge Guelton and Pierrick Brunet (Namek): “Pythran: Static Compilation of Parallel Scientific Kernels”

Abstract: As the use of Python coupled to Numpy/SciPy for numerical computation increases, many tools to optimize performance have emerged. Indeed, this duo has relatively poor performance when compared to scientific codes written in legacy languages like C or Fortran. Cython, Numba, numexpr and parakeet belongs to this new compiler ecosystem. And so does Pythran, a Python to C++11 translator for scientific Python.

Pythran uses a static compilation approach a la Cython, but with full backward compatibility with Python. It does not only turns Python code into C++ code, it also performs Python/Numpy specific optimizations, generates calls to a parallel, vectorized runtime and makes it possible to write OpenMP annotation in the original Python code. It supports a large range of Numpy functions and can combine them in efficient ways: it can optimize high­level modern Python/Numpy codes and not only Fortran­ with­ a­ Python­ flavor ones.

This talk presents the existing compilation approach and optimization opportunities for numerical Python, their strengths and weaknesses, then focus on the specificities of the Pythran compiler.

Publié dans : Données & analyses
0 commentaire
1 j’aime
Statistiques
Remarques
  • Soyez le premier à commenter

Aucun téléchargement
Vues
Nombre de vues
1 050
Sur SlideShare
0
Issues des intégrations
0
Intégrations
3
Actions
Partages
0
Téléchargements
5
Commentaires
0
J’aime
1
Intégrations 0
Aucune incorporation

Aucune remarque pour cette diapositive

PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

  1. 1. Pythran: Static Compilation of Parallel Scientific Kernels a.k.a. Python/Numpy compiler for the mass Proudly made in Namek by &serge-sans-paille pbrunet
  2. 2. /us Serge « sans paille » Guelton $ w h o a m i s g u e l t o n R&D engineer at on compilation for security Associate researcher at Télécom Bretagne QuarksLab Pierrick Brunet $ w h o a m i p b r u n e t R&D engineer at on parallelismINRIAlpes/MOAIS
  3. 3. Pythran in a snake shell A Numpy-centric Python-to-C++ translator A Python code optimizer A Pythonic C++ library
  4. 4. Core concepts Focus on high-level constructs Generate clean high level code Optimize Python code before generated code Vectorization and Paralllelism Test, test, test Bench, bench, bench
  5. 5. Ask StackOverflow when you're looking for test cases http://stackoverflow.com/[...]numba-or-cython-acceleration-in- reaction-diffusion-algorithm i m p o r t n u m p y a s n p d e f G r a y S c o t t ( c o u n t s , D u , D v , F , k ) : n = 3 0 0 U = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t 3 2 ) V = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t 3 2 ) u , v = U [ 1 : - 1 , 1 : - 1 ] , V [ 1 : - 1 , 1 : - 1 ] r = 2 0 u [ : ] = 1 . 0 U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0 V [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 2 5 u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) v + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) f o r i i n r a n g e ( c o u n t s ) : L u = ( U [ 0 : - 2 , 1 : - 1 ] + U [ 1 : - 1 , 0 : - 2 ] - 4 * U [ 1 : - 1 , 1 : - 1 ] + U [ 1 : - 1 , 2 : ] + U [ 2 : , 1 : - 1 ] ) L v = ( V [ 0 : - 2 , 1 : - 1 ] +
  6. 6. Thread Summary OP My code is slow with Cython and Numba Best Answer You need to make all loops explicit
  7. 7. Cython Version c i m p o r t c y t h o n i m p o r t n u m p y a s n p c i m p o r t n u m p y a s n p c p d e f c y t h o n G r a y S c o t t ( i n t c o u n t s , d o u b l e D u , d o u b l e D v , d o u b l e F , d o u b l e k ) : c d e f i n t n = 3 0 0 c d e f n p . n d a r r a y U = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t _ ) c d e f n p . n d a r r a y V = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t _ ) c d e f n p . n d a r r a y u = U [ 1 : - 1 , 1 : - 1 ] c d e f n p . n d a r r a y v = V [ 1 : - 1 , 1 : - 1 ] c d e f i n t r = 2 0 u [ : ] = 1 . 0 U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0 V [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 2 5 u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) v + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) c d e f n p . n d a r r a y L u = n p . z e r o s _ l i k e ( u ) c d e f n p . n d a r r a y L v = n p . z e r o s _ l i k e ( v ) c d e f i n t i , c , r 1 , c 1 , r 2 , c 2 c d e f d o u b l e u v v
  8. 8. Pythran version Add this line to the original kernel: # p y t h r a n e x p o r t G r a y S c o t t ( i n t , f l o a t , f l o a t , f l o a t , f l o a t ) Timings $ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 1 0 l o o p s , b e s t o f 3 : 5 2 . 9 m s e c p e r l o o p $ c y t h o n g r a y s c o t t . p y x $ g c c g r a y s c o t t . c ` ` - s h a r e d - f P I C - o g r a y s c o t t . s o $ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 1 0 l o o p s , b e s t o f 3 : 3 6 . 4 m s e c p e r l o o p $ p y t h r a n g r a y s c o t t . p y - O 3 - m a r c h = n a t i v e $ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 1 0 l o o p s , b e s t o f 3 : 2 0 . 3 m s e c p e r l o o p p y t h o n - c o n f i g - - c f l a g s - - l i b s
  9. 9. Lessons learnt Explicit is not always better than implicit Many ``optimization hints'' can be deduced by the compiler High level constructs carry valuable informations I am not saying Cython is bad. Cython does a great job. It is just pragmatic where Pythran is idealist
  10. 10. Compilation Challenges u = U [ 1 : - 1 , 1 : - 1 ] U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0 u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) Array views Value broadcasting Temporary arrays creation Extended slices composition Numpy API calls
  11. 11. Optimization Opportunities L u = ( U [ 0 : - 2 , 1 : - 1 ] + U [ 1 : - 1 , 0 : - 2 ] - 4 * U [ 1 : - 1 , 1 : - 1 ] + U [ 1 : - 1 , 2 : ] + U [ 2 : , 1 : - 1 ] ) Many useless temporaries Lucould be forward-substituted SIMD instruction generation opportunities Parallel loop opportunities
  12. 12. Pythran Usage $ p y t h r a n - - h e l p u s a g e : p y t h r a n [ - h ] [ - o O U T P U T _ F I L E ] [ - E ] [ - e ] [ - f f l a g ] [ - v ] [ - p p a s s ] [ - m m a c h i n e ] [ - I i n c l u d e _ d i r ] [ - L l d f l a g s ] [ - D m a c r o _ d e f i n i t i o n ] [ - O l e v e l ] [ - g ] i n p u t _ f i l e p y t h r a n : a p y t h o n t o C + + c o m p i l e r p o s i t i o n a l a r g u m e n t s : i n p u t _ f i l e t h e p y t h r a n m o d u l e t o c o m p i l e , e i t h e r a . p y o r a . c p p f i l e o p t i o n a l a r g u m e n t s : - h , - - h e l p s h o w t h i s h e l p m e s s a g e a n d e x i t - o O U T P U T _ F I L E p a t h t o g e n e r a t e d f i l e - E o n l y r u n t h e t r a n s l a t o r , d o n o t c o m p i l e - e s i m i l a r t o - E , b u t d o e s n o t g e n e r a t e p y t h o n g l u e - f f l a g a n y c o m p i l e r s w i t c h r e l e v a n t t o t h e u n d e r l y i n g C + + c o m p i l e r - v b e v e r b o s e - p p a s s a n y p y t h r a n o p t i m i z a t i o n t o a p p l y b e f o r e c o d e g e n e r a t i o n - m m a c h i n e a n y m a c h i n e f l a g r e l e v a n t t o t h e u n d e r l y i n g C + +
  13. 13. Sample Usage $ p y t h r a n i n p u t . p y # g e n e r a t e s i n p u t . s o $ p y t h r a n i n p u t . p y - E # g e n e r a t e s i n p u t . c p p $ p y t h r a n i n p u t . p y - O 3 - f o p e n m p # p a r a l l e l ! $ p y t h r a n i n p u t . p y - m a r c h = n a t i v e - O f a s t # E s o d M u m i x a m !
  14. 14. Type Annotations Only for exported functions # p y t h r a n e x p o r t f o o 0 ( ) # p y t h r a n e x p o r t f o o 1 ( i n t ) # p y t h r a n e x p o r t f o o 2 ( f l o a t 3 2 [ ] [ ] ) # p y t h r a n e x p o r t f o o 2 ( f l o a t 6 4 [ ] [ ] ) # p y t h r a n e x p o r t f o o 2 ( i n t 8 [ ] [ ] [ ] ) # p y t h r a n e x p o r t f o o 3 ( ( i n t , f l o a t ) , i n t l i s t , s t r : s t r d i c t )
  15. 15. Pythran Compilation Flow input.py AST IR outout.py input.cpp input.so Optimization loop import ast gcc [...]
  16. 16. Front End 100% based on the astmodule Supports Several standard module (incl. partial Numpy) Polymorphic functions ndarray, list, tuple, dict, str, int, long, float Named parameters, default arguments Generators… Does not Support Non-implicitely typed code Global variable Most Python modules (no CPython mode!) User-defined classes…
  17. 17. Middle End Iteratively applies high level, Python-aware optimizations: Interprocedural Constant Folding For-Based-Loop Unrolling Forward Substitution Instruction Selection Deforestation Scalar Renaming dead Code Elimination Fun Facts: can evaluate pure functions at compile time ☺
  18. 18. Back Ends Python Back End Useful for debugging! C++11 Back End C++11 implementation of __builtin__ numpy itertools… Lazy evaluation through Expression Templates Relies on OpenMP, nt2and boost::simdfor the parallelization / vectorization
  19. 19. C++11: Typing the W.T.F. slide Pythran translates Python implicitly statically typed polymorphic code into C++ meta-programs that are instanciated for the user- given types, and specialize them for the target architecture
  20. 20. C++11: Parallelism Implicit Array operations and several numpy functions are written using OpenMP and Boost.simd Explicit OpenMP 3 support, ported to Python # o m p p a r a l l e l f o r r e d u c t i o n ( + : r ) f o r i , v i n e n u m e r a t e ( l ) : r + = i * v
  21. 21. Benchmarks A collection of high-level benchmarks https://github.com/serge-sans-paille/numpy-benchmarks Code gathered from StackOverflow + other compiler code base Mostly high-level code Generate results for CPython, PyPy, Numba, Parakeet, Hope and Pythran Most kernels are too high level for Numba and Hope…
  22. 22. Benchmarks no parallelism, no vectorisation (, no fat)
  23. 23. (Num)Focus: growcut From the Numba codebase! # p y t h r a n e x p o r t g r o w c u t ( f l o a t [ ] [ ] [ ] , f l o a t [ ] [ ] [ ] , f l o a t [ ] [ ] [ ] , i n t ) i m p o r t m a t h i m p o r t n u m p y a s n p d e f w i n d o w _ f l o o r ( i d x , r a d i u s ) : i f r a d i u s > i d x : r e t u r n 0 e l s e : r e t u r n i d x - r a d i u s d e f w i n d o w _ c e i l ( i d x , c e i l , r a d i u s ) : i f i d x + r a d i u s > c e i l : r e t u r n c e i l e l s e : r e t u r n i d x + r a d i u s d e f g r o w c u t ( i m a g e , s t a t e , s t a t e _ n e x t , w i n d o w _ r a d i u s ) : c h a n g e s = 0 s q r t _ 3 = m a t h . s q r t ( 3 . 0 ) h e i g h t = i m a g e . s h a p e [ 0 ]
  24. 24. (Num)Focus: growcut
  25. 25. Academic Results Pythran: Enabling Static Optimization of Scientific Python Programs, S. Guelton, P. Brunet et al. in CSD, 2015 Exploring the Vectorization of Python Constructs Using Pythran and Boost SIMD, S. Guelton, J. Falcou and P. Brunet, in WPMVP, 2014 Compiling Python modules to native parallel modules using Pythran and OpenMP Annotations, S. Guelton, P. Brunet and M. Amini, in PyHPC, 2013 Pythran: Enabling Static Optimization of Scientific Python Programs, S. Guelton, P. Brunet et al. in SciPy, 2013
  26. 26. Powered by Strong Engineering Preprequisite for reproductible science 2773 test cases, incl. unit testing, doctest, Continuous integration (thx !) Peer-reviewed code Python2.7 and C++11 Linux, OSX (almost okay), Windows (on going) User and Developer doc: Hosted on Releases on PyPi: $ pip install pythran : $ apt-get install pythran Travis http://pythonhosted.org/pythran/ https://github.com/serge-sans-paille/pythran Custom Debian repo
  27. 27. We need more peons Pythonic needs a serious cleanup Typing module needs better error reporting OSX support is partial and Windows support is on-going numpy.randomand numpy.linalg
  28. 28. THE END AUTHORS Serge Guelton, Pierrick Brunet, Mehdi Amini, Adrien Merlini, Alan Raynaud, Eliott Coyac… INDUSTRIAL SUPPORT , , →You← CONTRIBUTE #pythran on FreeNode, pythran@freelists.org, GitHub repo Silkan NumScale

×