Python is sometimes discounted as slow because of its dynamic typing and interpreted nature, and as unsuitable for scale because of the GIL. In this talk, I will show how, with the help of talented open-source contributors around the world, we have built systems in Python that are fast and that scale to many machines, and how this has helped Python take over science.
2. • MS/BS degrees in Electrical and Computer Engineering
• PhD in Biomedical Engineering (Ultrasound and MRI)
• Creator and Maintainer of SciPy (1998-2009)
• Professor of EE (2001-2007), Inverse Problems
• Creator and Developer of NumPy (2005-2013)
• Started Numba and Conda (2012)
• Founder of NumFOCUS / PyData
• Python Dev and Foundation Director
• CEO/Founder (2012) of Continuum Analytics / Anaconda, Inc.
• CEO/Founder (2018) of Quansight
• CEO/Founder (2019) of OpenTeams
SciPy
3. Started my career in computational science
Satellites measure backscatter; computer algorithms produce estimates of Earth features:
• Wind Speed
• Ice Cover
• Vegetation
• (and more)
4. 1996 - 2001
Analyze 12.0
https://analyzedirect.com/
Richard Robb (retired in 2015)
Bringing "SciFi" Medicine to Life since 1971
5. More Science led to Python
Raja Muthupillai
Armando Manduca
Richard Ehman
1997
13. Used by some of the Brightest Minds…
LIGO: Gravitational Waves
Higgs Boson Discovery
Black Hole Imaging
14. Scientists have big data and compute needs
Data Size: 60 million GB
Compute Power: 2000 TeraFLOPS (~30,000 of my laptop)
How does Python scale to that?
15. GTC Europe
NVIDIA CEO Jensen Huang describes Python as the de facto data-science platform, becoming better supported by GPUs over the coming months and years.
16. Reasons for Python's success in Science
1) Python is expressive and easy to read.
2) Python (in particular CPython) is straightforward to extend, and with Cython, Python has become a glue language for many other run-times.
When you use Python for speed, scale, and science, you are almost always actually running machine instructions compiled from another language, "glued" together with high-level, expressive Python.
Pythonic code helps me think better. For a scientist, it gets out of your way.
3) An engaging open-source community.
17. Not all open source is the same! Governance models matter.
Community-Driven Open Source Software (CDOSS):
• Anyone can become the leader.
• Multiple stakeholders.
• Community size is a reasonable proxy for health.
• Users become contributors more often.
• Examples: Python, Jupyter, NumPy, SciPy, Pandas
Company-Backed Open Source Software (CBOSS):
• You need to work at the company to be the leader.
• Many users, fewer developers.
• You need to understand the company's incentives to judge health.
• Examples: Swift, TensorFlow, PyTorch, Conda
Both can be valuable, but they have different implications!
19. My First Big Project
Started as Multipack in 1998 and became SciPy in 2001 with the help of other colleagues.
115 releases, 854 contributors
Used by: 187,386
20. SciPy
"A Distribution of Python Numerical Tools masquerading as one Library"
Name        Description
cluster     K-means and vector quantization
fftpack     Discrete Fourier transform
integrate   Numerical integration
interpolate Interpolation routines
io          Data input and output
linalg      Fast linear algebra
misc        Utilities
ndimage     N-dimensional image processing
odr         Orthogonal distance regression
optimize    Constrained and unconstrained optimization
signal      Signal processing tools
sparse      Sparse matrices and algebra
spatial     Spatial data structures and algorithms
special     Special functions (e.g. Bessel)
stats       Statistical functions and distributions
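As a small taste of the library, here is a minimal sketch using scipy.optimize; the test function (the Rosenbrock function, bundled with SciPy as optimize.rosen), starting point, and method are chosen purely for illustration:

```python
import numpy as np
from scipy import optimize

# Minimize the Rosenbrock test function starting from the origin,
# using the unconstrained BFGS optimizer.
result = optimize.minimize(optimize.rosen, x0=np.zeros(3), method='BFGS')

print(result.x)  # converges near the true minimum at [1, 1, 1]
```

Every submodule follows this pattern: plain functions operating on NumPy arrays.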
22. My Open Source addiction continued…
Gave up my chance at a tenured academic position in 2005-2006 to unify the diverging array community in Python by bringing Numeric and Numarray together.
170 releases, 923 contributors
Used by: 378,828
23. Without NumPy
functions.py:
    from math import sin, pi

    def sinc(x):
        if x == 0:
            return 1.0
        else:
            pix = pi*x
            return sin(pix)/pix

    def step(x):
        if x > 0:
            return 1.0
        elif x < 0:
            return 0.0
        else:
            return 0.5

Interactive use:
    >>> import functions as f
    >>> xval = [x/3.0 for x in range(-10, 10)]
    >>> yval1 = [f.sinc(x) for x in xval]
    >>> yval2 = [f.step(x) for x in xval]

Python is a great language but needed a way to operate quickly and cleanly over multi-dimensional arrays.
24. With NumPy
functions2.py:
    from numpy import sin, pi, vectorize
    import functions as f

    vsinc = vectorize(f.sinc)
    def sinc(x):
        pix = pi*x
        val = sin(pix)/pix
        val[x == 0] = 1.0
        return val

    vstep = vectorize(f.step)
    def step(x):
        y = x*0.0
        y[x > 0] = 1
        y[x == 0] = 0.5
        return y

Interactive use:
    >>> import functions2 as f
    >>> from numpy import *
    >>> x = r_[-10:10]/3.0
    >>> y1 = f.sinc(x)
    >>> y2 = f.step(x)

NumPy offers an N-D array; element-by-element functions; and basic random-number, linear-algebra, and FFT capability for Python.
http://numpy.org
Fiscally sponsored by NumFOCUS
25. NumPy: an Array Extension of Python
• Data: the array object
  - slicing and shaping
  - data-type map to bytes
• Fast math (ufuncs):
  - vectorization
  - broadcasting
  - aggregations
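These ideas fit in a few lines; the array values below are my own small illustration, not from the slides:

```python
import numpy as np

# A small 2-D array to illustrate the core ideas.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Slicing and shaping: every row, columns 1 onward (a view, not a copy).
sub = a[:, 1:]

# Broadcasting: subtract a per-column mean (shape (4,)) from the (3, 4) array.
centered = a - a.mean(axis=0)

# Ufuncs: element-by-element math executed in compiled loops.
s = np.sin(a)

# Aggregation: reduce along an axis.
row_sums = a.sum(axis=1)
print(row_sums)  # [ 6. 22. 38.]
```

All of this runs at C speed while the user writes only high-level Python.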
29. Summary
• Provides a foundational N-dimensional array composed of homogeneous elements of a particular "dtype"
• The set of dtypes is extensive (but difficult to extend)
• Arrays can be sliced and diced with simple syntax for easy manipulation and selection.
• Provides fast and powerful math, statistics, and linear algebra functions that operate over arrays.
• Utilities for sorting and for reading and writing data are also provided.
31. Scale Up vs Scale Out
• Scale Up (bigger nodes): big memory and many cores / GPU box. Tool: Numba
• Scale Out (more nodes): many commodity nodes in a cluster. Tool: Dask
• Best of both (e.g. a GPU cluster): Dask with Numba
32. A Compiler for Python? Combining Productivity and Performance
• Python is one of the most popular languages for data science
• Python integrates well with compiled, accelerated libraries (MKL, TensorFlow, etc.)
• But what about custom algorithms and data-processing tasks?
• Our goal was to make a compiler that:
  • works within the standard Python interpreter rather than replacing it
  • integrates tightly with NumPy
  • is compatible with both multithreaded and distributed computing paradigms
33. Numba: A JIT Compiler for Python
• An open-source, function-at-a-time compiler library for Python
• A compiler toolbox for different targets and execution models:
  • single-threaded CPU, multi-threaded CPU, GPU
  • regular functions, "universal functions" (array functions), etc.
• Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure Python)
• Combines the ease of writing Python with speeds approaching C/Fortran
• Empowers data scientists who make tools for themselves and other data scientists
34. 7 things about Numba you may not know
1. Numba is 100% open source
2. Numba + Jupyter = rapid CUDA prototyping
3. Numba can compile for the CPU and the GPU at the same time
4. Numba makes array processing easy with @vectorize and @guvectorize
5. Numba comes with a CUDA simulator
6. You can send Numba functions over the network
7. Numba developers are contributing to NVIDIA's new rapids.ai work
35. Numba (compile Python to CPUs and GPUs)
conda install numba
[Diagram: the Numba frontend parses Python into an Intermediate Representation (IR); the LLVM code-generation backend then targets x86, ARM, and PTX.]
36. How does Numba work?
[Diagram: a Python function's bytecode, together with its argument types, goes through bytecode analysis to produce Numba IR; type inference and IR rewrites follow; lowering emits LLVM IR, which the LLVM/NVVM JIT compiles to machine code; compiled results are cached and then executed.]

    @jit
    def do_math(a, b):
        ...

    >>> do_math(x, y)
37. Supported Platforms and Hardware
OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
HW: 32- and 64-bit CPUs (incl. Xeon Phi), CUDA and HSA GPUs, some support for ARM and ROCm
SW: Python 2.7, 3.4-3.7; NumPy 1.10 and later
39. Basic Example
A jit-decorated function demonstrating:
• the Numba decorator (nopython=True not required)
• array allocation
• looping over ndarray x as an iterator
• using NumPy math functions
• returning a slice of the array
2.7x speedup!
40. Numba Features
• Detects the CPU model during compilation and optimizes for that target
• Automatic type inference: no need to give type signatures for functions
• Dispatches to multiple type specializations of the same function
• Calls out to C libraries with CFFI and ctypes
• Special "callback" mode for creating C callbacks to use with external libraries
• Optional caching to disk, and ahead-of-time creation of shared libraries
• The compiler is extensible with new data types and functions
41. SIMD: Single Instruction, Multiple Data
• Numba's CPU detection lets LLVM auto-vectorize for the appropriate SIMD instruction set: SSE, AVX, AVX2, AVX-512
• This will become even more important as AVX-512 is now available on both Xeon Phi and Skylake Xeon processors
42. Manual Multithreading: Release the GIL
[Chart: speedup ratio (0 to 3.5) versus number of threads (1, 2, 4).]
Option to release the GIL
Using Python concurrent.futures
43. Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented computing.
• A function with scalar inputs is broadcast across the elements of the input arrays:
  • np.add([1,2,3], 3) == [4, 5, 6]
  • np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
• Parallelism is present by construction. Numba will generate the loops and can automatically multi-thread if requested.
• Before Numba, creating fast ufuncs required writing C. No longer!
45. Multi-threaded Ufuncs
• Specify a type signature
• Select the parallel target
• Automatically uses all CPU cores!
target='cuda' (and 'hsa') are also available, for easily using the many cores of a GPU.
46. Other Numba topics
• CUDA Python: write general NVIDIA GPU kernels with Python
• Device Arrays: manage memory transfer from host to GPU
• Streaming: manage asynchronous and parallel GPU compute streams
• CUDA Simulator in Python: to help debug your kernels
• HSA: support for AMD ROCm GPUs and APUs
• Pyculib: access to cuFFT, cuBLAS, cuSPARSE, cuRAND, and CUDA sorting
https://github.com/ContinuumIO/gtc2017-numba
48. • Designed to parallelize the Python ecosystem
• Handles complex algorithms
• Co-developed with the Pandas/scikit-learn/Jupyter teams
• Familiar APIs for Python users
• Scales from multicore to 1000-node clusters
• Resilient, responsive, and real-time
49. • Parallelizes NumPy, Pandas, and scikit-learn
• Satisfies a subset of these APIs
• Uses these libraries internally
• Co-developed with these teams
• The task scheduler supports custom algorithms
• Parallelize existing code
• Build novel real-time systems
• Arbitrary task graphs with data dependencies
• Same scalability
53. Example 1: Using Dask DataFrames on a cluster with CSV data
• Built from Pandas DataFrames
• Matches the Pandas interface
• Access data from HDFS, S3, local disk, etc.
• Fast, low latency
• Responsive user interface
55. Example 3: Using Dask Arrays with global temperature data
• Built from NumPy n-dimensional arrays
• Matches the NumPy interface (a subset)
• Solve medium-to-large problems
• Complex algorithms
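A minimal sketch of a Dask Array computation (the array sizes and the expression are illustrative, not the temperature workload from the slide):

```python
import dask.array as da

# A 4000x4000 array split into 1000x1000 chunks;
# each chunk is an ordinary NumPy array.
x = da.random.random((4000, 4000), chunks=(1000, 1000))

# NumPy-style expressions build a lazy task graph over the chunks...
y = (x + x.T).mean(axis=0)

# ...and compute() evaluates it chunk by chunk, in parallel.
result = y.compute()
```

Because each chunk fits in memory, the same code handles arrays far larger than RAM.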
57. Dask Scheduler
• Scheduling arbitrary graphs is hard.
• Optimal graph scheduling is NP-hard.
• Scalable scheduling requires linear-time solutions.
• Fortunately, Dask does well with a lot of heuristics…
• …and a lot of monitoring and data about sizes…
• …and how long functions take.
60. Reasons not to use: Dask's limitations
• Dask is not a SQL database. It does Pandas well, but won't optimize complex queries.
• Dask is not MPI. It is very fast, but leaves some performance on the table: ~200 µs of task overhead and a couple of copies in the network stack.
• Dask is not a JVM technology. It's a Python library (although Julia bindings are available).
• Dask is not always necessary. You may not need parallelism.
62. [Slide shows download and contribution statistics for three foundational projects; the project logos did not survive extraction.]
Project 1 (development began in 2003): 49 million downloads; 866 contributors; estimated effort 76 person-years; estimated cost $7.57 million; 4 current maintainers.
Project 2 (development began in 2005): 27.7 million downloads; 1,666 contributors; estimated effort 70 person-years; estimated cost $7 million; 3 current maintainers.
Project 3 (development began in 2008): 13.8 million downloads; 860 contributors; estimated effort 64 person-years; estimated cost $6.63 million; 2 current maintainers.
The original developers were not paid to work on or improve these libraries!
Improving with QLabs
63. What is next for me? What am I working on for the next few years…
64. High Level APIs for Arrays (Tensors),
DataFrames, and DataTypes
LABS
66. What will work!
• Create a statically typed subset of Python that is then used to extend Python: EPython
• Port NumPy, SciPy, and the scikits to EPython (borrow heavily from Cython ideas but use mypy-style typing instead of new syntax).
67. LABS
Sustaining the Future
Open-source innovation and maintenance around the entire data-science and AI workflow.
• NumPy ecosystem maintenance (PyData Core Team)
• Improve the connection of NumPy to ML frameworks
• GPU support for the NumPy ecosystem
• Improve the foundations of array computing
• JupyterLab and JupyterHub
• Data catalog standards
• Packaging (conda-forge, PyPA, etc.)
• PySparse: sparse n-d arrays
• Ibis: a Pandas-like front-end to SQL
• uarray: a unified array interface for the SciPy refactor
• xnd: re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing)
Collaborating with NumFOCUS!
Bokeh
Adapted from Jake Vanderplas's PyCon 2017 keynote
68. Problem
Open Source Teams:
• Burned out
• Underrepresented
• Underpaid
Organizations:
• Disconnected from the community
• Lack support and maintenance
There's no easy way to connect the community with organizations.
69. Marketplace for Open Source Services
Partners:
• Provide open-source services
• Training / support
• Feature development / fixes
• Hire open-source devs
Clients:
• Pay for support
• Pay for training and mentoring
• Get the support they need to build effectively on open source
Open-source contributors create profiles for themselves and manage their reputation to get hired by, or work with, both!
70. FairOSS
A public-benefit company (its goal is growing the amount of freely available software)
• Owned by open-source contributors (a public fund-raise is planned later this year)
• Those shareholders govern the organization (elect the board).
• The board appoints management and decides what is "fair".
Holds companies accountable
• Allows use of its trademarks only by companies that contribute back "fairly"
• Think "Kosher" or "Organic" labeling
• Companies give back through equity, revenue, and "in-kind" agreements with FairOSS
FairOSS is custodian of revenue and equity agreements
• Equity agreements mean that FairOSS holds shares, options, or warrants of the company (most companies are missing the open-source community from their "cap table")
• Revenue agreements mean that companies pay FairOSS a portion of their revenue.
• FairOSS distributes almost all of the proceeds from these agreements to the open-source communities.
If successful, this would make open source investable and make available more than $45 trillion of investment capital to open-source communities.
71. You can really change the world…
With open-source communities…
Let's do more of that!