My PyData Amsterdam 2019 presentation.
Have you ever wanted to run your NumPy-based code on multiple cores, or on a distributed system, or on your GPU? Wouldn't it be nice to do this without changing your code? We will discuss how NumPy's array protocols work, and provide a practical guide on how to start using them. We will also discuss how array libraries in Python may evolve over the next few years.
The evolution of array computing in Python
1. The evolution of array computing in Python
Ralf Gommers
PyData Amsterdam 2019
2. whoami
Maintainer of NumPy & SciPy (2010 -- )
NumFOCUS board member (2012 -- 2018)
Director of Quansight Labs (2019 -- )
You can find me at:
https://github.com/rgommers
rgommers@quansight.com
Public benefit division of Quansight, providing a home for a “PyData Core Team” and growing the community
3. A very brief history of array computing in Python
[Figure: timeline of array libraries with marks at 1995, 2003, 2006, 2008, 2012 and 2015 -- today; entries legible as text: Numeric, Numarray, Sparse, … ?]
6. Do try this at home
$ conda create -n pydata-ams
$ conda activate pydata-ams
$ conda install numpy dask jupyterlab
$ # If you have an NVIDIA GPU. Needs CUDA installed.
$ # CuPy 6.0.0 will be out soon, then conda-installable.
$ pip install --pre cupy-cuda100 # or cupy-cuda90 for CUDA 9
$ python -c "import numpy; print(numpy.__version__)"
1.16.3
$ python -c "import cupy; print(cupy.__version__)"
6.0.0rc1
$ python -c "import dask; print(dask.__version__)"
1.2.0
$ export NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1
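The setup above can be sanity-checked from Python in one go. The following is a minimal sketch, not part of the original slides: the helper names `installed_version` and `array_function_opted_in` are made up for illustration. It relies on the fact that NumPy 1.16 reads `NUMPY_EXPERIMENTAL_ARRAY_FUNCTION` when it is imported, so the variable has to be exported before Python starts.

```python
import importlib
import os


def installed_version(name):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def array_function_opted_in():
    """Report whether NUMPY_EXPERIMENTAL_ARRAY_FUNCTION is set to 1 in this process."""
    return os.environ.get("NUMPY_EXPERIMENTAL_ARRAY_FUNCTION", "0") == "1"


# Check the opt-in first: NumPy 1.16 reads the variable at import time.
print("__array_function__ opt-in set:", array_function_opted_in())
for name in ("numpy", "cupy", "dask"):
    print(name, installed_version(name) or "not installed")
```

A package reported as "not installed" here simply failed to import in the active environment, which usually means the wrong conda environment is active.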
7. The NumPy array protocols - goals
Separate NumPy API from NumPy “execution engine”
Allow other libraries (Dask, CuPy, PyTorch, …) to reuse the NumPy API
Bigger picture: avoid or reduce ecosystem fragmentation (we don’t want to see a reimplementation of SciPy for PyTorch, SciPy for TensorFlow, etc.)
8. The NumPy array protocols - goals
Current state of N-dimensional arrays in Python
10. The NumPy array protocols - concept
Function body (the “implementation”, defines semantics)
Function signature (the API)
Example for one function - this can be a ufunc or a regular function.
11. The NumPy array protocols - concept
In short: use the NumPy API, bring your own implementation
[Diagram: the NumPy function keeps its function signature (the API); its function body becomes:]
if input arg has __array_function__:
    execute other_function
[Diagram: other_function has its own function body; its function signature == the NumPy function signature]
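The dispatch idea on this slide can be sketched in plain Python, without NumPy. This is a simplified illustration, not NumPy's actual implementation: `overridable`, `total` and `RemoteArray` are hypothetical names, and the real protocol also handles subclass ordering and keyword arguments. Only the `__array_function__(self, func, types, args, kwargs)` method signature is taken from the protocol itself.

```python
import functools


def overridable(func):
    """Wrap func so arguments that define __array_function__ can take over the call."""

    @functools.wraps(func)
    def public_api(*args, **kwargs):
        # Collect the arguments whose type implements the protocol.
        overriders = [a for a in args if hasattr(type(a), "__array_function__")]
        types = tuple({type(a) for a in overriders})
        for arg in overriders:
            result = arg.__array_function__(public_api, types, args, kwargs)
            if result is not NotImplemented:
                return result
        if overriders:
            raise TypeError("no implementation found for " + func.__name__)
        # No argument wants to override: run the default function body.
        return func(*args, **kwargs)

    return public_api


@overridable
def total(values):
    """Default implementation: the 'execution engine' of this toy library."""
    return sum(values)


class RemoteArray:
    """Hypothetical array type that brings its own implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        if func is total:
            return ("remote", sum(self.values))
        return NotImplemented


print(total([1, 2, 3]))               # default function body runs
print(total(RemoteArray([1, 2, 3])))  # RemoteArray's implementation runs
```

The caller uses the same function and the same signature in both calls; only the function body that runs differs.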
12. Using array protocols in your own code
Suitable for code that uses NumPy functions and ndarray methods.
Try CuPy if you need more performance on large arrays, and Dask if you want a distributed array.
Let’s play with this in a notebook!
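As a starting point that needs neither a GPU nor a cluster, here is a small sketch of the same idea with a stand-in array type. `LoggedArray` and `spread` are hypothetical names invented for this example; CuPy and Dask arrays implement the protocol in a far more complete way. On NumPy 1.16 this assumes `NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1` was exported before starting Python.

```python
import numpy as np


class LoggedArray:
    """Hypothetical duck array: wraps an ndarray and records dispatched calls."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def __array_function__(self, func, types, args, kwargs):
        self.calls.append(func.__name__)
        # Unwrap our own type, then let NumPy's implementation do the work.
        unwrapped = [a.data if isinstance(a, LoggedArray) else a for a in args]
        return func(*unwrapped, **kwargs)


def spread(x):
    """Library-style code written purely against NumPy functions."""
    return np.max(x) - np.min(x)


plain = np.array([1.0, 4.0, 2.0])
logged = LoggedArray(plain)

print(spread(plain))                 # ordinary ndarray path
print(spread(logged), logged.calls)  # same code, routed through LoggedArray
```

`spread` is unchanged between the two calls, which is the point: code that sticks to NumPy functions can accept other array types that implement the protocol.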
13. Limits to these array protocols
Only functions can be overridden. And not even all functions -- only the
ones with an array_like parameter. Important exceptions:
np.array, np.asarray, np.linspace, np.concatenate
14. NumPy’s roadmap - what’s next?
Interoperability: roll out __array_function__, handle subclasses better, new protocols?
Extensibility: easier custom dtypes
Performance: ufunc optimizations, more SIMD instructions, ...
np.random rewrite: is about to be merged
Indexing: NEP 21 (oindex/vindex), for more intuitive behavior
Type annotations: PEP 484 / mypy compatible annotations, see the numpy-stubs repo
16. XND
Recreates the foundations of NumPy as a number of smaller libraries. Plus:
Variable length strings
Ragged arrays
Categorical type
Missing data support
Easy custom dtypes
Automatic multi-threading
JIT compilation (via Numba)
18. xtensor
A C++ library for n-D arrays. Plus:
Lazy evaluation
Performance - very fast
Can operate on NumPy arrays
Python, Julia and R bindings
JIT compilation (via Pythran)
Built on top: rray (NumPy-like arrays for R)
20. Uarray
A more general solution than the NumPy array protocols for building APIs with multiple backends.
Override any object: functions, classes, ufuncs, dtypes, context managers, and more
Uses multiple dispatch rather than protocols.
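To make "an API with multiple backends" concrete, here is a toy sketch in which the backend is chosen by the caller's context rather than by a protocol method on the arguments. This is not uarray's API and does not reproduce its dispatch machinery; `multimethod`, `register` and `use_backend` are hypothetical names for illustration only. Note that the example function takes no array argument at all, so an argument-based protocol would have nothing to dispatch on.

```python
import contextlib

_backends = {}          # backend name -> {function name -> implementation}
_active = ["default"]   # stack of active backend names, innermost last


def register(backend, name):
    """Register the decorated function as backend's implementation of name."""
    def decorator(impl):
        _backends.setdefault(backend, {})[name] = impl
        return impl
    return decorator


def multimethod(name):
    """Create a public function that defers to the innermost backend providing name."""
    def call(*args, **kwargs):
        for backend in reversed(_active):
            impl = _backends.get(backend, {}).get(name)
            if impl is not None:
                return impl(*args, **kwargs)
        raise NotImplementedError("no backend implements " + name)
    call.__name__ = name
    return call


@contextlib.contextmanager
def use_backend(backend):
    """Make backend the preferred one inside the with-block."""
    _active.append(backend)
    try:
        yield
    finally:
        _active.pop()


arange = multimethod("arange")


@register("default", "arange")
def _list_arange(n):
    return list(range(n))


@register("tuple", "arange")
def _tuple_arange(n):
    return tuple(range(n))


print(arange(3))
with use_backend("tuple"):
    print(arange(3))
```

Because the selection lives outside the arguments, the same mechanism could in principle cover array-creation functions and other objects that take no array input.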