Talk given at GTC Fall 2021.
The Python array API standard, first announced towards the end of 2020, is maturing and becoming available to Python end users. NumPy now has a reference implementation, PyTorch support is close to complete, and other libraries have started to implement support. In this talk we will discuss the current state of implementations and look at a concrete use case: moving a scientific analysis workflow to the API standard, thereby gaining access to GPU acceleration.
4. Consortium for Python Data API Standards
Coordination:
Sponsors:
Participation: maintainers of all of the most popular Python array and dataframe libraries
6. The array API standard
Use cases
Scope & purpose
Stakeholders
Portable test suite
API surface:
● ~125 functions, largely common to n-dimensional array libraries
● Array object: dtypes, indexing, broadcasting
Spec: github.com/data-apis/array-api/
7. Goals for and scope of the array API
Goal 1: enable writing code & packages that support multiple array libraries
Goal 2: make it easy for end users to switch between array libraries
In scope:
● Syntax and semantics of functions and objects in the API
● Casting rules, broadcasting, indexing, Python operator support
● Data interchange & device support
Out of scope:
● Execution semantics (e.g. task scheduling, parallelism, lazy eval)
● Non-standard dtypes, masked arrays, I/O, subclassing array object, C API
● Error handling & behaviour for invalid inputs to functions and methods
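To make the in-scope items concrete, here is a minimal sketch of casting rules, broadcasting, indexing and operator support. It uses plain NumPy as a stand-in for a conforming library (an assumption made for illustration; the slide does not prescribe a library), and the variable names are hypothetical.

```python
import numpy as np

# Broadcasting: shapes (3, 1) and (1, 4) combine to (3, 4).
col = np.reshape(np.arange(3, dtype=np.int8), (3, 1))
row = np.reshape(np.arange(4, dtype=np.int16), (1, 4))

# Python operator support: `+` dispatches to the array's __add__.
# Casting rules: int8 combined with int16 promotes to int16.
grid = col + row

# Indexing: integer and slice indices select the first row.
first_row = grid[0, :]

# Casting rules for floats: float32 combined with float64 promotes to float64.
promoted = np.result_type(np.float32, np.float64)
```

Because these behaviours are part of the specification, code that relies on them should behave the same way on any library that implements the standard.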
8. Use case: the einops package
● A popular package for array manipulation
● Supports 8 popular array/tensor libraries.
● Almost 50% of the code can be removed through array API standardization!
9. Array- and array-consuming libraries
Data interchange between array libs
Uses DLPack; this works for any two libraries, provided both support the device the data resides on:
    x = xp.from_dlpack(x_other)
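The DLPack hand-off above can be tried out with a minimal sketch in which NumPy plays both the producer and the consumer. This assumes NumPy 1.22 or later, where `numpy.from_dlpack` is available; in practice `x_other` would come from a different library that exposes the DLPack protocol.

```python
import numpy as np

# Producer: any array object that implements __dlpack__ and __dlpack_device__.
# A NumPy array is used here only to keep the sketch self-contained.
x_other = np.arange(6, dtype=np.float32)

# Consumer: NumPy plays the role of `xp` from the slide.
xp = np
x = xp.from_dlpack(x_other)

# The exchange is zero-copy: both arrays view the same buffer.
same_buffer = np.shares_memory(x, x_other)
```

Because no copy is made, producer and consumer must both be able to address the memory in question, which is why device support on both sides matters.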
Portable code in array-consuming libs
    def softmax(x):
        # grab standard namespace from
        # the passed-in array
        xp = get_array_api(x)
        x_exp = xp.exp(x)
        partition = xp.sum(x_exp, axis=1, keepdims=True)
        return x_exp / partition
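The slide does not define `get_array_api`. One possible implementation, sketched here as an assumption, relies on the `__array_namespace__` method that the standard specifies on array objects. The fallback to the `numpy` module for older NumPy versions is a convenience added for this sketch and is not part of the standard.

```python
import numpy as np


def get_array_api(x):
    """Hypothetical helper: return the array API namespace for `x`."""
    if hasattr(x, "__array_namespace__"):
        # Standard-conforming arrays expose their namespace directly.
        return x.__array_namespace__()
    if isinstance(x, np.ndarray):
        # Assumption: older NumPy arrays lack the method, so use numpy itself.
        return np
    raise TypeError(f"no array API namespace found for {type(x).__name__}")


def softmax(x):
    # Same function as on the slide, now runnable with the helper above.
    xp = get_array_api(x)
    x_exp = xp.exp(x)
    partition = xp.sum(x_exp, axis=1, keepdims=True)
    return x_exp / partition


# Each row of the result sums to 1.
probs = softmax(np.asarray([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
```

The function body never names a specific library, so passing an array from another conforming library would run the same code on that library's backend.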
10. Array API - participation & adoption
API adoption done or close to done:
● In numpy.array_api namespace
● In cupy.array_api namespace
● In torch (main) namespace
Design participation, adoption in progress or being discussed
11. Demo: moving LIGO analysis to PyTorch
https://quansight-labs.github.io/array-api-demo/, work by Anirudh Dagar
LIGO = Laser Interferometer Gravitational-Wave Observatory
13. What is next? — array API standard
1. Finalize the 2021 standard (November '21)
2. Maturing of implementations & first usage downstream (SciPy, scikit-learn, scikit-image, domain-specific libraries)
3. Extensions for 2022 standard:
● Defined: complex dtypes, fft extension, more linear algebra
● TBD: parallelism & improved support for new device types, … ?
16. How can you help?
Give feedback! Is your use case covered? See a small gap in functionality?
Contribute! Portable test & benchmarking suites, remaining design issues
Implement! The standard is complete enough to adopt today (draft mode)
Spread awareness! Blog, reference in your talk, ...
Support! Funding or engineering time -- lots more to do, also for dataframes
17. To learn more
Consortium:
● Website & introductory blog posts: data-apis.org
● Array API main repo: github.com/data-apis/array-api
● Latest version of the standard: data-apis.github.io/array-api/latest
● Dataframe protocol: github.com/data-apis/dataframe-api
● Members: github.com/data-apis/governance
Find me at: rgommers@quansight.com, rgommers, ralfgommers