This document summarizes a talk on ChainerX, a NumPy-like ndarray library with autograd support for deep learning in Chainer. ChainerX is implemented in C++ for speed and to allow deployment without a Python runtime; it provides a Python binding on top and supports pluggable backends. The talk explains ChainerX internals such as the Array, ArrayNode, and OpNode types, shows how ChainerX integrates with Chainer, and encourages contributions on GitHub to expand the supported operations and backends.
3. • Speed
  • Fast trial-and-error
  • Fast training and inference
• Environment Support
  • Quick adoption of new hardware/environments
• Quick Deployment
  • Quick application of research outcomes
Chainer
4. • Speed
  • Fast trial-and-error
  • Fast training and inference
• Environment Support
  • Quick adoption of new hardware/environments
• Quick Deployment
  • Quick application of research outcomes
Chainer
ChainerX
5. This talk is about ChainerX and...
• how it makes Chainer a modern deep learning framework
• how it started and where it is heading
• how to contribute to it
6. Hopefully, after this talk you...
• understand ChainerX and some of its internals
• are ready to try ChainerX
• are curious to modify it to your needs
7. What is ChainerX?
A NumPy-like ndarray library with autograd,
built from scratch, drawing on the experience gained from Chainer
8. How it started
• A subproject of Chainer, started in late 2017
• Developed by both internal and external Chainer developers
• Merged into master as of v6.0.0b1 and will be included in v6
https://github.com/chainer/chainer/tree/master/chainerx
https://github.com/chainer/chainer/tree/master/chainerx_cc
@beam2d @niboshi @asi1024 @hvy @sonots @takagi
9. import chainerx as chx
# Array creation, chx.ndarray, similar to NumPy
x = chx.ones((2, 3), dtype=chx.float32, device='native')
# Flag to record computational graph
x.require_grad()
# Define-by-run/eager forward pass, again similar to NumPy
y = chx.exp(x + 1).sum()
# Backpropagation
chx.backward(y)
# Computed gradient is also a chx.ndarray
gx = x.grad
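A quick sanity check (a sketch, not from the slides): the gradient of sum(exp(x + 1)) with respect to x is exp(x + 1) elementwise, which can be verified against NumPy via chainerx.to_numpy:
import numpy as np
# d/dx sum(exp(x + 1)) = exp(x + 1); here x is all ones
expected = np.exp(np.ones((2, 3), dtype=np.float32) + 1)
np.testing.assert_allclose(chx.to_numpy(gx), expected, rtol=1e-5)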
12. • Written in C++
  • Speed
  • No Python runtime required for deployment
• Python binding on top
  • Lightweight
  • 1-to-1 C++ mappings
• Pluggable backends
  • Extensible to new hardware/environments
[Architecture diagram: a Python binding on top of the autograd layer and the backpropable ndarray, which sit on a Backend/Device interface with Native, CUDA, and custom Backend/Device implementations underneath.]
13. C++ API
#include "chainerx.h"
namespace chx = chainerx;
chx::Array x = chx::Ones(
    {2, 3}, chx::Dtype::kFloat32,
    chx::GetDevice("native"));
x.RequireGrad();
chx::Array y = chx::Exp(x + 1).Sum();
chx::Backward(y);
chx::Array gx = *x.GetGrad();

Python API
import chainerx as chx
x = chx.ones(
    (2, 3), dtype=chx.float32,
    device='native')
x.require_grad()
y = chx.exp(x + 1).sum()
chx.backward(y)
gx = x.grad
15. // Create input ndarrays
chx::Array x = ...
chx::Array w = ...
chx::Array b = ...
// Flag to record computational graph
x.RequireGrad();
w.RequireGrad();
b.RequireGrad();
// Call a routine to create a graph.
// Internally uses chx::BackwardBuilder to do so.
chx::Array y = chx::Conv(x, w, b, {1, 1}, {1, 1});
(chainerx namespace omitted for clarity)
[Graph diagram: Arrays x, w, and b each hold an ArrayBody with an ArrayNode; the ArrayNodes are connected through an OpNode, Conv, to the ArrayNode of the output Array y and its ArrayBody.]
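The same define-by-run recording from Python (a minimal sketch, not from the slides; dot stands in for Conv to keep it short):
import chainerx as chx
# Requiring grad attaches an ArrayNode to each array body;
# subsequent operations are then recorded as OpNodes.
x = chx.ones((2, 3), dtype=chx.float32)
w = chx.ones((3, 4), dtype=chx.float32)
x.require_grad()
w.require_grad()
y = chx.dot(x, w).sum()  # forward pass builds the graph on the fly
chx.backward(y)          # traverses OpNodes/ArrayNodes in reverse
print(x.grad.shape, w.grad.shape)  # (2, 3) (3, 4)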
16. chainerx::Array (chainerx::ArrayBody)
• Core data type in ChainerX, an ndarray with autograd
• Has ndarray properties such as
  • pointer to allocated data, shape, dtype, strides
• Associated with a single device
  • Data resides on e.g. "native" or "cuda:2"
• Holds references to its
  • gradients, also chainerx::Arrays
  • nodes in the computational graphs
[Diagram: Array x holds an ArrayBody containing the device, the data pointer, an ArrayNode, and the gradient Array gx, itself an ArrayBody.]
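From Python, these properties are exposed directly on chainerx.ndarray (a quick sketch using the public Python API):
import chainerx as chx
x = chx.ones((2, 3), dtype=chx.float32, device='native')
print(x.shape, x.dtype, x.strides)  # ndarray properties
print(x.device)                     # the single associated device
x.require_grad()
chx.backward((x * 2).sum())
print(type(x.grad))                 # the gradient is also a chainerx.ndarray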
17. chainerx::ArrayNode
• A node representing an array in the computational graph
• Owned by chainerx::ArrayBody
[Same graph diagram as slide 15, highlighting the ArrayNodes of x, w, b and y.]
18. chainerx::OpNode
• A node representing an operation in the computational graph
• Referenced by chainerx::ArrayNode
[Same graph diagram as slide 15, highlighting the OpNode, Conv.]
19. chainerx::Device (1/2)
• An array is constructed by specifying the allocating device
chainerx::Device& gpu = chainerx::GetDevice("cuda:0");
chainerx::Array x =
    chainerx::Ones({2, 3}, chainerx::Dtype::kFloat32, gpu);
• A device defines
  • how memory is allocated and freed
    • chainerx::Device::Allocate
  • operations on data
    • chainerx::Device::{Fill,Arange,Add,Subtract,Multiply,Divide,Sum,Dot,...}
20. chainerx::Device (2/2)
• chainerx::Device is an interface
• Concrete implementations provided by ChainerX
  • chainerx::native::NativeDevice
  • chainerx::cuda::CudaDevice
• Can be implemented for other devices and dynamically loaded as shared libraries
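From Python, devices are addressed by name and arrays can be transferred between them (a sketch; the last line assumes a CUDA-enabled build):
import chainerx as chx
device = chx.get_device('native:0')  # backend name plus device index
x = chx.ones((2, 3), dtype=chx.float32, device=device)
print(x.device)  # native:0
# With a CUDA build, data can be moved across devices explicitly:
# y = x.to_device('cuda:0')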
24. Architecture
• Various APIs in Chainer v6 work with and utilize ChainerX
• Variable and FunctionNode delegate autograd computations to ChainerX
[Architecture diagram: Chainer's training/model APIs and Variable/functions APIs run either on NumPy/CuPy with Chainer's own autograd, or on the ChainerX stack: Python binding, autograd, backpropable ndarray, and the Backend/Device interface with Native, CUDA and custom implementations.]
25. Chainer
import chainer as ch
import cupy as cp
class ResNet50(ch.Chain):
    …
model = ResNet50()
model.to_device(0)
arr = cp.array(...)
x = ch.Variable(arr)
y = model(x)
loss = …
loss.backward()
[Architecture diagram: this path runs Chainer's training/model and Variable/functions APIs on NumPy/CuPy with Chainer's Python autograd.]
26. Chainer on ChainerX
import chainer as ch
import chainerx as chx
class ResNet50(ch.Chain):
    …
model = ResNet50()
model.to_device('cuda:0')
arr = chx.array(...)
x = ch.Variable(arr)
y = model(x)
loss = …
loss.backward()
[Architecture diagram: the same Chainer APIs now run on the ChainerX stack, with autograd delegated to ChainerX.]
27. How to take part in developing ChainerX
Contribution guide explained
28. It’s all documented
• A section in the Chainer documentation
https://docs.chainer.org/en/latest/chainerx/index.html
• On GitHub
  • Look for issues/PRs labeled ChainerX and contribution-welcome
• ChainerX needs to support more routines
  • A list of unimplemented routines
https://github.com/chainer/chainer/issues/6423
30. Future roadmap
• Integrate into Chainer
• Wider range of supported routines
• Dynamic device operation registration
• Concrete third party backends
• Stable C++ interface
• Wider coverage of “compiled models”
31. Summary
ChainerX is implemented in C++ with far less host-side overhead.
It can be deployed without a Python runtime, and it allows third
parties to implement backends and devices for new hardware and
environments.
Taking Chainer to the next level
by being accessible from Python and used by Chainer
32. ...and you can take part in ChainerX development on GitHub
Contributions, ideas and discussions are welcome
We are hiring
• Follow @ChainerOfficial on Twitter
• Join chainer on Slack
• Apply at https://www.preferred-networks.jp/en/jobs