Introducing MyGrad

MyGrad is a lightweight library that adds automatic differentiation to NumPy – its only dependency is NumPy!

>>> import mygrad as mg
>>> import numpy as np

>>> x = mg.tensor([1., 2., 3.])  # like numpy.array, but supports backprop!
>>> f = np.sum(x * x)  # tensors work with numpy functions!
>>> f.backward() # triggers automatic differentiation
>>> x.grad  # stores [df/dx0, df/dx1, df/dx2]
array([2., 4., 6.])

Its primary goal is to make automatic differentiation accessible and easy to use across the Python/NumPy ecosystem. As such, it strives to behave and feel exactly like NumPy so that users need not learn yet another array-based math library. You can pass MyGrad’s Tensor to NumPy’s functions in order to make them differentiable! Of the various modes and flavors of auto-diff, MyGrad supports backpropagation from a scalar quantity.
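To make "backpropagation from a scalar quantity" concrete, here is a minimal pure-Python sketch of reverse-mode differentiation. It is not MyGrad's implementation: the `Var` class and its attributes are hypothetical names invented for this illustration, and only addition and multiplication are supported.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical helper, not part of MyGrad)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative w.r.t. that parent)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def backward(self):
        # Build a topological ordering of the graph that produced `self`
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)

        # Seed d(self)/d(self) = 1, then apply the chain rule from the output
        # back towards the inputs, accumulating contributions along every path
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


xs = [Var(1.0), Var(2.0), Var(3.0)]
f = xs[0] * xs[0] + xs[1] * xs[1] + xs[2] * xs[2]  # f = sum(x * x)
f.backward()
grads = [x.grad for x in xs]
print(grads)  # [2.0, 4.0, 6.0]
```

The single backward pass starts from the scalar f and yields the derivative with respect to every input at once, which mirrors the `x.grad` result shown in the opening example.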

A Simple Application

Let’s use mygrad to compute the derivative of \(f(x) = x^2\) evaluated at \(x = 3\) (which is \(\frac{df}{dx}\rvert_{x=3} = 2\times 3\)).

Tensor behaves nearly identically to NumPy’s ndarray, and additionally has the machinery needed to compute the analytic derivatives of functions. To compute this derivative, we create a 0-dimensional tensor (a scalar) for x and compute f(x):

>>> import mygrad as mg
>>> import numpy as np
>>> x = mg.tensor(3.0)
>>> f = np.square(x)  # mygrad's tensors can be passed into NumPy functions
>>> f
Tensor(9.0)

Invoking backward() on f instructs mygrad to trace through the computational graph that produced f and compute the derivatives of f with respect to all of its independent variables. Thus, executing f.backward() will compute \(\frac{df}{dx} = 2x\) at \(x=3\), and will store the resulting value in x.grad:

>>> f.backward()  # triggers computation of ``df/dx``
>>> x.grad  # df/dx = 2x = 6.0
array(6.0)

This is the absolute tip of the iceberg. mygrad can compute derivatives of multivariable composite functions of tensor-valued variables!
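The value 6.0 stored in x.grad for \(f(x) = x^2\) can be sanity-checked without any auto-diff machinery. The sketch below uses a central finite difference in plain Python; the helper name `central_difference` and the step size `h` are choices made for this illustration only.

```python
def f(x):
    return x ** 2


def central_difference(func, x, h=1e-6):
    """Approximate d(func)/dx at x using a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


approx = central_difference(f, 3.0)
print(abs(approx - 6.0) < 1e-6)  # True
```

Unlike this numerical estimate, backpropagation evaluates the analytic derivative, so its result does not depend on a step size.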

Gradient Descent with MyGrad

The following performs gradient descent on \(\mathscr{L}(w) = w ^ 2\), starting from \(w = 10\):

import mygrad as mg

w = mg.tensor(10.0)
learning_rate = 0.3
num_steps = 10
print(w)

for step_cnt in range(num_steps):
    ℒ = w ** 2    # compute L(w) (this also "nulls" any derivatives)
    ℒ.backward()  # compute derivative of L

    # Update w via a gradient step.
    # We do an in-place (augmented) update on the underlying NumPy array
    # stored by `w`
    w.data -= learning_rate * w.grad
    print(w)

The following steps are printed out; note that gradient descent leads us towards the minimum at \(w = 0\):

Tensor(10.)
Tensor(4.)
Tensor(1.6)
Tensor(0.64)
Tensor(0.256)
Tensor(0.1024)
Tensor(0.04096)
Tensor(0.016384)
Tensor(0.0065536)
Tensor(0.00262144)
Tensor(0.00104858)
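The printed sequence shrinks by a constant factor at every step. Since \(\frac{d\mathscr{L}}{dw} = 2w\), each update is \(w \leftarrow w - 0.3 \times 2w = 0.4\,w\). The sketch below reproduces the same numbers in plain Python, with the analytic derivative written by hand in place of backward(); the names `history` and `closed_form` are introduced only for this illustration.

```python
w = 10.0
learning_rate = 0.3
num_steps = 10

history = [w]
for _ in range(num_steps):
    grad = 2.0 * w             # analytic derivative of L(w) = w ** 2
    w -= learning_rate * grad  # same update rule as in the loop above
    history.append(w)

# Each step multiplies w by (1 - 2 * learning_rate) = 0.4,
# so after k steps w equals 10 * 0.4 ** k
closed_form = [10.0 * 0.4 ** k for k in range(num_steps + 1)]

for value, expected in zip(history, closed_form):
    assert abs(value - expected) < 1e-12
```

A learning rate above 1.0 would make the factor \(1 - 2 \times \text{learning rate}\) exceed 1 in magnitude, and the iterates would diverge instead of converging.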

Some Bells and Whistles

mygrad supports all of NumPy’s essential features, including:

N-dimensional tensors that can be reshaped and have their axes transposed

creating and operating on views of tensors

in-place operations on tensors

vectorization

broadcasting

basic and advanced indexing (including all varieties of mixed indexing schemes) for both getting and setting items.

fully-fledged support for einsum (including broadcasting and traces)

Tensor plays nicely with NumPy-arrays, which behave as constants when they are used in computational graphs:

>>> import numpy as np
>>> x = mg.tensor([2.0, 2.0, 2.0])
>>> y = np.array([1.0, 2.0, 3.0])
>>> f = x ** y  # (2 ** 1, 2 ** 2, 2 ** 3)
>>> f.backward()
>>> x.grad
array([ 1.,  4., 12.])
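The values in x.grad follow from the power rule applied elementwise, with y held constant. As a worked check (each \(f_i\) depends only on \(x_i\), so each entry of the gradient is the derivative of the matching entry of f):

\[
\frac{\partial f_i}{\partial x_i} = \frac{\partial}{\partial x_i}\, x_i^{y_i} = y_i \, x_i^{y_i - 1}
\quad\Rightarrow\quad
\left(1 \cdot 2^{0},\; 2 \cdot 2^{1},\; 3 \cdot 2^{2}\right) = (1,\; 4,\; 12)
\]

No derivative is stored for y: a plain NumPy array has no grad attribute.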

mygrad.nnet supplies essential functions for machine learning, including:

N-dimensional convolutions (with striding, dilation, and padding)
N-dimensional pooling
A gated recurrent unit for sequence-learning (with input-level dropout and variational hidden-hidden dropout)

It leverages a sliding-window view function, which produces convolution-style windowed views of arrays/tensors without copying them, to perform neural network-style convolutions and pooling intuitively and efficiently.
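To illustrate the windowed-view idea, the sketch below uses NumPy's own `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) as a stand-in. This is an assumption made for illustration: it is not MyGrad's function, and MyGrad's version may differ in signature and features such as striding and dilation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(16.0).reshape(4, 4)

# Every 2x2 window of x, laid out on a 3x3 grid of window positions
windows = sliding_window_view(x, window_shape=(2, 2))
print(windows.shape)                 # (3, 3, 2, 2)
print(np.shares_memory(windows, x))  # True: a view, not a copy

# 2x2 max pooling with stride 2: keep every other window, reduce each one
pooled = windows[::2, ::2].max(axis=(-2, -1))

# A "valid" 2x2 convolution-style sum: contract each window against a kernel
kernel = np.ones((2, 2))
conv = np.einsum("ijkl,kl->ij", windows, kernel)
```

Because the windows are views, pooling and convolution reduce to ordinary reductions and contractions over the trailing window axes.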

Advanced Example

The following is an example of using mygrad to compute the hinge loss of classification scores and to “back-propagate” through (compute the gradient of) this loss. This example demonstrates some of mygrad’s ability to perform back-propagation through broadcasted operations, basic indexing, advanced indexing, and in-place assignments.

>>> from mygrad import Tensor
>>> import numpy as np
>>> class_scores = Tensor(10 * np.random.rand(100, 10))         # 100 samples, 10 possible classes for each
>>> class_labels = np.random.randint(low=0, high=10, size=100)  # correct label for each datum
>>> class_labels = (range(len(class_labels)), class_labels)
>>> correct_class_scores = class_scores[class_labels]

>>> Lij = class_scores - correct_class_scores[:, np.newaxis] + 1.0  # 100x10 margins
>>> Lij[Lij <= 0] = 0      # scores within the hinge incur no loss
>>> Lij[class_labels] = 0  # the score corresponding to the correct label incurs no loss

>>> loss = Lij.sum() / class_scores.shape[0]  # compute mean hinge loss
>>> loss.backward()    # compute gradient of loss w.r.t. all tensors it depends on
>>> class_scores.grad  # d(loss)/d(class_scores)
array([[ 0.  ,  0.01,  0.  , -0.04,  0.  ,  0.  ,  0.01,  0.  ,  0.01, 0.01], ...])
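The structure of this gradient can be derived by hand: each incorrect class whose margin is positive contributes 1/100, and the correct class receives minus the number of such violations divided by 100. The sketch below reproduces that reasoning in plain NumPy, without MyGrad, and spot-checks one entry with a finite difference. The seed, the helper `mean_hinge_loss`, and the checked index `(0, 3)` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = 10 * rng.random((100, 10))           # 100 samples, 10 classes
labels = rng.integers(low=0, high=10, size=100)
rows = np.arange(100)


def mean_hinge_loss(s):
    """Mean hinge loss, following the same steps as the MyGrad example."""
    margins = s - s[rows, labels][:, np.newaxis] + 1.0
    margins[margins <= 0] = 0
    margins[rows, labels] = 0
    return margins.sum() / s.shape[0]


# Hand-derived gradient of the mean hinge loss w.r.t. the scores
margins = scores - scores[rows, labels][:, np.newaxis] + 1.0
active = margins > 0                       # margins that incur loss
active[rows, labels] = False               # the correct class is excluded
grad = active.astype(float)                # +1 for each violating class
grad[rows, labels] = -active.sum(axis=1)   # correct class: minus the violation count
grad /= scores.shape[0]

# Spot-check a single entry with a central finite difference
i, j, h = 0, 3, 1e-6
up, down = scores.copy(), scores.copy()
up[i, j] += h
down[i, j] -= h
numeric = (mean_hinge_loss(up) - mean_hinge_loss(down)) / (2 * h)
print(abs(numeric - grad[i, j]) < 1e-6)
```

Each row of this gradient sums to zero, which is consistent with the printed row above: several entries of 0.01 balanced by a single negative entry at the correct class.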

Computational Graph Visualization

MyGrad provides the capability to visually render diagrams of your computational graphs:

import mygrad as mg
from mygrad.computational_graph import build_graph
x = mg.tensor(2)
y = mg.tensor(3)
f = x * y
g = f + x - 2

build_graph(g, names=locals())

mygrad uses Graphviz and a Python interface for Graphviz to render the computational graphs built using tensors. These graphs can be rendered in Jupyter notebooks, allowing for quick checks of graph structure, or can be saved to file for later reference.

The dependencies can be installed with:

conda install graphviz
conda install python-graphviz

Big thanks to Petar Griggs for implementing these fantastic viz capabilities!