MyGrad#

MyGrad is a lightweight library that adds automatic differentiation to NumPy – its only dependency is NumPy. Simply “drop in” a MyGrad tensor into your NumPy-based code, and start differentiating!

>>> import mygrad as mg
>>> import numpy as np

>>> x = mg.tensor([1., 2., 3.])  # like numpy.array, but supports backprop
>>> f = np.sum(x * x)  # tensors can be passed directly to native numpy functions!
>>> f.backward() # triggers automatic differentiation
>>> x.grad  # stores [df/dx0, df/dx1, df/dx2]
array([2., 4., 6.])

MyGrad’s primary goal is to make automatic differentiation accessible and easy to use across the Python/NumPy ecosystem. As such, it strives to behave and feel exactly like NumPy so that users need not learn yet another array-based math library.

Of the various modes and flavors of auto-diff, MyGrad currently only supports back-propagation from a scalar quantity.

“Drop in” automatic differentiation?#

What we mean by drop in automatic differentiation is that you can take a third party function, which is written in NumPy, and pass MyGrad tensors as its inputs – this will coerce it into using MyGrad functions internally so that we can differentiate the function.

What we mean by drop in autodiff#
from third_party_lib import some_numpy_func

import mygrad as mg

arr1 = mg.tensor(...) # some MyGrad Tensor (instead of a NumPy array)
arr2 = mg.tensor(...) # some MyGrad Tensor (instead of a NumPy array)

output = some_numpy_func(arr1, arr2)  # "drop in" the MyGrad tensors

output.backward()  # output is a MyGrad tensor, not a NumPy array!

arr1.grad  # stores d(some_numpy_func) / d(arr1)
arr2.grad  # stores d(some_numpy_func) / d(arr2)

MyGrad aims for parity with NumPy’s major features#

NumPy’s ufuncs are richly supported. We can even differentiate through an operation that occurs in-place on a tensor and applies a boolean mask to the results:

>>> x = mg.tensor([1., 2., 3.])
>>> y = mg.zeros_like(x)
>>> np.multiply(x, x, where=[True, False, True], out=y)
>>> y.backward()
>>> x.grad
array([2., 0., 6.])

NumPy’s view semantics are also mirrored with high fidelity: performing basic indexing and similar operations on tensors will produce a “view” of that tensor’s data, so that a tensor and its view share memory. This relationship will also manifest between the derivatives stored by a tensor and its views!

>>> x = mg.arange(9.).reshape(3, 3)
>>> diag_view = np.einsum("ii->i", x)  # returns a view of the diagonal elements of `x`
>>> x, diag_view
(Tensor([[0., 1., 2.],
         [3., 4., 5.],
         [6., 7., 8.]]),
 Tensor([0., 4., 8.]))

# views share memory
>>> np.shares_memory(x, diag_view)
True

# mutating a view affects its base (and all other views)
>>> diag_view *= -1  # mutates x in-place
>>> x
Tensor([[-0.,  1.,  2.],
        [ 3., -4.,  5.],
        [ 6.,  7., -8.]])

>>> (x ** 2).backward()
>>> x.grad, diag_view.grad
(array([[ -0.,   2.,   4.],
        [  6.,  -8.,  10.],
        [ 12.,  14., -16.]]),
 array([ -0.,  -8., -16.]))

# the gradients have the same view relationship!
>>> np.shares_memory(x.grad, diag_view.grad)
True

Basic and advanced indexing is fully supported

>>> (x[x < 4] ** 2).backward()
>>> x.grad
array([[0., 2., 4.],
       [6., 0., 0.],
       [0., 0., 0.]])

NumPy arrays and other array-likes play nicely with MyGrad’s tensors. These behave like constants during automatic differentiation:

>>> x = mg.tensor([1., 2., 3.])
>>> constant = [-1., 0., 10]  # can be a numpy array, list, or any other array-like
>>> (x * constant).backward()  # all array-likes are treated as constants
>>> x.grad
array([-1.,  0., 10.])

What About JAX?#

Doesn’t JAX already provide drop in automatic differentiation? Not quite; JAX provides swap-out automatic differentiation: you must swap out the version of NumPy you are using before you write your code. Thus you cannot simply differentiate some third party function by passing it a JAX array.
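
To make the contrast concrete, here is a minimal sketch (assuming JAX is installed; the function names are purely illustrative):

# JAX's "swap-out" approach: the function itself must be written against jax.numpy
import jax.numpy as jnp

def f_jax(x):
    return jnp.sum(x * x)

# MyGrad's "drop in" approach: the function is written against plain NumPy;
# we differentiate it simply by passing in a MyGrad tensor
import numpy as np
import mygrad as mg

def f_numpy(x):
    return np.sum(x * x)  # ordinary NumPy code, e.g. from a third-party library

x = mg.tensor([1.0, 2.0, 3.0])
f_numpy(x).backward()
x.grad  # array([2., 4., 6.])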

“Is MyGrad a competitor to JAX? Should I stop using JAX and start using MyGrad?”

Goodness gracious, no! MyGrad is not meant to compete with the likes of JAX, which offers far more functionality in the way of computing higher-order derivatives, Jacobian-vector products, providing a JIT, and so on. MyGrad is meant to be a simple and highly accessible way to provide basic automatic differentiation capabilities to the NumPy ecosystem. Anyone who knows how to use NumPy can very easily learn to use MyGrad. It is especially great for teaching. But once your auto-diff needs extend beyond derivatives of scalars, it is time to graduate to JAX.

Installing MyGrad#

MyGrad requires NumPy. It is highly recommended that you utilize a build of NumPy that includes MKL for access to optimized math routines (e.g. install NumPy via Anaconda). You can install MyGrad using pip:

pip install mygrad

You can instead install MyGrad from its source code. Clone this repository and navigate to the MyGrad directory, then run:

pip install .

Support for Python and NumPy#

MyGrad abides by the NEP 29 recommendation, and adopts a common “time window-based” policy for support of NumPy versions. Accordingly, MyGrad’s drop schedule for NumPy versions can be found here.

Note, however, that MyGrad will maintain a wider window of support for minor Python versions than is specified by NEP 29. Because our only dependency is NumPy, and because we strive to remain an exceptionally lightweight and flexible dependency to our users, we will support minor versions of Python until their end of life, or until our lowest supported version of NumPy drops support for that version of Python – whichever occurs first.

Introducing MyGrad#

MyGrad is a lightweight library that adds automatic differentiation to NumPy – its only dependency is NumPy!

>>> import mygrad as mg
>>> import numpy as np

>>> x = mg.tensor([1., 2., 3.])  # like numpy.array, but supports backprop!
>>> f = np.sum(x * x)  # tensors work with numpy functions!
>>> f.backward() # triggers automatic differentiation
>>> x.grad  # stores [df/dx0, df/dx1, df/dx2]
array([2., 4., 6.])

Its primary goal is to make automatic differentiation accessible and easy to use across the Python/NumPy ecosystem. As such, it strives to behave and feel exactly like NumPy so that users need not learn yet another array-based math library. You can pass MyGrad’s Tensor to NumPy’s functions in order to make them differentiable! Of the various modes and flavors of auto-diff, MyGrad currently only supports backpropagation from a scalar quantity.

A Simple Application#

Let’s use mygrad to compute the derivative of \(f(x) = x^2\) evaluated at \(x = 3\) (which is \(\frac{df}{dx}\rvert_{x=3} = 2\times 3\)).

Tensor behaves nearly identically to NumPy’s ndarray, in addition to having the machinery needed to compute the analytic derivatives of functions. Suppose we want to compute this derivative at x = 3. We can create a 0-dimensional tensor (a scalar) for x and compute f(x):

>>> import mygrad as mg
>>> import numpy as np
>>> x = mg.tensor(3.0)
>>> f = np.square(x)  # mygrad's tensors can be passed into NumPy functions
>>> f
Tensor(9.0)

Invoking backward() on f instructs mygrad to trace through the computational graph that produced f and compute the derivatives of f with respect to all of its independent variables. Thus, executing f.backward() will compute \(\frac{df}{dx} = 2x\) at \(x=3\), and will store the resulting value in x.grad:

>>> f.backward()  # triggers computation of ``df/dx``
>>> x.grad  # df/dx = 2x = 6.0
array(6.0)

This is the absolute tip of the iceberg. mygrad can compute derivatives of multivariable composite functions of tensor-valued variables!

Gradient Descent with MyGrad#

Performing gradient descent on \(\mathscr{L}(w) = w ^ 2\)

w = mg.tensor(10.0)
learning_rate = 0.3
num_steps = 10
print(w)

for step_cnt in range(num_steps):
    ℒ = w ** 2    # compute ℒ(w) (this also "nulls" any previously-stored derivatives)
    ℒ.backward()  # compute dℒ/dw

    # Update w via a gradient-step.
    # We do an augmented update on the underlying numpy-array
    # stored by `w`
    w.data -= learning_rate * w.grad
    print(w)

The following steps are printed out; see that gradient descent leads us toward the minimum at \(w = 0\):

Tensor(10.)
Tensor(4.)
Tensor(1.6)
Tensor(0.64)
Tensor(0.256)
Tensor(0.1024)
Tensor(0.04096)
Tensor(0.016384)
Tensor(0.0065536)
Tensor(0.00262144)
Tensor(0.00104858)

Some Bells and Whistles#

mygrad supports all of NumPy’s essential features.

Tensor plays nicely with NumPy arrays, which behave as constants when they are used in computational graphs:

>>> import numpy as np
>>> x = mg.tensor([2.0, 2.0, 2.0])
>>> y = np.array([1.0, 2.0, 3.0])
>>> f = x ** y  # (2 ** 1, 2 ** 2, 2 ** 3)
>>> f.backward()
>>> x.grad
array([ 1.,  4., 12.])

The nnet module supplies essential functions for machine learning.

It leverages a sliding-window view function, which produces convolution-style windowed views of arrays/tensors without making copies of them, to perform neural network-style convolutions and pooling intuitively (and quite efficiently).
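
As a rough sketch of what such a windowed view provides (the import path follows the reference later in this document; the exact call signature and output layout should be treated as assumptions):

import numpy as np
from mygrad.nnet.layers.utils import sliding_window_view  # assumed import path

x = np.arange(10.)
windows = sliding_window_view(x, window_shape=(3,), step=2)
# `windows` is a strided view of `x` (no data is copied);
# each window is a length-3 slice of `x`, and successive windows
# are offset from one another by 2 elements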

Advanced Example#

The following is an example of using mygrad to compute the hinge loss of classification scores and to “back-propagate” through (compute the gradient of) this loss. This example demonstrates some of mygrad’s ability to perform back-propagation through broadcasted operations, basic indexing, advanced indexing, and in-place assignments.

>>> from mygrad import Tensor
>>> import numpy as np
>>> class_scores = Tensor(10 * np.random.rand(100, 10))         # 100 samples, 10 possible classes for each
>>> class_labels = np.random.randint(low=0, high=10, size=100)  # correct label for each datum
>>> class_labels = (range(len(class_labels)), class_labels)
>>> correct_class_scores = class_scores[class_labels]

>>> Lij = class_scores - correct_class_scores[:, np.newaxis] + 1.0  # 100x10 margins
>>> Lij[Lij <= 0] = 0      # scores within the hinge incur no loss
>>> Lij[class_labels] = 0  # the score corresponding to the correct label incurs no loss

>>> loss = Lij.sum() / class_scores.shape[0]  # compute mean hinge loss
>>> loss.backward()    # compute gradient of loss w.r.t all dependent tensors
>>> class_scores.grad  # d(loss)/d(class_scores)
array([[ 0.  ,  0.01,  0.  , -0.04,  0.  ,  0.  ,  0.01,  0.  ,  0.01, 0.01], ...])

Computational Graph Visualization#

MyGrad provides the capability to visually render diagrams of your computational graphs:

import mygrad as mg
from mygrad.computational_graph import build_graph
x = mg.tensor(2)
y = mg.tensor(3)
f = x * y
g = f + x - 2

build_graph(g, names=locals())
[rendered computational graph: _images/example_graph.svg]

mygrad uses Graphviz and a Python interface for Graphviz to render the computational graphs built using tensors. These graphs can be rendered in Jupyter notebooks, allowing for quick checks of graph structure, or can be saved to file for later reference.

The dependencies can be installed with:

conda install graphviz
conda install python-graphviz

Big thanks to Petar Griggs for implementing these fantastic viz capabilities!

MyGrad’s Tensor#

Tensor is the most critical piece of MyGrad. It is a numpy-array-like object capable of serving as a node in a computational graph that supports back-propagation of derivatives via the chain rule.

You can effectively do a drop-in replacement of a NumPy array with a Tensor for all basic mathematical operations. This includes basic and advanced indexing, broadcasting, sums over axes, etc.; it will simply work.

>>> import mygrad as mg  # note that we replace numpy with mygrad here
>>> x = mg.arange(9).reshape(3, 3)
>>> x
Tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
>>> y = x[x == 4] ** 2
>>> y
Tensor([16], dtype=int32)

Thus MyGrad users can spend their time mastering NumPy, and their skills will transfer seamlessly to this autograd library.

Creating a Tensor#

Tensor can be passed any “array-like” object of numerical data. This includes numbers, sequences (e.g. lists), nested sequences, numpy-ndarrays, and other mygrad-tensors. mygrad also provides familiar numpy-style tensor-creation functions (e.g. arange(), linspace(), etc.)

>>> import mygrad as mg
>>> mg.tensor(2.3)  # creating a 0-dimensional tensor
Tensor(2.3)
>>> mg.tensor(np.array([1.2, 3.0]))  # casting a numpy-array to a tensor
Tensor([1.2, 3.0])
>>> mg.tensor([[1, 2], [3, 4]])  # creating a 2-dimensional tensor from lists
Tensor([[1, 2],
       [3, 4]])
>>> mg.arange(4)    # using numpy-style tensor creation functions
Tensor([0, 1, 2, 3])

Integer-valued tensors are treated as constants

>>> mg.astensor(1, dtype=np.int8).constant
True

By default, float-valued tensors are not treated as constants

>>> mg.astensor(1, dtype=np.float32).constant
False

Forward and Back-Propagation#

Let’s construct a computational graph consisting of two zero-dimensional tensors, x and y, which are used to compute an output tensor, ℒ. This is a “forward-pass imperative” style for creating a computational graph - the graph is constructed as we carry out the forward-pass computation.

>>> x = Tensor(3.0)
>>> y = Tensor(2.0)
>>> ℒ = 2 * x + y ** 2

Invoking ℒ.backward() signals the computational graph to compute the total derivative of ℒ with respect to each one of its dependent variables. I.e. x.grad will store dℒ/dx and y.grad will store dℒ/dy. Thus we have back-propagated a gradient from ℒ through our graph.

Each tensor of derivatives is computed elementwise. That is, if x = Tensor([x0, x1, x2]), then dℒ/dx represents [dℒ/d(x0), dℒ/d(x1), dℒ/d(x2)].

>>> ℒ.backward()  # computes dℒ/dx and dℒ/dy
>>> x.grad  # dℒ/dx
array(2.0)
>>> y.grad  # dℒ/dy
array(4.0)
>>> ℒ.grad  # dℒ/dℒ
array(1.0)

Once the gradients are computed, the computational graph containing x, y, and ℒ is cleared automatically. Additionally, involving any of these tensors in a new computational graph will automatically null their gradients.

>>> 2 * x
>>> x.grad is None
True

Or, you can use the null_grad() method to manually clear a tensor’s gradient

>>> y.null_grad()
Tensor(2.)
>>> y.grad is None
True

Accessing the Underlying NumPy Array#

Tensor is a thin wrapper around numpy.ndarray. A tensor’s underlying NumPy array can be accessed via .data. This returns a direct reference to the NumPy array.

>>> x = mg.tensor([1, 2])
>>> x.data
array([1, 2])
>>> import numpy as np
>>> np.asarray(x)
array([1, 2])

Producing a “View” of a Tensor#

MyGrad’s tensors exhibit the same view semantics and memory-sharing relationships as NumPy arrays. I.e. any (non-scalar) tensor produced via basic indexing will share memory with its parent.

>>> x = mg.tensor([1., 2., 3., 4.])
>>> y = x[:2]  # the view: Tensor([1., 2.])
>>> y.base is x
True
>>> np.shares_memory(x, y)
True

Mutating shared data will propagate through views:

>>> y *= -1
>>> x
Tensor([-1., -2.,  3.,  4.])
>>> y
Tensor([-1., -2.])

And this view relationship will also manifest between the tensors’ gradients

>>> (x ** 2).backward()
>>> x.grad
array([-2., -4.,  6.,  8.])
>>> y.grad
array([-2., -4.])

Documentation for mygrad.Tensor#

Tensor.astype(dtype[, casting, copy, constant])

Copy of the tensor with the specified dtype.

Tensor.backward([grad])

Trigger backpropagation and compute the derivatives of this tensor.

Tensor.base

A reference to the base tensor that the present tensor is a view of.

Tensor.clear_graph()

Removes the current tensor – and tensors above it – from their shared computational graph.

Tensor.constant

If True, this tensor is a constant; it will not propagate any gradient.

Tensor.copy(*[, constant])

Produces a copy of self with copy.creator=None.

Tensor.creator

The Operation instance that produced self.

Tensor.dtype

Data-type of the tensor's elements.

Tensor.grad

Returns the derivative of ℒ with respect to this tensor.

Tensor.item()

Copy an element of a tensor to a standard Python scalar and return it.

Tensor.ndim

Number of tensor dimensions.

Tensor.null_grad(*[, _clear_view_info])

Sets this tensor's gradient to be None.

Tensor.null_gradients([clear_graph])

Deprecated: Tensors will automatically have their computational graphs cleared during backprop.

Tensor.shape

Tuple of tensor dimension-sizes.

Tensor.size

Number of elements in the tensor.

Tensor.T

Same as self.transpose(), except that self is returned if self.ndim < 2 and a view of the underlying data is utilized whenever possible.
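
A quick, minimal illustration of a few of these attributes and methods (the displayed outputs assume NumPy’s default float64 dtype):

>>> import mygrad as mg
>>> x = mg.tensor([[1., 2.], [3., 4.]])
>>> x.shape, x.ndim, x.size
((2, 2), 2, 4)
>>> x.dtype
dtype('float64')
>>> x.T  # transpose of `x`; uses a view of the underlying data where possible
Tensor([[1., 3.],
        [2., 4.]])
>>> x.sum().item()  # copy out a standard Python scalar
10.0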

Views and In-Place Operations#

Producing a “View” of a Tensor#

MyGrad’s tensors exhibit the same view semantics and memory-sharing relationships as NumPy arrays. I.e. any (non-scalar) tensor produced via basic indexing will share memory with its parent.

>>> x = mg.tensor([1., 2., 3., 4.])
>>> y = x[:2]  # the view: Tensor([1., 2.])
>>> y.base is x
True
>>> np.shares_memory(x, y)
True

Mutating shared data will propagate through views:

>>> y *= -1
>>> x
Tensor([-1., -2.,  3.,  4.])
>>> y
Tensor([-1., -2.])

And this view relationship will also manifest between the tensors’ gradients

>>> (x ** 2).backward()
>>> x.grad
array([-2., -4.,  6.,  8.])
>>> y.grad
array([-2., -4.])

In-Place Operations are not Efficient#

It is important to note that, although MyGrad’s view semantics promote a rich parity with NumPy, certain aspects should be avoided in the interest of optimized performance. Namely, performing in-place operations on tensors is generally not more efficient than using their non-mutating counterparts.

This is because MyGrad has to track the state of tensors that are involved in a computational graph. Thus a mutated tensor must have its pre-augmented state stored for future reference; this defeats the performance benefit of writing to an array’s memory in-place. This is especially inefficient if you are mutating a tensor involved with multiple views of the same memory. (By contrast, producing a view of a tensor is as efficient as one would expect.)

Thus these NumPy-like in-place semantics are supported by MyGrad not for performance purposes, but to support convenient and familiar code patterns and to enable one to port NumPy code to MyGrad (or, in the future, inject MyGrad tensors into NumPy code!) and get the exact same behavior.

A final note: MyGrad’s in-place operations, when run under no_autodiff() mode, do not incur the extra costs noted above, and thus your code will enjoy the performance benefits of genuine in-place operations.

Performance Tips#

The following functions provide users with controls for optimizing MyGrad code by either suspending its memory-guarding behavior or by disabling automatic differentiation altogether. These are important utilities for speeding up your code.

Beyond the points made below, general performance tips for NumPy – e.g. leveraging vectorized operations, heeding NumPy’s row-major memory layout for arrays when constructing tensors, and using basic indexing to create views of arrays instead of copies – apply equally to MyGrad and its tensors. After all, MyGrad operates almost entirely via NumPy arrays and NumPy functions under the hood.
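
For instance, here is a minimal sketch contrasting a basic-indexing view with an advanced-indexing copy:

>>> x = mg.arange(9.)
>>> view = x[:3]         # basic indexing -> a view (no copy of data)
>>> copy = x[[0, 1, 2]]  # advanced indexing -> a copy
>>> np.shares_memory(x, view), np.shares_memory(x, copy)
(True, False)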

Suspending Graph-Tracking for Automatic Differentiation#

no_autodiff

Serves as a context manager and decorator for suspending all computational graph tracking.

In the case that you want to run a computation involving MyGrad tensors, but you don’t need to access their gradients (e.g. when measuring the “test-time” performance of a model that you are training), then you can use the provided decorator/context-manager for suspending all of MyGrad’s “graph-tracking” features.

>>> import mygrad as mg
>>> with mg.no_autodiff:
...     # any mygrad code in this context will run faster
...     # but will not produce any gradients

Note that this also suspends all memory-guarding (see below), since MyGrad doesn’t need to ensure the preservation of any state.

Suspending all graph-tracking features can speed up code involving many small tensors substantially - about a 3x speedup.
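
Because no_autodiff also serves as a decorator, you can mark an entire function as graph-free; a minimal sketch (the function and argument names are illustrative):

>>> @mg.no_autodiff
... def evaluate(weights, data):
...     # no computational graph is built inside this function
...     return mg.matmul(data, weights)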

Controlling Memory-Guarding Behavior#

mem_guard_off

A context manager used to suspend memory-locking behavior

mem_guard_on

A context manager used to enable memory-locking behavior

turn_memory_guarding_off()

Globally disables all memory-guarding mechanisms, except for in contexts where they are explicitly enabled.

By default, MyGrad tracks all of the NumPy arrays that are involved in computational graphs with tensors and locks their writeability (marking them read-only).

These stateful graphs are how MyGrad is able to perform backpropagation and compute the gradients of tensors involved in a given calculation. Because of the stateful nature of a computational graph, mutating a NumPy array in-place could corrupt the state of the computational graph - i.e. the derivatives computed would not accurately reflect the values that were used during the “forward pass”. Read the following code to see such a mutation rear its head.

>>> import mygrad as mg
>>> import numpy as np
>>> mg.turn_memory_guarding_off()  # speeds up calculations, but with risks involved..
>>> x = np.arange(3.)
>>> y = mg.ones_like(x)
>>> z = x * y
>>> x[:] = 0  # mutates x, corrupting state associated with z
>>> z.backward()
>>> y.grad  # would be array([0., 1., 2.]) if graph wasn't corrupted
array([0., 0., 0.])

Note that, were x an instance of Tensor, there would not be any issue with the above calculation, since MyGrad can track the in-place update on a tensor. MyGrad cannot, on the other hand, track such operations when they involve only NumPy arrays.

Thus MyGrad prohibits such mutations with its aforementioned “memory guarding” behavior; however, it is smart about restoring the writeability of all arrays once they are no longer participating in a computational graph (e.g. once backpropagation has been performed through the graph).

>>> import mygrad as mg
>>> import numpy as np
>>> x = np.arange(3.)
>>> y = mg.ones_like(x)
>>> z = x * y
>>> try:
...     x[:] = 0  # raises because `x` is made read-only
... except ValueError:
...     pass
>>> z.backward()
>>> y.grad  # correct gradient is computed
array([0., 1., 2.])
>>> x[:] = 0  # the writeability of `x` is restored once backprop is complete

This memory-guarding behavior comes at a cost: for computations involving many small tensors (e.g. in a handmade RNN) this can lead to slowdowns of ~50%. Thus MyGrad provides various mechanisms for disabling all such memory-guards. Note, however, that for computations involving large tensors (e.g. for typical dense and convolutional neural networks), the overhead associated with the memory-guarding feature is likely negligible compared to the core numerical computations at play.

If you want to enjoy the optimizations associated with disabling memory guarding, it is recommended that you first test your code with the default memory guarding enabled; once you have confirmed that MyGrad doesn’t raise any errors, you can then run your code “at scale” with memory-guarding disabled.
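
For example, once your code runs cleanly with the guards in place, a single call near the top of your script disables them globally:

>>> import mygrad as mg
>>> mg.turn_memory_guarding_off()  # only do this after validating your code with guarding enabled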

Make Use of Views but Avoid Involving them in In-Place Operations#

Please refer to the section on views and in-place operations for more details. The upshot is: views of tensors are efficient to create, as they do not involve copying any memory, but performing an in-place operation on a tensor will copy that tensor. Furthermore, performing an in-place operation on a view will lead to the creation of a copy of its associated base tensor.

If you are relying on a mutation propagating to many views, then this can still be a net gain in performance compared to updating all of them “manually”. But, generally, in-place updates on tensors do not carry the same performance benefits as augmented assignments on NumPy arrays.

Writing Your Own Operations#

Let’s write our own “multiply” operation. There are two components to doing this:
  • Defining an operation class (a subclass of Operation)

  • Writing a function that ultimately calls mygrad.execute_op(YourOp, ...)

import numpy as np

import mygrad as mg
from mygrad import execute_op
from mygrad.operation_base import Operation
from mygrad.typing import ArrayLike

# All operations should inherit from Operation, or one of its subclasses
class CustomMultiply(Operation):
    """ Performs f(x, y) = x * y """

    def __call__(self, x: mg.Tensor, y: mg.Tensor) -> np.ndarray:
        # This method defines the "forward pass" of the operation.
        # It must bind the variable tensors to the op and compute
        # the output of the operation as a numpy array

        # All tensors must be bound as a tuple to the `variables`
        # instance variable.
        self.variables = (x, y)

        # The forward pass should be performed using numpy arrays,
        # not the tensors themselves.
        x_arr = x.data
        y_arr = y.data
        return x_arr * y_arr

    def backward_var(self, grad, index, **kwargs):
        """Given ``grad = dℒ/df``, computes ``∂ℒ/∂x`` and ``∂ℒ/∂y``

        ``ℒ`` is assumed to be the terminal node from which ``ℒ.backward()`` was
        called.

        Parameters
        ----------
        grad : numpy.ndarray
            The back-propagated total derivative with respect to the present
            operation: dℒ/df. This will have the same shape as f, the result
            of the forward pass.

        index : Literal[0, 1]
            The index-location of ``var`` in ``self.variables``

        Returns
        -------
        numpy.ndarray
            ∂ℒ/∂x_{i}

        Raises
        ------
        SkipGradient"""
        x, y = self.variables
        x_arr = x.data
        y_arr = y.data

        # The operation need not incorporate specialized logic for
        # broadcasting. The appropriate sum-reductions will be performed
        # by MyGrad's autodiff system.
        if index == 0:  # backprop through x
            return grad * y_arr  # ∂ℒ/∂x = (∂ℒ/∂f)(∂f/∂x)
        elif index == 1:  # backprop through y
            return grad * x_arr  # ∂ℒ/∂y = (∂ℒ/∂f)(∂f/∂y)


# Our function stitches together our operation class with the
# operation arguments via `mygrad.execute_op`
def custom_multiply(x: ArrayLike, y: ArrayLike, constant=None) -> mg.Tensor:
    # `execute_op` will take care of:
    #  - casting `x` and `y` to tensors if they are instead array-likes
    #  - propagating 'constant' status to the resulting output based on the inputs
    #  - handling in-place operations (specified via the `out` parameter)
    return execute_op(CustomMultiply, x, y, constant=constant)

We can now use our differentiable function!

>>> x = mg.tensor(2.0)
>>> y = mg.tensor([1.0, 2.0, 3.0])

>>> custom_multiply(x, y).backward()
>>> x.grad, y.grad
(array(6.), array([2., 2., 2.]))

Documentation for mygrad.Operation#

Operation()

Base class for all tensor operations that support back-propagation of gradients.

Operation.backward(grad, **kwargs)

Back-propagates the gradient through all of the operation's inputs, which are stored in the tuple self.variables.

Operation.backward_var(grad, index, **kwargs)

Given grad = dℒ/df, computes ∂ℒ/∂x_{i}, where x_{i} is one of x1, ...., xn.

Tensor creation routines (mygrad.tensor_creation)#

Array-Like#

tensor(arr_like[, dtype, constant, copy, ndmin])

Create a tensor

asarray(a[, dtype, order])

Convert the input to an array.

astensor(t[, dtype, constant])

Convert the input to a tensor.

Ones and zeros#

ones(shape[, dtype, constant])

Return a Tensor of the given shape and type, filled with ones.

ones_like(other[, dtype, shape, constant])

Return a Tensor of the same shape and type as the given, filled with ones.

zeros(shape[, dtype, constant])

Return a Tensor of the given shape and type, filled with zeros.

zeros_like(other[, dtype, shape, constant])

Return a Tensor of the same shape and type as the given, filled with zeros.

eye(N[, M, k, dtype, constant])

Return a 2D Tensor with ones on the diagonal and zeros elsewhere.

identity(n[, dtype, constant])

Return the identity Tensor; a square Tensor with 1s on the main diagonal and 0s elsewhere.

full(shape, fill_value[, dtype, constant])

Return a Tensor of the given shape and type, filled with fill_value.

full_like(other, fill_value[, dtype, shape, ...])

Return a Tensor of the same shape and type as the given, filled with fill_value.

empty(shape[, dtype, constant])

Return a new Tensor of the given shape and type, without initializing entries.

empty_like(other[, dtype, shape, constant])

Return a new Tensor of the same shape and type as the given array.

Numerical ranges#

arange([start,] stop[, step,][, dtype, constant])

Return a Tensor with evenly-spaced values within a given interval.

linspace(start, stop[, num, endpoint, ...])

Return a Tensor with evenly-spaced numbers over a specified interval.

logspace(start, stop[, num, endpoint, base, ...])

Return a Tensor with evenly-spaced numbers over a specified interval on a log scale.

geomspace(start, stop[, num, endpoint, ...])

Return a Tensor with evenly-spaced values in a geometric progression.

Tensor manipulation routines (mygrad.tensor_manip)#

Changing array shape#

ravel(a, *[, constant])

Flattens contents of a tensor into a contiguous 1-D array.

reshape(a, newshape, *[, constant])

Returns a tensor with a new shape, without changing its data.

Tensor.flatten(*[, constant])

Return a copy of the tensor collapsed into one dimension.

Transpose-like operations#

moveaxis(a, source, destination, *[, constant])

Move axes of a tensor to new positions.

roll(a, shift[, axis, constant])

Roll tensor elements along a given axis.

swapaxes(a, axis1, axis2, *[, constant])

Interchange two axes of a tensor.

Tensor.T

Same as self.transpose(), except that self is returned if self.ndim < 2 and a view of the underlying data is utilized whenever possible.

transpose(a, *axes[, constant])

Permute the dimensions of a tensor.

Changing number of dimensions#

atleast_1d()

Convert inputs to tensors with at least one dimension.

atleast_2d()

Convert inputs to tensors with at least two dimensions.

atleast_3d()

Convert inputs to tensors with at least three dimensions.

broadcast_to(a, shape, *[, constant])

Broadcast a tensor to a new shape.

expand_dims(a, axis, *[, constant])

Expand the dimensions of a tensor by adding a new axis.

squeeze(a[, axis, constant])

Remove single-dimensional entries from the shape of a tensor.

Joining tensors#

concatenate([axis, out, constant])

Join a sequence of tensors along an existing axis.

stack([axis, out, constant])

Join a sequence of tensors along a new axis.

Tiling tensors#

repeat(a, repeats[, axis, constant])

Repeat elements of a tensor.

Linear algebra (mygrad.linalg)#

Matrix and vector products#

matmul(x1, x2[, out, dtype, constant])

Matrix product of two tensors.

multi_matmul(tensors, *[, constant])

Matrix product of two or more tensors calculated in the optimal ordering

einsum(subscripts, *operands)

Evaluates the Einstein summation convention on the operands.

Norms and other numbers#

linalg.norm(x[, ord, axis, keepdims, ...])

Vector norm.

Mathematical functions (mygrad.math)#

Trigonometric functions#

sin(x[, out, where, dtype, constant])

Trigonometric sine, element-wise.

cos(x[, out, where, dtype, constant])

Trigonometric cosine, element-wise.

tan(x[, out, where, dtype, constant])

Trigonometric tangent, element-wise.

arcsin(x[, out, where, dtype, constant])

Inverse sine, element-wise.

arccos(x[, out, where, dtype, constant])

Inverse cosine, element-wise.

arctan(x[, out, where, dtype, constant])

Inverse tangent, element-wise.

arctan2(x1, x2[, out, where, dtype, constant])

Element-wise arc tangent of x1/x2 choosing the quadrant correctly.

Hyperbolic functions#

sinh(x[, out, where, dtype, constant])

Hyperbolic sine, element-wise.

cosh(x[, out, where, dtype, constant])

Hyperbolic cosine, element-wise.

tanh(x[, out, where, dtype, constant])

Hyperbolic tangent, element-wise.

arcsinh(x[, out, where, dtype, constant])

Inverse hyperbolic sine, element-wise.

arccosh(x[, out, where, dtype, constant])

Inverse hyperbolic cosine, element-wise.

arctanh(x[, out, where, dtype, constant])

Inverse hyperbolic tangent, element-wise.

Sums, products, differences#

prod(a[, axis, keepdims, constant])

Return the product of array elements over given axes.

sum(x[, axis, keepdims, constant])

Sum of tensor elements over a given axis.

cumprod(a[, axis, constant])

Return the cumulative product of elements along a given axis.

cumsum(a[, axis, constant])

Return the cumulative sum of the elements along a given axis.

mean(x[, axis, keepdims, constant])

Mean of tensor elements over a given axis.

var(x[, axis, ddof, keepdims, constant])

Compute the variance along the specified axis.

std(x[, axis, ddof, keepdims, constant])

Compute the standard deviation along the specified axis.

amax(x[, axis, keepdims, constant])

Return the maximum of a tensor or maximum along its axes.

amin(x[, axis, keepdims, constant])

Return the minimum of a tensor or minimum along its axes.

max(x[, axis, keepdims, constant])

Return the maximum of a tensor or maximum along its axes.

min(x[, axis, keepdims, constant])

Return the minimum of a tensor or minimum along its axes.

Exponents and logarithms#

exp(x1[, out, where, dtype, constant])

Calculate the exponential of all elements in the input tensor.

expm1(x1[, out, where, dtype, constant])

Calculate exp(x) - 1 for all elements in the tensor.

exp2(x1[, out, where, dtype, constant])

Calculate 2**p for all p in the input tensor.

log(x1[, out, where, dtype, constant])

Natural logarithm, element-wise.

log10(x1[, out, where, dtype, constant])

Return the base 10 logarithm of the input tensor, element-wise.

log2(x1[, out, where, dtype, constant])

Base-2 logarithm applied elementwise to the tensor.

log1p(x1[, out, where, dtype, constant])

Return the natural logarithm of one plus the input tensor, element-wise.

logaddexp(x1, x2[, out, where, dtype, constant])

Logarithm of the sum of exponentiations of the inputs.

logaddexp2(x1, x2[, out, where, dtype, constant])

Logarithm of the sum of exponentiations of the inputs in base-2.

Other special functions#

add_sequence(*variables[, constant])

f(a, b, ...) -> a + b + ...

multiply_sequence(*variables[, constant])

f(a, b, ...) -> a * b * ...

sinc(a, *[, constant])

f(a) -> sin(a) / a

Arithmetic operations#

add(x1, x2[, out, where, dtype, constant])

Add the arguments element-wise.

reciprocal(x[, out, where, dtype, constant])

Return the reciprocal of the argument element-wise.

positive(x[, out, where, dtype, constant])

Returns a copy of the tensor.

negative(x[, out, where, dtype, constant])

Negates the tensor element-wise.

multiply(x1, x2[, out, where, dtype, constant])

Multiply the arguments element-wise.

divide

alias of true_divide

power(x1, x2[, out, where, dtype, constant])

First tensor elements raised to powers from second tensor, element-wise.

subtract(x1, x2[, out, where, dtype, constant])

Subtract the arguments element-wise.

Miscellaneous#

clip(a, a_min, a_max[, out, constant])

Clip (limit) the values in an array.

sqrt(x[, out, where, dtype, constant])

The square root, elementwise.

cbrt(x[, out, where, dtype, constant])

The cube root elementwise.

square(x[, out, where, dtype, constant])

Return the square of the argument element-wise.

absolute(x[, out, where, dtype, constant, ...])

The absolute value, computed elementwise.

maximum(x1, x2[, out, where, dtype, constant])

Pair-wise maximum of tensor elements.

minimum(x1, x2[, out, where, dtype, constant])

Pair-wise minimum of tensor elements.

Indexing Routines (mygrad.indexing_routines)#

Generating index tensors#

where(condition, [x, y])

Return elements chosen from x or y depending on condition.

Neural network operations (mygrad.nnet)#

Layer operations#

batchnorm(x, *[, gamma, beta, constant])

Performs batch normalization on x.

conv_nd(x, filter_bank, *, stride[, ...])

Use filter_bank (w) to perform strided N-dimensional neural network-style convolutions (see Notes) over x.

max_pool(x, pool, stride, *[, constant])

Perform max-pooling over the last N dimensions of a data batch.

gru(X, Uz, Wz, bz, Ur, Wr, br, Uh, Wh, bh[, ...])

Performs a forward pass of sequential data through a Gated Recurrent Unit layer, returning the 'hidden-descriptors' arrived at by utilizing the trainable parameters as follows.

Losses#

focal_loss(class_probs, targets, *[, alpha, ...])

Return the per-datum focal loss.

margin_ranking_loss(x1, x2, y, margin, *[, ...])

Computes the average margin ranking loss.

multiclass_hinge(x, y_true[, hinge, constant])

Computes the average multiclass hinge loss.

negative_log_likelihood(x, y_true, *[, ...])

Returns the (weighted) negative log-likelihood loss between log-probabilities and y_true.

softmax_crossentropy(x, y_true, *[, constant])

Given the classification scores of C classes for N pieces of data, computes the mean softmax cross-entropy loss.

softmax_focal_loss(scores, targets, *[, ...])

Applies the softmax normalization to the input scores before computing the per-datum focal loss.
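
As a rough usage sketch for one of these losses (the import location of softmax_crossentropy is an assumption; the shapes follow the signature above):

>>> import numpy as np
>>> import mygrad as mg
>>> from mygrad.nnet import softmax_crossentropy  # assumed import location
>>> scores = mg.tensor(np.random.rand(4, 3))  # classification scores for 4 data and 3 classes
>>> labels = np.array([0, 2, 1, 2])           # the correct class-index for each datum
>>> loss = softmax_crossentropy(scores, labels)
>>> loss.backward()
>>> scores.grad.shape  # d(loss)/d(scores)
(4, 3)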

Activations#

elu(x, alpha, *[, constant])

Returns the exponential linear activation (ELU) elementwise along x.

glu(x[, axis, constant])

Returns the Gated Linear Unit A * σ(B), where A and B are split from x.

hard_tanh(x, *[, lower_bound, upper_bound, ...])

Returns the hard hyperbolic tangent function.

leaky_relu(x, slope, *[, constant])

Returns the leaky rectified linear activation elementwise along x.

logsoftmax(x[, axis, constant])

Applies the log-softmax activation function.

selu(x, *[, constant])

Returns the scaled exponential linear activation (SELU) elementwise along x.

sigmoid(x, *[, constant])

Applies the sigmoid activation function.

softmax(x[, axis, constant])

Applies the softmax activation function.

soft_sign(x, *[, constant])

Returns the soft sign function x / (1 + |x|).

relu(x, *[, constant])

Applies the rectified linear unit activation function.

tanh(x[, out, where, dtype, constant])

Hyperbolic tangent, element-wise.

Initializers#

dirac(*shape[, dtype, constant])

Initialize a mygrad.Tensor according to the Dirac initialization procedure described by Zagoruyko and Komodakis.

glorot_normal(*shape[, gain, dtype, constant])

Initialize a mygrad.Tensor according to the normal initialization procedure described by Glorot and Bengio.

glorot_uniform(*shape[, gain, dtype, constant])

Initialize a mygrad.Tensor according to the uniform initialization procedure described by Glorot and Bengio.

he_normal(*shape[, gain, dtype, constant])

Initialize a mygrad.Tensor according to the normal initialization procedure described by He et al.

he_uniform(*shape[, gain, dtype, constant])

Initialize a mygrad.Tensor according to the uniform initialization procedure described by He et al.

normal(*shape[, mean, std, dtype, constant])

Initialize a mygrad.Tensor by drawing from a normal (Gaussian) distribution.

uniform(*shape[, lower_bound, upper_bound, ...])

Initialize a mygrad.Tensor by drawing from a uniform distribution.

Sliding Window View Utility#

sliding_window_view(arr, window_shape, step)

Create a sliding window view over the trailing dimensions of an array.

Input and Output#

NumPy binary files (NPY, NPZ)#

save(file, tensor)

Saves a tensor and its gradient information.

load(file)

Loads a saved Tensor and its gradient information (if applicable).
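
A minimal round-trip sketch, assuming that save and load are exposed at mygrad’s top-level namespace:

>>> import mygrad as mg
>>> x = mg.tensor([1., 2., 3.])
>>> mg.save("x.npy", x)   # saves the tensor (and its gradient information, if present)
>>> y = mg.load("x.npy")  # restores a Tensor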

Computational graph visualization(mygrad.computational_graph)#

build_graph(fin[, names, render, save, ...])

Builds and renders a computational graph.

Changelog#

This is a record of all past mygrad releases and what went into them, in reverse chronological order. All previous releases should still be available on pip.

2.2.0 - 2023-01-03#

2.1.0 - 2022-01-01#

New Functions and Utilities#

The following differentiable functions are now supported by MyGrad, and “drop-in” overrides for their NumPy counterparts are supported as well.

Basic tensor save/load functionality has been added (thanks to @kw-0).

Improvements#
  • clip() and Tensor.clip now accept an out target, permitting in-place operations.

  • The method Tensor.__index__() is now implemented, which permits scalar integer-valued tensors to be used to index into Python sequences.

  • Added Python 3.10 to our automated test matrix.

Compatibility-Breaking Changes#
  • In accordance with NEP 29 we are dropping support for NumPy versions below 1.19. However, MyGrad will not drop support for Python 3.7; to remain as lightweight and flexible as possible we will support minor versions of Python up until their EOL or until our minimal NumPy dependency drops support – whichever occurs first.

  • The interface to arange() was changed from arange(start, stop=None, step=None, ...) to arange([start,] stop[, step,], ...). This provides exact parity with NumPy’s arange function.

  • The derivatives of absolute() and norm() have been revised such that in cases where the derivatives used to be nan, those entries will now be 0. Both functions can now be passed nan_to_num=False to enable the previous, more rigorous behavior. See PR #379 for more details.

2.0.2 - 2021-04-10#

Exposes execute_op() at top-level namespace

2.0.1 - 2021-04-03#

Bug Fixes#
  • matmul() and multi_matmul() were missing from the top-level namespace of mygrad.

  • A 0D tensor involved in a broadcasted operation would have a numpy-float set for its gradient instead of a 0D array.

New Functions#

The following non-differentiable NumPy functions now work on mygrad tensors (and return ndarrays). Aliases of these are available at the top-level namespace of mygrad

  • np.isnan

  • np.isfinite

  • np.isinf

  • np.isnat

  • np.signbit

  • np.logical_not

  • np.logical_and

  • np.logical_or

  • np.logical_xor

  • np.greater

  • np.greater_equal

  • np.less

  • np.less_equal

  • np.equal

  • np.not_equal

  • np.floor_divide

  • np.remainder

  • np.mod

  • np.fmod

  • np.divmod

  • np.rint

  • np.sign

  • np.floor

  • np.ceil

  • np.trunc

  • np.isclose

2.0.0 - 2021-03-30#

🎉🎉🎉

This is a compatibility-breaking update to MyGrad, and it’s great! MyGrad 2.0 represents a major overhaul to this project. This release creates near parity between the experiences of using MyGrad and using NumPy, and uses NumPy’s new mechanisms for overriding functions so that NumPy functions can operate “directly” on MyGrad’s tensors, and thus can be used to construct differentiable computational graphs!

>>> import numpy as np
>>> from mygrad import tensor
>>> x = tensor([1., 2.])
>>> np.square(x).backward()  # backprop through NumPy functions!
>>> x.grad
array([2., 4.])

Another important, but less exciting, feature is that MyGrad now protects users from inadvertently corrupting the state of a computational graph by, say, mutating a NumPy array that is participating in the graph. This is very useful for protecting people – especially students – from unwittingly poisoning the results of their calculations.

Lastly… no more “nulling” gradients! MyGrad will now handle deleting gradients for you in a way that is nicely compatible with gradient-based optimization workflows.

New Functions and Utilities#

Dropping Support for Python 3.6 and Numpy < 1.17#

MyGrad now abides by the NEP 29 recommendation, and adopts a common “time window-based” policy for support of Python and NumPy versions.

As such, Python 3.7 and NumPy 1.17 are the minimum versions supported by MyGrad 2.0.

The Interfaces Between mygrad.Tensor and numpy.array Match#

You can now control the dimensionality of a tensor and whether or not a tensor copies its data upon initialization via the tensor() interface. This mirrors the behavior of array().

NumPy:

>>> np.array([1., 2.], copy=True, ndmin=2)
array([[1., 2.]])

MyGrad 1.X:

>>> mg.Tensor([1., 2.], copy=True, ndmin=2)
<TypeError>

MyGrad 2.0:

>>> mg.tensor([1., 2.], copy=True, ndmin=2)
Tensor([[1., 2.]])

Support for dtype, where, and out in ufuncs#

MyGrad now implements ufuncs with support for specifying dtype, boolean masks, and in-place targets. The additional methods, such as mygrad.add.reduce, are not yet implemented.

MyGrad 2.0:

>>> mg.add([1, 2], [0, 2], where=[True, False], dtype=float)
Tensor([3., 1.])

Augmented Updates on Tensors Now Match NumPy’s Behavior#

Previously, augmented assignment expressions, such as tensor *= 2, behaved merely as a shorthand for the simple assignment tensor = tensor * 2. This is in stark contrast to the behavior of an augmented assignment on a NumPy array, which mutates the array in-place.

This meant that there was a major discrepancy between how these expressions behaved across MyGrad and NumPy. This has changed in MyGrad 2.0: all augmented assignment expressions operate in-place on tensors and mutate their underlying data.

NumPy:

>>> x = np.array([1., 2.])
>>> y = x
>>> x *= 2
>>> x is y
True

MyGrad 1.X:

>>> x = mg.Tensor([1., 2.])
>>> y = x
>>> x *= 2  # x = 2 * x
>>> x is y  # doesn't match!
False

MyGrad 2.0:

>>> x = mg.tensor([1., 2.])
>>> y = x
>>> x *= 2
>>> x is y  # matches!
True

Creating and Augmenting Views of Tensors#

MyGrad now provides rich support for creating and manipulating views of tensors.

All basic indexing operations performed on a tensor will produce a view of said tensor. This means that these two tensors share memory. (While MyGrad 1.X created a view of the underlying NumPy array under the hood for basic indexing, its notion of supporting views went no further than that.) As with NumPy arrays, the “parent” of a view can be accessed through the tensor’s .base attribute.

NumPy:

>>> x = np.array([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x
True

MyGrad 1.X:

>>> x = mg.Tensor([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x  # doesn't match!
<AttributeError>

MyGrad 2.0:

>>> x = mg.tensor([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x  # matches!
True

Mutating shared data will propagate through views:

NumPy:

>>> y *= -1
>>> y
array([-1., -2.])
>>> x
array([-1., -2., 3.])

MyGrad 1.X:

>>> y *= -1
>>> y
Tensor([-1., -2.])
>>> x  # doesn't match!
Tensor([1., 2., 3.])

MyGrad 2.0:

>>> y *= -1
>>> y
Tensor([-1., -2.])
>>> x  # matches!
Tensor([-1., -2., 3.])

Furthermore, views of tensors now propagate corresponding gradient information as well! This means that if y is a view of x, then y.grad will be a corresponding view of x.grad. This is true for all varieties of views, views of views, etc., of x.

# Because `y` is a view of `x`, `y.grad` will be
# a corresponding view of `x.grad`
>>> (x ** 2).backward()
>>> x.grad
array([-2., -4.,  6.,  8.])
>>> y.grad
array([-2., -4.])
>>> y.grad.base is x.grad
True

This rich support for views, augmented assignments, and in-place updates on tensors enables much more sophisticated operations on tensors. For example, let’s make a shape-(3, 3) tensor and perform operations involving views of its diagonal and its anti-diagonal. (Note that einsum() is capable of returning a view of a tensor’s diagonal, and that MyGrad fully supports backpropagation through all flavors of einsum!)

>>> x = mg.tensor([[0., 1., 2.],
...                [3., 4., 5.],
...                [6., 7., 8.]])

# view of diagonal of `x`
>>> diag = mg.einsum("ii->i", x)
>>> diag
Tensor([0., 4., 8.])

# view of anti-diagonal of `x`
>>> anti_diag = mg.einsum("ii->i", x[:, ::-1])
>>> anti_diag
Tensor([2., 4., 6.])

# Compute derivatives of their summed difference
>>> (diag - anti_diag).sum().backward()
>>> x.grad
array([[ 1.,  0., -1.],
       [ 0.,  0.,  0.],
       [-1.,  0.,  1.]])

# The views of `x` have the appropriate corresponding
# views of `x.grad`
>>> diag.grad
array([1., 0., 1.])
>>> anti_diag.grad
array([-1.,  0., -1.])

Bye-Bye Null Gradients!#

Gone are the days of having to manually clear your tensors’ gradients and the computational graph that they were in; now MyGrad does it for you! This means that Tensor.null_gradients() no longer does anything other than emit a deprecation warning. In an upcoming minor release this method will be removed entirely.

In MyGrad 2.0, calling backward() will finish its computation by clearing the computational graph that was involved in the backpropagation. Thus any internally-referenced tensors associated with that computational graph become free for garbage collection. This is very nice behavior to help prevent students from filling up their RAM unwittingly.

And instead of worrying about nulling gradients manually, a tensor will automatically have its gradient cleared any time that it is involved in a new mathematical operation. This enables the following common workflow for performing gradient-based optimization:

MyGrad 1.X:

>>> x = mg.Tensor([1., 2.])
>>> for _ in range(10):
...     y = 3 * x
...     assert x.grad is None
...     y.backward()
...     assert all(x.grad == 3.)
...     y.null_gradients()

MyGrad 2.0:

>>> x = mg.tensor([1., 2.])
>>> for _ in range(10):
...     y = 3 * x  # nulls grad
...     assert x.grad is None
...     y.backward()
...     assert all(x.grad == 3.)

for _ in range(num_optimization_steps):
    # using `model_params` in a function will automatically
    # set its gradients to `None`
    loss = compute_loss(data, model_params)  # gradients cleared
    loss.backward()         # compute gradients
    optimize(model_params)  # do stuff with gradients

You can also call null_grad() to manually clear an individual tensor’s gradient.

Safety First: Memory Guarding Behavior in MyGrad 2.0#

In MyGrad 1.X it was all too easy to unwittingly corrupt the state of a computational graph by mutating a NumPy array mid-computation. This could lead to incorrect calculations of gradients! This is the stuff of horrifying nightmares.

Now MyGrad tracks all of the arrays that are involved in active computational graphs and locks their memory so that they are read-only (except for when the user mutates the array explicitly with a MyGrad operation). This means that the sort of mutation that could have lurked silently in the dimly-lit alleyways of bugs-ville will now get loudly narc’d on by MyGrad’s merciless memory guard!

MyGrad 1.X:

>>> arr = np.array([1., 2.])
>>> tn = mg.Tensor([1., 1.])
>>> z = arr * tn
# mutating arr will corrupt
# backprop through z...
>>> arr[:] = 0.

>>> z.backward()  # uh oh...
>>> tn.grad  # should be: (1., 2.)
array([0., 0.])

MyGrad 2.0:

>>> arr = np.array([1., 2.])
>>> tn = mg.tensor([1., 1.])
>>> z = arr * tn
# mutating arr will corrupt
# backprop through z...
>>> arr[:] = 0.  # you shall not pass!
ValueError: read-only!
>>> z.backward()
>>> tn.grad
array([1., 2.])

Any tensor or array that is no longer participating in an active computational graph will automatically have its write-ability restored to its original state.

# memory guarding is released once an array is no
# longer involved in an active computational graph
>>> import mygrad as mg
>>> import numpy as np
>>> x = np.array([1., 2.])
>>> y = mg.ones_like(x)
>>> z = x * y     # x and y are locked
>>> z.backward()  # graph cleared; x and y are "released"
>>> x[:] = 0      # can write to x
>>> x
array([0., 0.])

# This result is not referenced, thus
# x and y are immediately released by the
# memory-guard; no graph-clearing is needed
>>> x * y
Tensor([0., 0.])
>>> x[:] = 1.

But with great responsibility comes great …uhh… slowness? This memory-guarding feature can lead to slowdowns of up to 50% for computations involving many small tensors (It used to be a lot worse… like 5x worse. I worked really hard to speed it up! I promise!). That being said, computations involving beefy tensors (e.g. standard neural networks) will not be significantly affected by the overhead associated with the memory guard. Please refer to Performance Tips for responsible ways to disable this memory-guarding mechanism.

Speaking of optimizations…

Disabling Automatic Differentiation#

Sometimes you want to use your MyGrad code to do calculations, but you don’t actually need to compute any derivatives. A common example of this is evaluating the test-time performance of a machine learning model that you are in the process of optimizing – you don’t actually need to perform backpropagation when you are processing the test data.

In these circumstances, you can greatly reduce the overhead cost associated with building a computational graph by using the no_autodiff() decorator / context manager. See the linked documentation for extensive examples of its usage.

# demonstrating mygrad in no-autodiff mode
>>> import mygrad as mg
>>> x = mg.Tensor([1., 2., 3., 4.])
>>> with mg.no_autodiff:
...     y = x ** 2  # operation not tracked
>>> y.backward()
>>> y.grad, x.grad  # x is not "connected" to y
(array([1., 1., 1., 1.]), None)

For computations involving many small tensors, this can produce up to a 3x speedup! So make sure you make keen use of this when you don’t actually need to perform autodiff.

Revamping Constant Semantics to be Explicit#

Previously, specifying constant=False in a mygrad function did not actually mean that the function would necessarily produce a non-constant tensor. Rather, it simply meant that the output would not be forced to be a constant – whether or not the result was a constant depended on the inputs (i.e. a function whose inputs were all constants would thus produce a constant).

This was a very bad design decision! Now, specifying constant=False guarantees that the output of a function is a non-constant (meaning that it facilitates backpropagation through a computational graph).

That being said, we usually do want constant information to propagate through functions. Thus constant=None is now the default value – its behavior matches that of constant=False from MyGrad 1.X – for all functions that accept the argument.

It is also now standard to require that this argument be a keyword-only argument.

MyGrad 1.X:

>>> t1 = mg.tensor(1., constant=True)
>>> t2 = mg.tensor(1., constant=True)

>>> out = mg.add(t1, t2, constant=False)
>>> out.constant
True

MyGrad 2.0:

>>> t1 = mg.tensor(1., constant=True)
>>> t2 = mg.tensor(1., constant=True)

# new behavior: constant=False guarantees a non-constant output
>>> out = mg.add(t1, t2, constant=False)
>>> out.constant
False

# constant=None (the default) propagates constant-ness from the inputs
>>> out = mg.add(t1, t2, constant=None)
>>> out.constant
True

Remove Scalar-Only Conditions on Backpropagation#

Previously, one could invoke backpropagation from a non-scalar tensor only if that tensor was the culmination of operations that preserved a one-to-one mapping between the elements of an upstream tensor and its downstream neighbor. Otherwise an error was raised. This ensured that tensor.grad would always be the same shape as tensor, and not represent a higher-dimensional tensor.

Now calling tensor.backward() from a non-scalar tensor will behave as if the tensor was summed prior to invoking backpropagation. This is simple, easy-to-understand behavior, which ensures that tensor.grad can always be interpreted as an array of scalar-valued derivatives.

MyGrad 1.X:

>>> t1 = mg.Tensor([[1., 2.],
...                 [0., -1.]])
>>> t2 = mg.Tensor([[0., 1.],
...                 [3., -1.]])
>>> z = t1 @ t2
>>> z.backward()
<InvalidBackprop: Scalar-only>

MyGrad 2.0:

>>> t1 = mg.tensor([[1., 2.],
...                 [0., -1.]])
>>> t2 = mg.tensor([[0., 1.],
...                 [3., -1.]])
>>> z = t1 @ t2
>>> z.backward()
>>> t1.grad
array([[1., 2.],
       [1., 2.]])

Integer-valued Tensors Are Treated as Constants#

Derivatives involving integer-valued tensors are typically ill-defined, and in MyGrad 1.X they were generally just wrong. Now integer-valued tensors can only be involved in computational graphs as constants.

MyGrad 1.X:

>>> mg.Tensor([1, 2]).constant
False

MyGrad 2.0:

>>> mg.tensor([1, 2]).constant
True

Is This Code Well-Tested?#

Yes! I consider MyGrad’s test suite to be the most important part of the library. It is the only reason why I feel comfortable releasing this code for students, teachers, and others to use. I leverage thorough property-based testing using the Hypothesis library to exercise this code as rigorously as I can manage. These tests even found bugs in NumPy!

Special Thanks#

Special thanks to Alex Silverstein, Zac Dodds, and Petar Griggs for all of the fruitful discussions, ideas, and influence that you provided throughout this major update.

1.9.0 - 2020-08-28#

The most significant aspect of this release is the implementation of Tensor.__array__, which enables a huge amount of cross-compatibility with numpy utilities (#288). Note that any numpy function that previously relied on producing an array of tensor-scalars will likely produce a standard numpy array instead.

Improvements:

  • x**1 and x**2 are now special-cased in order to make these common operations more efficient (#266)

  • The derivative of focal_loss() was refactored to handle special edge-cases and the tests for focal loss were improved to exercise these edge cases (#269)

  • Various improvements to the tests (#271, #277, #290, #284, #289, #282, #292, #293)

The internal mechanism for tracking tensors in a computational graph now depends on hashing tensor-IDs instead of hashing tensors directly. The fact that tensors could be hashed at all was due to their equality special methods being monkey-patched (#276)

  • softmax() and logsoftmax() both expose axis arguments (#268)

Bug fixes:

New features:

  • Tensor.__array__ now exposes the tensor’s underlying numpy array – this enables a huge amount of cross-compatibility with numpy utilities (#288)

  • Adds asarray() (#279)

  • Adds astensor() (#294)

1.8.1 - 2020-07-28#

This is an internal change to the backprop mechanism for Tensor.__getitem__, which produces considerable speedups (2x-4x) for backprop through basic indexing and boolean indexing. Thanks to Petar Griggs for finding this.

1.8.0 - 2020-07-25#

New features:

Thanks to Darshan Krishnaswamy and Sam Carpenter for adding this functionality!

Fixes a bug in the GRU layer where mixed floating point precision dtypes between data and weights raised an error. Thanks to Petar Griggs for the fix!

1.7.1 - 2020-07-11#

Fixes a bug in negative_log_likelihood(), where setting constant=True had no effect.

1.7.0 - 2020-07-11#

This release continues the process of integrating functions from mynn.

New features:

Big thanks to David Mascharka!

Improvements:

The interfaces to reshape() and Tensor.reshape() were adjusted to match exactly the interfaces of their NumPy counterparts; i.e. reshape() now requires newshape to be a sequence, whereas Tensor.reshape() can accept an unpacked sequence for its newshape.

Tensor.shape is now settable - triggering an in-place reshape of a tensor, matching the corresponding behavior in NumPy.

Internal changes:

The logic for writing an in-place operation has been consolidated into a convenient wrapper: _in_place_op().

1.6.0 - 2020-06-21#

New features:

Big thanks to David Mascharka!

1.5.0 - 2020-02-16#

New features:

This update also includes various improvements to the library’s test suite.

1.4.1 - 2020-01-09#

This release performs an internal refactor in the nnet module of the library, as well as an analogous refactor in the test suite. This also fixes a docstring in the multiclass_hinge loss to properly show a description in the readthedocs page.

1.4.0 - 2019-12-19#

This release adds the repeat() operation. It also includes some minor improvements to mygrad’s test suite.

1.3.0 - 2019-11-30#

This release adds clip() and where().

It also includes a major fix to the graph-traversal mechanism for null-gradients and clear-graph, eliminating an exponentially-scaling runtime.

+x will now invoke mygrad.positive, mirroring the numpy behavior

There are improvements to user-facing error messages and input validation in addition to major improvements to mygrad’s test suite. There is now a 100% line-coverage gate in mygrad’s CI system.

1.2.0 - 2019-08-03#

We’re finally keeping a formal changelog!

This release makes substantial improvements to MyGrad’s error-checking and handling, in order to make the process of debugging buggy custom operations much simpler. Specifically, backward() now checks for invalid gradients on each call of backward_var(), and raises a descriptive error message.

mygrad.errors was introduced to provide descriptive, MyGrad-specific exceptions. For example, we no longer raise bare exceptions for scenarios like invalid backprop through a scalar-only graph; rather, we now raise a descriptive InvalidBackprop exception.

MyGrad’s testing framework received wide-ranging improvements, yielding complete test coverage and fewer flaky tests. Coverage checks were added to the project’s CI process.

maximum() and minimum() were patched to permit backpropagation through scalar inputs.

Internal implementation details of einsum() were adjusted to remove redundant code in its backpropagation machinery.

null_gradients() was refactored to ensure that only a single traversal of the computational graph is performed to null all of the tensors’ gradients. Furthermore, Tensor.null_gradients(clear_graph=True) now only performs a single graph traversal, instead of two.

In keeping with NumPy’s behavior, performing +x (where x is a mygrad-tensor) no longer returns a reference of x, but returns mygrad.positive(x).

Backpropagation through max() and min() now works for 0D tensors.

Input validation was added to mygrad.nnet.layers.utils.sliding_window_view().

Fixed backpropagation through basic indexing, x[ind] = b, in which broadcasting occurred and b possessed “excess” leading singleton dimensions.