Changelog#

This is a record of all past mygrad releases and what went into them, in reverse chronological order. All previous releases should still be available on pip.

2.2.0 - 2023-01-03#

2.1.0 - 2022-01-01#

New Functions and Utilities#

The following differentiable functions are now supported by MyGrad, and “drop-in” overrides for their NumPy counterparts are supported as well.

Basic tensor save/load functionality has been added (thanks to @kw-0).
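A minimal sketch of how this might look, assuming the utilities are exposed as mg.save and mg.load with NumPy-like signatures (names and exact signatures are assumptions, not guaranteed by this entry):

>>> import mygrad as mg
>>> x = mg.arange(3.)
>>> mg.save("tensor.npy", x)   # assumed signature: save(file, tensor)
>>> mg.load("tensor.npy")      # assumed to round-trip the tensor
Tensor([0., 1., 2.])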

Improvements#

  • clip() and Tensor.clip now accept an out target, permitting in-place operations (see the sketch after this list).

  • The method Tensor.__index__() is now implemented, which permits scalar integer-valued tensors to be used to index into Python sequences.

  • Added Python 3.10 to our automated test matrix.
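A minimal sketch of the first two items above (values are illustrative):

>>> import mygrad as mg
>>> x = mg.tensor([-2., 0.5, 3.])
>>> target = mg.zeros_like(x)
>>> mg.clip(x, 0., 1., out=target)    # result is written into `target`
Tensor([0. , 0.5, 1. ])
>>> idx = mg.tensor(2)                # scalar integer-valued tensor
>>> ["a", "b", "c"][idx]              # Tensor.__index__ makes this legal
'c'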

Compatibility-Breaking Changes#

  • In accordance with NEP 29 we are dropping support for NumPy versions below 1.19. However, MyGrad will not drop support for Python 3.7; to remain as lightweight and flexible as possible we will support minor versions of Python up until their EOL or until our minimal NumPy dependency drops support – whichever occurs first.

  • The interface to arange() was changed from arange(start, stop=None, step=None, ...) to arange([start,] stop[, step,], ...). This provides exact parity with NumPy’s arange function.

  • The derivatives of absolute() and norm() have been revised such that in cases where the derivatives used to be nan, those entries will now be 0. Both functions can now be passed nan_to_num=False to enable the previous, more rigorous behavior. See PR #379 for more details. A short sketch follows this list.
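For example, a sketch of the new absolute() behavior (outputs are what the description above implies):

>>> import mygrad as mg
>>> x = mg.tensor([-2., 0., 2.])
>>> mg.absolute(x).backward()
>>> x.grad                    # entry at 0. is now 0. rather than nan
array([-1.,  0.,  1.])
>>> y = mg.tensor([-2., 0., 2.])
>>> mg.absolute(y, nan_to_num=False).backward()
>>> y.grad                    # previous, more rigorous behavior
array([-1., nan,  1.])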

2.0.2 - 2021-04-10#

Exposes execute_op() at the top-level namespace.

2.0.1 - 2021-04-03#

Bug Fixes#

  • matmul() and multi_matmul() were missing from the top-level namespace of mygrad.

  • A 0D tensor involved in a broadcasted operation would have a numpy-float set for its gradient instead of a 0D array.

New Functions#

The following non-differentiable NumPy functions now work on mygrad tensors (and return ndarrays). Aliases of these are available at the top-level namespace of mygrad (see the example after this list).

  • np.isnan

  • np.isfinite

  • np.isinf

  • np.isnat

  • np.signbit

  • np.logical_not

  • np.logical_and

  • np.logical_or

  • np.logical_xor

  • np.greater

  • np.greater_equal

  • np.less

  • np.less_equal

  • np.equal

  • np.not_equal

  • np.floor_divide

  • np.remainder

  • np.mod

  • np.fmod

  • np.divmod

  • np.rint

  • np.sign

  • np.floor

  • np.ceil

  • np.trunc

  • np.isclose
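For example (illustrative; mg.isnan is one of the aliases mentioned above):

>>> import numpy as np
>>> import mygrad as mg
>>> t = mg.tensor([1., float("nan"), 3.])
>>> np.isnan(t)   # operates directly on the tensor, returns a plain ndarray
array([False,  True, False])
>>> mg.isnan(t)   # top-level alias behaves the same way
array([False,  True, False])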

2.0.0 - 2021-03-30#

🎉🎉🎉

This is a compatibility-breaking update to MyGrad, and it’s great! MyGrad 2.0 represents a major overhaul of this project. This release creates near parity between the experiences of using MyGrad and using NumPy, and leverages NumPy’s new mechanisms for overriding functions so that NumPy functions can operate “directly” on MyGrad’s tensors and thus be used to construct differentiable computational graphs!

>>> import numpy as np
>>> from mygrad import tensor
>>> x = tensor([1., 2.])
>>> np.square(x).backward()  # backprop through NumPy functions!
>>> x.grad
array([2., 4.])

Another important, but less exciting, feature is that MyGrad now protects users from inadvertently corrupting the state of a computational graph by, say, mutating a NumPy array that is participating in the graph. This is very useful for protecting people – especially students – from unwittingly poisoning the results of their calculations.

Lastly… no more “nulling” gradients! MyGrad will now handle deleting gradients for you in a way that is nicely compatible with gradient-based optimization work flows.

New Functions and Utilities#

Dropping Support for Python 3.6 and Numpy < 1.17#

MyGrad now abides by the NEP 29 recommendation, and adopts a common “time window-based” policy for support of Python and NumPy versions.

As such, Python 3.7 and NumPy 1.17 are the minimum versions supported by MyGrad 2.0.

The Interfaces Between mygrad.Tensor and numpy.array Match#

You can now control the dimensionality of a tensor and whether or not a tensor copies its data upon initialization via the tensor() interface. This mirrors the behavior of array().

NumPy:

>>> np.array([1., 2.], copy=True, ndmin=2)
array([[1., 2.]])

MyGrad 1.X:

>>> mg.Tensor([1., 2.], copy=True, ndmin=2)
<TypeError>

MyGrad 2.0:

>>> mg.tensor([1., 2.], copy=True, ndmin=2)
Tensor([[1., 2.]])

Support for dtype, where, and out in ufuncs#

MyGrad now implements ufuncs with support for specifying dtype, boolean masks, and in-place targets. The additional methods, such as mygrad.add.reduce, are not yet implemented.

MyGrad 2.0:

>>> mg.add([1, 2], [0, 2], where=[True, False], dtype=float)
Tensor([3., 1.])
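An in-place target can be supplied as well; a hedged sketch (assumes `out` accepts a tensor, as described above):

>>> target = mg.zeros(2)
>>> mg.multiply([1., 2.], [3., 4.], out=target)   # result written into `target`
Tensor([3., 8.])
>>> target
Tensor([3., 8.])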

Augmented Updates on Tensors Now Match NumPy’s Behavior#

Previously, augmented assignment expressions, such as tensor *= 2, behaved merely as a shorthand for the simple assignment tensor = tensor * 2. This is in stark contrast to the behavior of an augmented assignment on a NumPy array, which mutates the array in-place.

This meant that there was a major discrepancy between how these expressions behaved across MyGrad and NumPy. This has changed in MyGrad 2.0: all augmented assignment expressions operate in-place on tensors and mutate their underlying data.

NumPy:

>>> x = np.array([1., 2.])
>>> y = x
>>> x *= 2
>>> x is y
True

MyGrad 1.X:

>>> x = mg.Tensor([1., 2.])
>>> y = x
>>> x *= 2  # x = 2 * x
>>> x is y  # doesn't match!
False

MyGrad 2.0:

>>> x = mg.tensor([1., 2.])
>>> y = x
>>> x *= 2
>>> x is y  # matches!
True

Creating and Augmenting Views of Tensors#

MyGrad now provides rich support for creating and manipulating views of tensors.

All basic indexing operations performed on a tensor will produce a view of that tensor, meaning that the view and its parent share memory. (While MyGrad 1.X created a view of the underlying NumPy array under the hood for basic indexing, its notion of supporting views went no further than that.) As with NumPy arrays, the “parent” of a view can be accessed through the tensor’s .base attribute.

NumPy:

>>> x = np.array([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x
True

MyGrad 1.X:

>>> x = mg.Tensor([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x  # doesn't match!
<AttributeError>

MyGrad 2.0:

>>> x = mg.tensor([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x  # matches!
True

Mutating shared data will propagate through views:

NumPy:

>>> y *= -1
>>> y
array([-1., -2.])
>>> x
array([-1., -2., 3.])

MyGrad 1.X:

>>> y *= -1
>>> y
Tensor([-1., -2.])
>>> x  # doesn't match!
Tensor([1., 2., 3.])

MyGrad 2.0:

>>> y *= -1
>>> y
Tensor([-1., -2.])
>>> x  # matches!
Tensor([-1., -2., 3.])

Furthermore, views of tensors now propagate corresponding gradient information as well! This means that if y is a view of x, then y.grad will be a corresponding view of x.grad. This is true for all varieties of views, views of views, etc., of x.

# Because `y` is a view of `x`, `y.grad` will be
# a corresponding view of `x.grad`
>>> (x ** 2).backward()
>>> x.grad
array([-2., -4.,  6.])
>>> y.grad
array([-2., -4.])
>>> y.grad.base is x.grad
True

This rich support for views, augmented assignments, and in-place updates on tensors now enables much more sophisticated operations. For example, let’s make a shape-(3, 3) tensor and perform operations involving views of its diagonal and its anti-diagonal. (Note that einsum() is capable of returning a view of a tensor’s diagonal, and that MyGrad fully supports backpropagation through all flavors of einsum!)

>>> x = mg.tensor([[0., 1., 2.],
...                [3., 4., 5.],
...                [6., 7., 8.]])

# view of diagonal of `x`
>>> diag = mg.einsum("ii->i", x)
>>> diag
Tensor([0., 4., 8.])

# view of anti-diagonal of `x`
>>> anti_diag = mg.einsum("ii->i", x[:, ::-1])
>>> anti_diag
Tensor([2., 4., 6.])

# Compute derivatives of their summed difference
>>> (diag - anti_diag).sum().backward()
>>> x.grad
array([[ 1.,  0., -1.],
       [ 0.,  0.,  0.],
       [-1.,  0.,  1.]])

# The views of `x` have the appropriate corresponding
# views of `x.grad`
>>> diag.grad
array([1., 0., 1.])
>>> anti_diag.grad
array([-1.,  0., -1.])

Bye-Bye Null Gradients!#

Gone are the days of having to manually clear your tensors’ gradients and the computational graph that they were in; now MyGrad does it for you! This means that Tensor.null_gradients() no longer does anything other than emit a deprecation warning. In an upcoming minor release this method will be removed entirely.

In MyGrad 2.0, calling backward() will finish its computation by clearing the computational graph that was involved in the backpropagation. Thus any internally-referenced tensors associated with that computational graph become free for garbage collection. This is very nice behavior to help prevent students from filling up their RAM unwittingly.

And instead of worrying about nulling gradients manually, a tensor will automatically have its gradient cleared any time that it is involved in a new mathematical operation. This enables the following common workflow for performing gradient-based optimization:

MyGrad 1.X:

>>> x = mg.Tensor([1., 2.])
>>> for _ in range(10):
...     y = 3 * x
...     assert x.grad is None
...     y.backward()
...     assert all(x.grad == 3.)
...     y.null_gradients()

MyGrad 2.0:

>>> x = mg.tensor([1., 2.])
>>> for _ in range(10):
...     y = 3 * x  # nulls grad
...     assert x.grad is None
...     y.backward()
...     assert all(x.grad == 3.)

In a generic training loop, this looks like:

for _ in range(num_optimization_steps):
    # using `model_params` in a function will automatically
    # set its gradients to `None`
    loss = compute_loss(data, model_params)  # gradients cleared
    loss.backward()         # compute gradients
    optimize(model_params)  # do stuff with gradients

You can also call null_grad() to manually clear an individual tensor’s gradient.
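A brief sketch (the assignment to _ simply discards the method's return value):

>>> x = mg.tensor([1., 2.])
>>> (x * x).backward()
>>> x.grad
array([2., 4.])
>>> _ = x.null_grad()   # clears only this tensor's gradient
>>> x.grad is None
True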

Safety First: Memory Guarding Behavior in MyGrad 2.0#

In MyGrad 1.X it was all too easy to unwittingly corrupt the state of a computational graph by mutating a NumPy array mid-computation. This could lead to incorrect calculations of gradients! This is the stuff of horrifying nightmares.

Now MyGrad tracks all of the arrays that are involved in active computational graphs and locks their memory so that they are read-only (except for when the user mutates the array explicitly with a MyGrad operation). This means that the sort of mutation that could have lurked silently in the dimly-lit alleyways of bugs-ville will now get loudly narc’d on by MyGrad’s merciless memory guard!

MyGrad 1.X:

>>> arr = np.array([1., 2.])
>>> tn = mg.Tensor([1., 1.])
>>> z = arr * tn
# mutating `arr` will corrupt
# backprop through `z`...
>>> arr[:] = 0.
>>> z.backward()  # uh oh...
>>> tn.grad  # should be: array([1., 2.])
array([0., 0.])

MyGrad 2.0:

>>> arr = np.array([1., 2.])
>>> tn = mg.tensor([1., 1.])
>>> z = arr * tn
# mutating `arr` will corrupt
# backprop through `z`...
>>> arr[:] = 0.  # you shall not pass!
ValueError: read-only!
>>> z.backward()
>>> tn.grad
array([1., 2.])

Any tensor or array that is no longer participating in an active computational graph will automatically have its write-ability restored to its original state.

# memory guarding is released once an array is no
# longer involved in an active computational graph
>>> import mygrad as mg
>>> import numpy as np
>>> x = np.array([1., 2.])
>>> y = mg.ones_like(x)
>>> z = x * y     # x and y are locked
>>> z.backward()  # graph cleared; x and y are "released"
>>> x[:] = 0      # can write to x
>>> x
array([0., 0.])

# This result is not referenced, thus
# x and y are immediately released by the
# memory-guard; no graph-clearing is needed
>>> x * y
Tensor([0., 0.])
>>> x[:] = 1.

But with great responsibility comes great …uhh… slowness? This memory-guarding feature can lead to slowdowns of up to 50% for computations involving many small tensors (It used to be a lot worse… like 5x worse. I worked really hard to speed it up! I promise!). That being said, computations involving beefy tensors (e.g. standard neural networks) will not be significantly affected by the overhead associated with the memory guard. Please refer to Performance Tips for responsible ways to disable this memory-guarding mechanism.
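As a hedged sketch of opting out, assuming the mem_guard_off context manager / decorator described in Performance Tips:

>>> import mygrad as mg
>>> import numpy as np
>>> x = np.array([1., 2.])
>>> with mg.mem_guard_off:       # assumed interface; see Performance Tips
...     z = x * mg.ones_like(x)  # `x` is not locked inside this context
...     x[:] = 0.                # permitted, but you accept the risk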

Speaking of optimizations…

Disabling Automatic Differentiation#

Sometimes you want to use your MyGrad code to do calculations, but you don’t actually need to compute any derivatives. A common example of this is evaluating the test-time performance of a machine learning model that you are in the process of optimizing – you don’t actually need to perform backpropagation when you are processing the test data.

In these circumstances, you can greatly reduce the overhead cost associated with building a computational graph by using the no_autodiff() decorator / context manager. See the linked documentation for extensive examples of its usage.

# demonstrating mygrad in no-autodiff mode
>>> import mygrad as mg
>>> x = mg.Tensor([1., 2., 3., 4.])
>>> with mg.no_autodiff:
...     y = x ** 2  # operation not tracked
>>> y.backward()
>>> y.grad, x.grad  # x is not "connected" to y
(array([1., 1., 1., 1.]), None)

For computations involving many small tensors, this can produce up to a 3x speedup! So make sure you make keen use of this when you don’t actually need to perform autodiff.

Revamping Constant Semantics to be Explicit#

Previously, specifying constant=False in a mygrad function did not actually mean that the function would necessarily produce a non-constant tensor. Rather, it simply meant that the output would not be _forced_ to be a constant – whether or not the result was a constant depended on the inputs (i.e. a function whose inputs were all constants would thus produce a constant).

This was a very bad design decision! Now, specifying constant=False guarantees that the output of a function is a non-constant (meaning that it facilitates backpropagation through a computational graph).

That being said, we usually _do_ want constant information to propagate through functions. Thus constant=None is now the default value – its behavior matches that of constant=False from MyGrad 1.X – for all functions that accept the argument.

It is also now standard to require that this argument be a keyword-only argument.

MyGrad 1.X:

>>> t1 = mg.Tensor(1., constant=True)
>>> t2 = mg.Tensor(1., constant=True)

>>> out = mg.add(t1, t2, constant=False)
>>> out.constant
True

MyGrad 2.0:

>>> t1 = mg.tensor(1., constant=True)
>>> t2 = mg.tensor(1., constant=True)

>>> out = mg.add(t1, t2, constant=False)
>>> out.constant
False

# constant=None is the default
>>> out = mg.add(t1, t2)
>>> out.constant
True

Remove Scalar-Only Conditions on Backpropagation#

Previously, one could invoke backpropagation from a non-scalar tensor only if that tensor was the culmination of operations that preserved a one-to-one mapping between the elements of an upstream tensor and those of its downstream neighbor. Otherwise an error was raised. This ensured that tensor.grad would always be the same shape as tensor, and not represent a higher-dimensional tensor.

Now calling tensor.backward() from a non-scalar tensor will behave as if the tensor was summed prior to invoking backpropagation. This is simple, easy-to-understand behavior, which ensures that tensor.grad can always be interpreted as an array of scalar-valued derivatives.

MyGrad 1.X:

>>> t1 = mg.Tensor([[1., 2.],
...                 [0., -1]])
>>> t2 = mg.Tensor([[0., 1.],
...                 [3., -1]])
>>> z = t1 @ t2
>>> z.backward()
<InvalidBackprop: Scalar-only>

MyGrad 2.0:

>>> t1 = mg.tensor([[1., 2.],
...                 [0., -1]])
>>> t2 = mg.tensor([[0., 1.],
...                 [3., -1]])
>>> z = t1 @ t2
>>> z.backward()
>>> t1.grad
array([[1., 2.],
       [1., 2.]])

Integer-valued Tensors Are Treated as Constants#

Derivatives involving integer-valued tensors are typically ill-defined, and in MyGrad 1.X they were generally just wrong. Now integer-valued tensors can only be involved in computational graphs as constants.

MyGrad 1.X:

>>> mg.Tensor([1, 2]).constant
False

MyGrad 2.0:

>>> mg.tensor([1, 2]).constant
True

Is This Code Well-Tested?#

Yes! I consider MyGrad’s test suite to be the most important part of the library. It is the only reason why I feel comfortable releasing this code for students, teachers, and others to use. I leverage thorough property-based testing using the Hypothesis library to exercise this code as rigorously as I can manage. These tests even found bugs in NumPy!

Special Thanks#

Special thanks to Alex Silverstein, Zac Dodds, and Petar Griggs for all of the fruitful discussions, ideas, and influence that you provided throughout this major update.

1.9.0 - 2020-08-28#

The most significant aspect of this release is the implementation of Tensor.__array__, which enables a huge amount of cross-compatibility with numpy utilities (#288). Note that any code that previously relied on a NumPy function producing an array of tensor-scalars will now likely receive a standard NumPy array instead.
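A small sketch of what this enables (values illustrative):

>>> import numpy as np
>>> import mygrad as mg
>>> t = mg.Tensor([1., 2., 3.])
>>> np.asarray(t)   # Tensor.__array__ exposes the underlying numpy array
array([1., 2., 3.])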

Improvements:

  • x**1 and x**2 are now special-cased in order to make these common operations more efficient (#266)

  • The derivative of focal_loss() was refactored to handle special edge-cases and the tests for focal loss were improved to exercise these edge cases (#269)

  • Various improvements to the tests (#271, #277, #290, #284, #289, #282, #292, #293)

  • The internal mechanism for tracking tensors in the computational graph now hashes tensor-IDs instead of hashing tensors directly. Tensors were only hashable because their equality special methods were being monkey-patched (#276)

  • softmax() and logsoftmax() both expose axis arguments (#268)

Bug fixes:

New features:

  • Tensor.__array__ now exposes the tensor’s underlying numpy array – this enables a huge amount of cross-compatibility with numpy utilities (#288)

  • Adds asarray() (#279)

  • Adds astensor() (#294)

1.8.1 - 2020-07-28#

This is an internal change to the backprop mechanism for Tensor.__getitem__, which produces considerable speedups (2x-4x) for backprop through basic indexing and boolean indexing. Thanks to Petar Griggs for finding this.

1.8.0 - 2020-07-25#

New features:

Thanks to Darshan Krishnaswamy and Sam Carpenter for adding this functionality!

Fixes a bug in the GRU layer where mixed floating point precision dtypes between data and weights raised an error. Thanks to Petar Griggs for the fix!

1.7.1 - 2020-07-11#

Fixes a bug in negative_log_likelihood(), where setting constant=True had no effect.

1.7.0 - 2020-07-11#

This release continues the process of integrating functions from mynn.

New features:

Big thanks to David Mascharka!

Improvements:

The interfaces to mygrad.reshape() and Tensor.reshape() were adjusted to match exactly the interfaces to their NumPy counterparts. I.e. mygrad.reshape() now requires newshape to be a sequence, whereas Tensor.reshape() can accept an unpacked sequence for its newshape.

Tensor.shape is now settable, triggering an in-place reshape of the tensor, matching the corresponding behavior in NumPy (see the sketch below).
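A brief sketch of these interfaces (values are illustrative):

>>> import mygrad as mg
>>> x = mg.arange(6.)
>>> mg.reshape(x, (2, 3))   # function form: newshape must be a sequence
Tensor([[0., 1., 2.],
        [3., 4., 5.]])
>>> x.reshape(2, 3)         # method form: accepts an unpacked sequence
Tensor([[0., 1., 2.],
        [3., 4., 5.]])
>>> x.shape = (3, 2)        # setting .shape reshapes the tensor in-place
>>> x
Tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])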

Internal changes:

The logic for writing an in-place operation has been consolidated into a convenient wrapper: _in_place_op().

1.6.0 - 2020-06-21#

New features:

Big thanks to David Mascharka!

1.5.0 - 2020-02-16#

New features:

This update also includes various improvements to the library’s test suite.

1.4.1 - 2020-01-09#

This release performs an internal refactor in the nnet module of the library, as well as an analogous refactor in the test suite. This also fixes a docstring in the multiclass_hinge loss to properly show a description in the readthedocs page.

1.4.0 - 2019-12-19#

This release adds the repeat() operation. It also includes some minor improvements to mygrad’s test suite.

1.3.0 - 2019-11-30#

This release adds clip() and where().

It also includes a major fix to the graph-traversal mechanism for null-gradients and clear-graph, eliminating an exponentially-scaling runtime.

+x will now invoke mygrad.positive, mirroring the NumPy behavior.

There are improvements to user-facing error messages and input validation in addition to major improvements to mygrad’s test suite. There is now a 100% line-coverage gate in mygrad’s CI system.

1.2.0 - 2019-08-03#

We’re finally keeping a formal changelog!

This release makes substantial improvements to MyGrad’s error-checking and handling, in order to greatly simplify the process of debugging issues with buggy custom operations. Specifically, backward() now checks for invalid gradients on each call of backward_var(), and raises a descriptive error message.

mygrad.errors was introduced to provide descriptive, MyGrad-specific exceptions. For example, we no longer raise bare exceptions for scenarios like invalid backprop through a scalar-only graph; rather, we now raise a descriptive InvalidBackprop exception.

MyGrad’s testing framework received wide-ranging improvements, yielding complete test coverage and fewer flaky tests. Coverage checks were added to the project’s CI process.

maximum() and minimum() were patched to permit backpropagation through scalar inputs.

Internal implementation details of einsum() were adjusted to remove redundant code in its backpropagation machinery.

null_gradients() was refactored to ensure that only a single traversal of the computational graph is performed to null all of the tensors’ gradients. Furthermore, Tensor.null_gradients(clear_graph=True) now only performs a single graph traversal, instead of two.

In keeping with NumPy’s behavior, performing +x (where x is a mygrad-tensor) no longer returns a reference of x, but returns mygrad.positive(x).

Backpropagation through max() and min() now works for 0D tensors.

Input validation was added to mygrad.nnet.layers.utils.sliding_window_view().

Fixed backpropagation through basic indexing, x[ind] = b, in which broadcasting occurred and b possessed “excess” leading singleton dimensions.