# Changelog#

This is a record of all past mygrad releases and what went into them, in reverse chronological order. All previous releases should still be available on pip.

## 2.2.0 - 2023-01-03#

MyGrad is now tested against Python 3.11. (pull request #411)

`mygrad.bool8`

has been removed. Use`mygrad.bool_`

instead. (pull request #411)Adds ufunc support for

`resolve_dtypes`

. (pull request #411)Modifies automatic differentiation framework to be simpler and more memory efficient. In the future, MyGrad will be able to expose an API akin to torch.autograd.grad. (pull request #407)

MyGrad’s CI now enforces formatting and spell check requirements on all pull requests. (pull request #411)

## 2.1.0 - 2022-01-01#

### New Functions and Utilities#

The following differentiable functions are now supported by MyGrad, and “drop-in” overrides for their NumPy counterparts are supported as well.

Basic tensor save/load functionality has been added (thanks to @kw-0).

### Improvements#

`clip()`

and`Tensor.clip`

now accept an`out`

target, permitting in-place operations.The method

`Tensor.__index__()`

is now implemented, which permits scalar integer-valued tensors to be used to index into Python sequences.Added Python 3.10 to our automated test matrix.

### Compatibility-Breaking Changes#

In accordance with NEP 29 we are dropping support for NumPy versions below 1.19. However, MyGrad will not drop support for Python 3.7; to remain as lightweight and flexible as possible we will support minor versions of Python up until their EOL or until our minimal NumPy dependency drops support – whichever occurs first.

The interface to

`arange()`

was changed from`arange(start, stop=None, step=None, ...)`

to`arange([start,] stop[, step,], ...)`

. This provides exact parity with NumPy’s arange function.The derivatives of

`absolute()`

and`norm()`

have been revised such that in cases where the derivatives used to be`nan`

, those entries will now be`0`

. Both functions can now be passed`nan_to_num=False`

to enable the previous, more rigorous behavior. See PR #379 for more details.

## 2.0.2 - 2021-04-10#

Exposes `execute_op()`

at top-level namespace

## 2.0.1 - 2021-04-03#

### Bug Fixes#

`matmul()`

and`multi_matmul()`

were missing from the top-level namespace of`mygrad`

.A 0D tensor involved in a broadcasted operation would have a numpy-float set for its gradient instead of a 0D array.

### New Functions#

The following non-differentiable NumPy functions now work on mygrad tensors (and return ndarrays).
Aliases of these are available at the top-level namespace of `mygrad`

np.isnan

np.isfinite

np.isinf

np.isnat

np.signbit

np.logical_not

np.logical_and

np.logical_or

np.logical_xor

np.greater

np.greater_equal

np.less

np.less_equal

np.equal

np.not_equal

np.floor_divide

np.remainder

np.mod

np.fmod

np.divmod

np.rint

np.sign

np.floor

np.ceil

np.trunc

np.isclose

## 2.0.0 - 2021-03-30#

🎉🎉🎉

This is a compatibility-breaking update to MyGrad, and it’s great! MyGrad 2.0 represents a major overhaul to this project. This release creates near parity between the experiences of using MyGrad and using NumPy, and uses NumPy’s new mechanisms for overriding functions so that NumPy functions can operate “directly” on MyGrad’s tensors, and thus can be used to construct differentiable computational graphs!

```
>>> import numpy as np
>>> from mygrad import tensor
>>> x = tensor([1., 2.])
>>> np.square(x).backward() # backprop through NumPy functions!
>>> x.grad
array([2., 4.])
```

Another important, but less exciting, feature is that MyGrad now protects users from inadvertently corrupting the state of a computational graph by, say, mutating a NumPy array that is participating in the graph. This is very useful for protecting people – especially students – from unwittingly poisoning the results of their calculations.

Lastly… no more “nulling” gradients! MyGrad will now handle deleting gradients for you in a way that is nicely compatible with gradient-based optimization work flows.

### New Functions and Utilities#

`turn_memory_guarding_on()`

### Dropping Support for Python 3.6 and Numpy < 1.17#

MyGrad now abides by the NEP 29 recommendation, and adopts a common “time window-based” policy for support of Python and NumPy versions.

As such the Python 3.7 and Numpy 1.17 are the minimum versions supported by MyGrad 2.0.

### The Interfaces Between `mygrad.Tensor`

and `numpy.array`

Match#

You can now control the dimensionality of a tensor and whether or not a tensor copies its data upon initialization, via the
`tensor()`

interface. This mirrors the behavior of `array()`

Numpy |
MyGrad 1.X |
MyGrad 2.0 |
---|---|---|

```
>>> np.array([1., 2.], copy=True, ndmin=2)
array([[1., 2.]])
``` |
```
>>> mg.Tensor([1., 2.], copy=True, ndmin=2)
<TypeError>
``` |
```
>>> mg.tensor([1., 2.], copy=True, ndmin=2)
Tensor([[1., 2.]])
``` |

### Support for dtype, where, and out in ufuncs#

MyGrad now implements ufuncs with support for specifying dtype, boolean masks, and in-place targets. The
additional methods, such as `mygrad.add.reduce`

, are not yet implemented.

MyGrad 2.0 |
---|

```
>>> mg.add([1, 2],[0, 2], where=[True, False], dtype=float)
Tensor([3., 1.])
``` |

### Augmented Updates on Tensors Now Match NumPy’s Behavior#

Previously, augmented assignment expressions, such as `tensor *= 2`

, behaved merely
as a shorthand for the simple assignment `tensor = tensor * 2`

.
This is in stark contrast to the behavior of an augmented assignment on a NumPy array, which
mutates the array in-place.

This meant that there was a major discrepancy between how these expressions behaved across MyGrad and NumPy. This has changed in MyGrad 2.0: all augmented assignment expressions operate in-place on tensors and mutate their underlying data.

Numpy |
MyGrad 1.X |
MyGrad 2.0 |
---|---|---|

```
>>> x = np.array([1., 2.])
>>> y = x
>>> x *= 2
>>> x is y
True
``` |
```
>>> x = mg.Tensor([1., 2.])
>>> y = x
>>> x *= 2 # x = 2 * x
>>> x is y # doesn't match!
False
``` |
```
>>> x = mg.tensor([1., 2.])
>>> y = x
>>> x *= 2
>>> x is y # matches!
True
``` |

### Creating and Augmenting Views of Tensors#

MyGrad now provides rich support for creating and manipulating views of tensors.

All basic indexing operations
performed on a tensor will produce a view of said tensor.
This means that these two tensors share memory
(While MyGrad 1.X created a view of the underlying NumPy array under the hood for basic indexing, its notion
of supporting views went no further than that.)
As with NumPy arrays the “parent” of a view can be accessed through the tensor’s `.base`

attribute

Numpy |
MyGrad 1.X |
MyGrad 2.0 |
---|---|---|

```
>>> x = np.array([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x
True
``` |
```
>>> x = mg.Tensor([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x # doesn't match!
<AttributeError>
``` |
```
>>> x = mg.tensor([1., 2., 3.])
>>> y = x[:2]
>>> np.shares_memory(x, y)
True
>>> y.base is x # matches!
True
``` |

Mutating shared data will propagate through views:

Numpy |
MyGrad 1.X |
MyGrad 2.0 |
---|---|---|

```
>>> y *= -1
>>> y
array([-1., -2.])
>>> x
array([-1., -2., 3.])
``` |
```
>>> y *= -1
>>> y
Tensor([-1., -2.])
>>> x # doesn't match!
Tensor([1., 2., 3.])
``` |
```
>>> y *= -1
>>> y
Tensor([-1., -2.])
>>> x # matches!
Tensor([-1., -2., 3.])
``` |

Furthermore, views of tensors now propagate corresponding gradient information as well!
This means that if `y`

is a view of `x`

, then `y.grad`

will be a corresponding view of `x.grad`

.
This is true for all varieties of views, views of views, etc., of `x`

.

```
# Because `y` is a view of `x`, `y.grad` will be
# a corresponding view of `x.grad`
>>> (x ** 2).backward()
>>> x.grad
array([-2., -4., 6., 8.])
>>> y.grad
array([-2., -4.])
>>> y.grad.base is x.grad
True
```

This rich support for views, augmented assignments, and in-place updates on tensors enables much more sophisticated
operations on tensors now.
For example, let’s make a shape-(3, 3) tensor and perform and operations involving views of its diagonal and
its anti-diagonal. (Note that `einsum()`

is capable of returning a view of a tensor’s diagonal,
and that MyGrad fully supports backpropagation through all flavors of einsum!)

```
>>> x = mg.tensor([[0., 1., 2.],
... [3., 4., 5.],
... [6., 7., 8.]])
# view of diagonal of `x`
>>> diag = mg.einsum("ii->i", x)
>>> diag
Tensor([0., 4., 8.])
# view of anti-diagonal of `x`
>>> anti_diag = mg.einsum("ii->i", x[:, ::-1])
>>> anti_diag
Tensor([2., 4., 6.])
# Compute derivatives of their summed difference
>>> (diag - anti_diag).sum().backward()
>>> x.grad
array([[ 1., 0., -1.],
[ 0., 0., 0.],
[-1., 0., 1.]])
# The views of `x` have the appropriate corresponding
# views of `x.grad`
>>> diag.grad
array([1., 0., 1.])
>>> anti_diag.grad
array([-1., 0., -1.])
```

### Bye-Bye Null Gradients!#

Gone are the days of having to manually clear your tensors’ gradients and the computational graph that they were
in; now MyGrad does it for you!
This means that `Tensor.null_gradients()`

no longer does anything other than emit a deprecation warning.
In an upcoming minor release this method will be removed entirely.

In MyGrad 2.0, calling `backward()`

will finish its computation by clearing the computational graph that was involved
in the backpropagation.
Thus any internally-referenced tensors associated with that computational graph become free for garbage collection.
This is very nice behavior to help prevent students from filling up their RAM unwittingly.

And instead of worrying about nulling gradients manually, a tensor will automatically have its gradient cleared any time that it is involved in a new mathematical operation. This enables the following common workflow for performing gradient-based optimization:

MyGrad 1.X |
MyGrad 2.0 |
---|---|

```
>>> x = mg.Tensor([1., 2.])
>>> for _ in range(10):
... y = 3 * x
... assert x.grad is None
... y.backward()
... assert all(x.grad == 3.)
... y.null_gradients()
``` |
```
>>> x = mg.tensor([1., 2.])
>>> for _ in range(10):
... y = 3 * x # nulls grad
... assert x.grad is None
... y.backward()
... assert all(x.grad == 3.)
``` |

```
for _ in range(num_optimization_steps):
# using `model_params` in a function will automatically
# set its gradients to `None`
loss = compute_loss(data, model_params) # gradients cleared
loss.backward() # compute gradients
optimize(model_params) # do stuff with gradients
```

You can also call `null_grad()`

to manually clear an individual tensor’s gradient.

### Safety First: Memory Guarding Behavior in MyGrad 2.0#

In MyGrad 1.X it was all too easy to unwittingly corrupt the state of a computational graph by mutating a NumPy array mid-computation. This could lead to incorrect calculations of gradients! This is the stuff of horrifying nightmares.

Now MyGrad tracks all of the arrays that are involved in active computational graphs and locks their memory so that they are read-only (except for when the user mutates the array explicitly with a MyGrad operation). This means that the sort of mutation that could have lurked silently in the dimly-lit alleyways of bugs-ville will now get loudly narc’d on by MyGrad’s merciless memory guard!

MyGrad 1.X |
MyGrad 2.0 |
---|---|

```
>>> arr = np.array([1., 2.])
>>> tn = mg.Tensor([1. 1.])
>>> z = x * y
# mutating x will corrupt
# backprop through z...
>>> x[:] = 0.
>>> z.backward() # uh oh...
>>> tn.grad # should be: (1., 2.)
array([0., 0.])
``` |
```
>>> arr = np.array([1., 2.])
>>> tn = mg.tensor([1. 1.])
>>> z = x * y
# mutating x will corrupt
# backprop through z...
>>> x[:] = 0. # you shall not pass!
ValueError: read-only!
>>> z.backward()
>>> tn.grad
array([1., 2.])
``` |

Any tensor or array that is no longer participating in an active computational graph will automatically have its write-ability restored to its original state.

```
# memory guarding is released once an array is no
# longer involved in an active computational graph
>>> import mygrad as mg
>>> import numpy as np
>>> x = np.array([1., 2.])
>>> y = mg.ones_like(x)
>>> z = x * y # x and y are locked
>>> z.backward() # graph cleared; x and y are "released"
>>> x[:] = 0 # can write to x
>>> x
array([0., 0.])
# This result is not referenced, thus
# x and y are immediately released by the
# memory-guard; no graph-clearing is needed
>>> x * y
Tensor([0., 0.])
>>> x[:] = 1.
```

But with great responsibility comes great …uhh… slowness? This memory-guarding feature can lead to slowdowns
of **up to 50% for computations involving many small tensors**
(It used to be **a lot** worse… like 5x worse. I worked really hard to speed it up! I promise!).
That being said, computations involving beefy tensors (e.g. standard neural networks) will not be significantly
affected by the overhead associated with the memory guard.
Please refer to Performance Tips for responsible ways to disable this memory-guarding mechanism.

Speaking of optimizations…

### Disabling Automatic Differentiation#

Sometimes you want to use your MyGrad code to do calculations, but you don’t actually need to compute any derivatives. A common example of this is evaluating the test-time performance of a machine learning model that you are in the process of optimizing – you don’t actually need to perform backpropagation when you are processing the test data.

In these circumstances, you can greatly reduce the overhead cost associated with building a computational
graph by using the `no_autodiff()`

decorator / context manager. See the linked documentation
for extensive examples of its usage.

```
# demonstrating mygrad in no-autodiff mode
>>> import mygrad as mg
>>> x = mg.Tensor([1., 2., 3., 4.])
>>> with mg.no_autodiff:
... y = x ** 2 # operation not tracked
>>> y.backward()
>>> y.grad, x.grad # x is not "connected" to y
(array([1., 1., 1.]), None)
```

For computations involving many small tensors, this can produce **up to a 3x speedup**! So make sure you
make keen use of this when you don’t actually need to perform autodiff.

### Revamping Constant Semantics to be Explicit#

Previously, specifying `constant=False`

in a mygrad function did not actually mean
that the function would necessarily produce a non-constant tensor. Rather, it simply
meant that the output would not be _forced_ to be a constant – whether or not the result
was a constant depended on the inputs (i.e. a function whose inputs were all constants
would thus produce a constant).

This was a very bad design decision! Now, specifying `constant=False`

guarantees that
the output of a function is a non-constant (meaning that it facilitates backpropagation
through a computational graph).

That being said, we usually _do_ want constant information to propagate through functions.
Thus `constant=None`

is now the default value – its behavior matches that of `constant=False`

from MyGrad 1.X – for all functions that accept the argument.

It is also now standard to require that this argument be a keyword-only argument.

MyGrad 1.X |
MyGrad 2.0 |
---|---|

```
>>> t1 = mg.tensor(1., constant=True)
>>> t2 = mg.tensor(1., constant=True)
>>> out = mg.add(t1, t2, constant=False)
>>> out.constant
True
``` |
```
>>> t1 = mg.tensor(1., constant=True)
>>> t2 = mg.tensor(1., constant=True)
>>> out = mg.add(t1, t2, constant=False)
>>> out.constant
False
# constant = None
>>> out = mg.add(t1, t2)
>>> out.constant
True
``` |

```
>>> t1 = mg.tensor(1., constant=True)
>>> t2 = mg.tensor(1., constant=True)
```

# old behavior >>> out = mg.add(t1, t2, constant=False) >>> out.constant True

# new behavior >>> out = mg.add(t1, t2, constant=False) >>> out.constant False

```
>>> out = mg.add(t1, t2, constant=None)
>>> out.constant
True
```

### Remove Scalar-Only Conditions on Backpropagation#

Previously, one could only invoke backpropagation from a non-scalar tensor only if that tensor was
the culmination of operations that preserved a one-to-one mapping between the elements of an upstream
tensor with its downstream neighbor. Otherwise an error was raised. This ensured that `tensor.grad`

would always be the same shape as `tensor`

, and not represent a higher-dimensional tensor.

Now calling `tensor.backward()`

from a non-scalar tensor will behave as if the tensor was summed prior
to invoking backpropagation. This is simple, easy-to-understand behavior, which ensures that `tensor.grad`

can always be interpreted as an array of scalar-valued derivatives.

MyGrad 1.X |
MyGrad 2.0 |
---|---|

```
>>> t1 = mg.Tensor([[1., 2.],
... [0., -1]])
>>> t2 = mg.Tensor([[0., 1.],
... [3., -1]])
>>> z = t1 @ t2
>>> z.backward()
<InvalidBackprop: Scalar-only>
``` |
```
>>> t1 = mg.tensor([[1., 2.],
... [0., -1]])
>>> t2 = mg.tensor([[0., 1.],
... [3., -1]])
>>> z = t1 @ t2
>>> z.backward()
>>> t1.grad
array([[1., 2.],
[1., 2.]])
``` |

### Integer-valued Tensors Are Treated as Constants#

Derivatives involving integer-valued tensors are typically ill-defined, and in MyGrad 1.X they were generally just wrong. Now integer-valued tensors can only be involved in computational graphs as constants.

MyGrad 1.X |
MyGrad 2.0 |
---|---|

```
>>> t1 = mg.Tensor([[1, 2]).constant
False
``` |
```
>>> t1 = mg.tensor([[1, 2]]).constant
True
``` |

### Is This Code Well-Tested?#

Yes! I consider MyGrad’s test suite to be the most important part of the library. It is the only reason why I feel comfortable releasing this code for students, teachers, and others to use. I leverage thorough property-based testing using the Hypothesis library to exercise this code as rigorously as I can manage. These tests even found bugs in NumPy!

### Special Thanks#

Special thanks to Alex Silverstein, Zac Dodds, and Petar Griggs for all of the fruitful discussions, ideas, and influence that you provided throughout this major update.

## 1.9.0 - 2020-08-28#

The most significant aspect of this release is the implementation of `Tensor.__array__`

, which enables a huge amount
of cross-compatibility with numpy utilities (#288). Note that any previous
reliance of a numpy function to produce an array of tensor-scalars will likely produce a standard numpy array instead.

Improvements:

`x**1`

and`x**2`

are now special-cased in order to make these common operations more efficient (#266)The derivative of

`focal_loss()`

was refactored to handle special edge-cases and the tests for focal loss were improved to exercise these edge cases (#269)Various improvements to the tests (#271, #277, #290, #284, #289, #282, #292, #293)

The internal mechanism for tracking tensors in computational graph now depends on hashing tensor-IDs instead of hashing tensors directly. The fact that tensors could be hashed was due to the fact that its equality specialty methods were being monkey-patched (#276)

`softmax()`

and`logsoftmax()`

both expose`axis`

arguments (#268)

Bug fixes:

0D tensors could not be indexed into – e.g. to insert a newaxis (#273)

There was a potential numerical instability in

`mygrad.nnet.layers.batchnorm()`

(#285)The

`dtype`

argument in`Tensor.__init__`

was ignored when the array-like argument, x, was another Tensor-instance (#294)

New features:

`Tensor.__array__`

now exposes the tensor’s underlying numpy array – this enables a huge amount of cross-compatibility with numpy utilities (#288)Adds

`astensor()`

(#294)

## 1.8.1 - 2020-07-28#

This is an internal change to the backprop
mechanism for `Tensor.__getitem__`

, which produces considerable speedups (2x-4x) for backprop
through basic indexing and boolean indexing. Thanks to Petar Griggs for finding this.

## 1.8.0 - 2020-07-25#

New features:

Adds

`any()`

and`any()`

Adds

`rand()`

Adds

`randint()`

Adds

`randn()`

Adds

`random()`

Adds

`random_integers()`

Adds

`random_sample()`

Adds

`ranf()`

Adds

`sample()`

Adds

`seed()`

Thanks to Darshan Krishnaswamy and Sam Carpenter for adding this functionality!

Fixes a bug in the GRU layer where mixed floating point precision dtypes between data and weights raised an error. Thanks to Petar Griggs for the fix!

## 1.7.1 - 2020-07-11#

Fixes a bug in `negative_log_likelihood()`

, where setting `constant=True`

had no effect.

## 1.7.0 - 2020-07-11#

This release continues the process of integrating functions from mynn.

New features:

Adds

`glorot_normal()`

Adds

`glorot_uniform()`

Adds

`he_normal()`

Adds

`he_uniform()`

Adds

`normal()`

Adds

`uniform()`

Adds

`focal_loss()`

Big thanks to David Mascharka!

Improvements:

The interfaces to `reshape()`

and `reshape()`

were adjusted to match exactly the interfaces to their NumPy counterparts.
I.e. `reshape()`

now requires `newshape`

to be a sequence, whereas `reshape()`

can accept an unpacked sequence for its
`newshape`

.

`shape()`

is now settable - triggering an in-place reshape of a tensor, matching the corresponding behavior in NumPy.

Internal changes:

The logic for writing an in-place operation has been consolidated into a convenient wrapper: `_in_place_op()`

.

## 1.6.0 - 2020-06-21#

New features:

Adds

`elu()`

Adds

`glu()`

Adds

`leaky_relu()`

Adds

`selu()`

Adds

`soft_sign()`

Big thanks to David Mascharka!

## 1.5.0 - 2020-02-16#

New features:

Adds

`astype()`

method.Adds

`hard_tanh()`

`y_true`

can now be passed as a`Tensor`

to`softmax_crossentropy()`

This update also includes various improvements to the library’s test suite.

## 1.4.1 - 2020-01-09#

This release performs an internal refactor in the `nnet`

module of the library, as well as
an analogous refactor in the test suite. This also fixes a docstring in the `multiclass_hinge`

loss to properly show a description in the readthedocs page.

## 1.4.0 - 2019-12-19#

This release adds the `repeat()`

operation. It also includes some minor
improvements to mygrad’s test suite.

## 1.3.0 - 2019-11-30#

This release adds `clip()`

and `where()`

.

It also includes a major fix to the graph-traversal mechanism for null-gradients and clear-graph, eliminating an exponentially-scaling runtime.

`+x`

will now invoke `mygrad.positive`

, mirroring the numpy behavior

There are improvements to user-facing error messages and input validation in addition to major improvements to mygrad’s test suite. There is now a 100% line-coverage gate in mygrad’s CI system.

## 1.2.0 - 2019-08-03#

We’re finally keeping a formal changelog!

This release makes substantial improvements to MyGrad’s error-checking and handling, in order to make much simpler the process of debugging issues with buggy custom operations. Specifically, `backward()`

now checks for an invalid-gradients on each call of `backward_var()`

, and raises a descriptive error message.

`mygrad.errors`

was introduced to provide descriptive, MyGrad-specific exceptions. For example, we no longer raise bare exceptions for scenarios like invalid backprop through a scalar-only graph; rather, we now raise a descriptive `InvalidBackprop`

exception.

MyGrad’s testing framework received wide-ranging improvements, yielding complete test coverage and fewer flaky tests. Coverage checks were added to the project’s CI process.

`maximum()`

and `minimum()`

were patched to permit backpropagation through scalar inputs.

Internal implementation details of `einsum()`

were adjusted to remove redundant code in its backpropagation machinery.

`null_gradients()`

was refactored to ensure that only a single traversal of the computational graph is performed to null all of the tensors’ gradients. Furthermore, Tensor.null_gradients(clear_graph=True) now only performs a single graph traversal, instead of two.

In keeping with NumPy’s behavior, performing +x (where x is a mygrad-tensor) no longer returns a reference of x, but returns mygrad.positive(x).

Backpropagation through `max()`

and `min()`

now works for 0D tensors.

Input validation was added to `mygrad.nnet.layers.utils.sliding_window_view()`

.

Fixed backpropagation through basic indexing, x[ind] = b, in which broadcasting occurred and b possess “excess” leading singleton dimensions.