Views and In-Place Operations#

Producing a “View” of a Tensor#

MyGrad’s tensors exhibit the same view semantics and memory-sharing relationships as NumPy arrays. I.e. any (non-scalar) tensor produced via basic indexing will share memory with its parent.

>>> x = mg.tensor([1., 2., 3., 4.])
>>> y = x[:2]  # the view: Tensor([1., 2.])
>>> y.base is x
>>> np.shares_memory(x, y)

Mutating shared data will propagate through views:

>>> y *= -1
>>> x
Tensor([-1., -2.,  3.,  4.])
>>> y
Tensor([-1., -2.])

And this view relationship will also manifest between the tensors’ gradients

>>> (x ** 2).backward()
>>> x.grad
array([-2., -4.,  6.,  8.])
>>> y.grad
array([-2., -4.])

In-Place Operations are not Efficient#

It is important to note that although MyGrad’s view semantics promote a rich parity with NumPy, certain aspects should be avoided in the interest of optimized performance. Namely, performing in-place operations on tensors is generally not more efficient than their non-mutating counterparts.

This is because MyGrad has to track the state of tensors that are involved in a computational graph. Thus a mutated tensor must have its pre-augmented state stored for future reference; this defeats the performance benefit of writing to an array’s memory in-place. This is especially inefficient if you are mutating a tensor involved with multiple views of the same memory( By contrast, producing a view of a tensor is efficient as one would expect).

Thus these NumPy-like in-place semantics are supported by MyGrad not for the same performance purposes, but instead to support convenient and familiar code-patterns and to enable one to port NumPy code to MyGrad (or, in the future, inject MyGrad tensors into NumPy!!) and get the exact same behavior.

A final note: MyGrad’s in-place operations, when run under no_autodiff() mode, do not incur the extra costs noted above, and thus your code will benefit from the performance benefits of in-place operations.