mygrad.nnet.activations.softmax

mygrad.nnet.activations.softmax(x: ArrayLike, axis: Union[None, int, Tuple[int, ...]] = -1, *, constant: Optional[bool] = None) -> Tensor

Applies the softmax activation function:

f(x) = exp(x) / sum( exp(x) )

Computes the softmax over one or more axes of an ND-tensor.
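As a rough illustration of the formula above, the following plain-Python sketch applies it literally to a 1-D sequence of floats. The name `naive_softmax` is hypothetical and not part of mygrad; it handles a single axis only and makes no attempt at numerical stability.

```python
import math
from typing import List, Sequence


def naive_softmax(x: Sequence[float]) -> List[float]:
    """Literal transcription of f(x) = exp(x) / sum(exp(x)) for a 1-D sequence."""
    exps = [math.exp(v) for v in x]  # elementwise exponential
    total = sum(exps)                # normalizing constant
    return [e / total for e in exps]


probs = naive_softmax([1.0, 2.0, 3.0])
print(probs)       # positive entries that preserve the ordering of the inputs
print(sum(probs))  # 1.0 up to floating-point rounding
```

Because `math.exp` raises `OverflowError` for arguments above roughly 709.78, this literal form fails on large scores, e.g. `naive_softmax([710.0])`.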

Parameters
x : array_like

The input scores.

axis : Union[None, int, Tuple[int, ...]], optional (default=-1)

The axis/axes over which to compute the softmax. By default, the softmax is computed over the trailing axis.

constant : bool, optional (default=False)

If True, the returned tensor is a constant; it does not back-propagate a gradient.
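The effect of the `axis` parameter can be sketched in plain Python for a 2-D nested list. `softmax_2d` and `_softmax_1d` are hypothetical helpers, not mygrad functions; only integer axes and `None` are handled, and treating `None` as "normalize over every entry" is an assumption modeled on NumPy-style reductions rather than something this page states.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def _softmax_1d(values: List[float]) -> List[float]:
    m = max(values)  # shift by the max so exp never overflows
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(x: Matrix, axis: Optional[int] = -1) -> Matrix:
    """Hypothetical helper mimicking the `axis` semantics on a 2-D nested list."""
    n_rows, n_cols = len(x), len(x[0])
    if axis is None:
        # Assumed semantics: normalize over every entry jointly.
        flat = _softmax_1d([v for row in x for v in row])
        return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    if axis in (-1, 1):
        # Trailing axis (the default): each row sums to 1.
        return [_softmax_1d(row) for row in x]
    if axis in (0, -2):
        # Leading axis: each column sums to 1.
        cols = [_softmax_1d([x[r][c] for r in range(n_rows)]) for c in range(n_cols)]
        return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]
    raise ValueError(f"axis {axis} is out of bounds for a 2-D input")


x = [[1.0, 2.0], [3.0, 4.0]]
print(softmax_2d(x))             # default axis=-1: each row sums to 1
print(softmax_2d(x, axis=0))     # each column sums to 1
print(softmax_2d(x, axis=None))  # all four entries sum to 1
```

With the default `axis=-1`, the two rows come out identical because `[1, 2]` and `[3, 4]` differ only by a constant, which the softmax ignores.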

Returns
mygrad.Tensor

Notes

  • \(N\) is the number of samples in the batch.

  • \(C\) is the number of possible classes for which scores are provided.

This implements a numerically stable version of softmax; however, log-softmax is still the more numerically stable activation function, particularly when log-probabilities are what is ultimately needed.
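One standard way to get this stability, and one consistent with the `2E50` row in the example at the end of this page, is to subtract the maximum score before exponentiating, which leaves the result mathematically unchanged. Whether mygrad does exactly this internally is an assumption here. The plain-Python sketch below (hypothetical names `stable_softmax` and `log_softmax`) also shows why log-softmax is preferred when log-probabilities are needed: it never forms a probability that has underflowed to zero.

```python
import math
from typing import List, Sequence


def stable_softmax(s: Sequence[float]) -> List[float]:
    """Softmax of a 1-D sequence, computed after shifting by max(s)."""
    m = max(s)
    exps = [math.exp(v - m) for v in s]  # every exponent is <= 0, so exp cannot overflow
    total = sum(exps)                    # >= 1, since the max entry contributes exp(0) = 1
    return [e / total for e in exps]


def log_softmax(s: Sequence[float]) -> List[float]:
    """log p_k = (s_k - m) - log(sum_i exp(s_i - m)), without forming p_k first."""
    m = max(s)
    log_total = math.log(sum(math.exp(v - m) for v in s))
    return [(v - m) - log_total for v in s]


# Same values as the second row of the example at the end of this page.
print(stable_softmax([2e50, 2e50, 1e50]))  # [0.5, 0.5, 0.0]

scores = [0.0, -1000.0]
print(stable_softmax(scores))  # [1.0, 0.0]: exp(-1000) underflows to 0.0
print(log_softmax(scores))     # [0.0, -1000.0]: the log-probability stays finite
```

Taking `math.log` of the second `stable_softmax(scores)` entry would raise `ValueError`, since that probability has underflowed to exactly `0.0`; `log_softmax` returns `-1000.0` instead.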

Given the shape-\((N, C)\) tensor of scores, x, the softmax classification probabilities are computed: the score for class \(k\) of a given datum, \(s_{k}\), is normalized using the softmax transformation:

\[p_{k} = \frac{e^{s_k}}{\sum_{i=1}^{C}{e^{s_i}}}\]
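Multiplying the numerator and denominator by \(e^{-m}\) shows that shifting every score by the same constant leaves the probabilities unchanged; choosing \(m = \max_i s_i\) keeps every exponent non-positive, so no term can overflow. Taking the logarithm of the shifted form gives log-softmax directly:

\[p_{k} = \frac{e^{s_k}}{\sum_{i=1}^{C}{e^{s_i}}} = \frac{e^{-m}\,e^{s_k}}{e^{-m}\sum_{i=1}^{C}{e^{s_i}}} = \frac{e^{s_k - m}}{\sum_{i=1}^{C}{e^{s_i - m}}}, \qquad m = \max_{i} s_i\]

\[\log p_{k} = (s_k - m) - \log\sum_{i=1}^{C}{e^{s_i - m}}\]

For instance, in the second row of the example below, \(m = 2\times 10^{50}\), so the shifted scores are \((0, 0, -10^{50})\); \(e^{-10^{50}}\) underflows to \(0\) in floating point, which yields the probabilities \((0.5, 0.5, 0)\).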

Examples

>>> import mygrad as mg
>>> from mygrad.nnet import softmax
>>> x = mg.Tensor([[ 2.,  2.,  2.],
...                [2E50, 2E50,  1E50]])
>>> softmax(x)
Tensor([[0.33333333, 0.33333333, 0.33333333],
        [0.5       , 0.5       , 0.        ]])