mygrad.nnet.initializers.he_normal

mygrad.nnet.initializers.he_normal(*shape, gain=1, dtype=numpy.float32, constant=None)

Initialize a mygrad.Tensor according to the normal initialization procedure described by He et al.

Parameters
shape : Sequence[int]

The shape of the output Tensor. Note that shape must be at least two-dimensional.

gain : Real, optional (default=1)

The gain (scaling factor) to apply.

dtype : data-type, optional (default=float32)

The data type of the output tensor; must be a floating-point type.

constant : bool, optional (default=False)

If True, the returned tensor is a constant (it does not back-propagate a gradient).

Returns
mygrad.Tensor, shape=shape

A Tensor with values initialized according to the He normal initialization procedure.

Notes

He, Zhang, Ren, and Sun put forward this initialization in the paper “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (https://arxiv.org/abs/1502.01852).

A Tensor \(W\) initialized in this way is drawn from the distribution

\[\mathcal{N}\left(0, \sqrt{\frac{2}{(1+a^2)n_l}}\right)\]

(zero mean; the second argument denotes the standard deviation), where \(n_l\) is the layer's fan-in (the number of input connections, as defined in the paper) and \(a\) is the slope of the rectifier that follows this layer; the effect of \(a\) is incorporated through the gain argument described above.
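
For a concrete sense of scale (the layer size here is purely illustrative): a plain ReLU, for which \(a = 0\), fed by \(n_l = 512\) inputs targets a standard deviation of

\[\sqrt{\frac{2}{(1 + 0^2) \cdot 512}} = \sqrt{\frac{2}{512}} = 0.0625,\]

while a leaky rectifier with slope \(a = 0.1\) shrinks this only slightly, to \(\sqrt{2 / (1.01 \cdot 512)} \approx 0.0622\).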

The guidance put forward in that paper is that this initialization procedure should be preferred over the mygrad.nnet.initializers.glorot_* functions, especially when rectifiers (e.g. ReLU, PReLU, leaky_relu) are used in very deep networks (more than 20 or so layers).
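
If the exact scale of the draw matters, it is easy to check it empirically against the formula above. A minimal sketch follows; the shape is purely illustrative (and chosen square so the fan-in is unambiguous), and precisely how gain and \(n_l\) enter mygrad's implementation is best confirmed against the source rather than assumed from this snippet:

import numpy as np
from mygrad.nnet.initializers import he_normal

# Illustrative shape, large enough for stable sample statistics.
w = he_normal(512, 512)

empirical_mean = float(np.mean(w.data))  # expected to sit near 0
empirical_std = float(np.std(w.data))    # compare with sqrt(2 / ((1 + a**2) * n_l))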

Examples

>>> from mygrad.nnet.initializers import he_normal
>>> he_normal(2, 3)
Tensor([[-2.3194842 ,  0.45956254, -0.28709933],
        [-0.15776408,  0.6777564 , -0.05587448]], dtype=float32)
>>> he_normal(4, 2, gain=5/3, dtype="float64", constant=True)
Tensor([[ 0.25962918,  1.1503933 ],
        [-0.13638746,  0.10581096],
        [ 1.44805926,  0.51367645],
        [-0.32018705, -0.80306442]])
>>> he_normal(2, 1, 2, dtype="float16")
Tensor([[[ 0.8057 , -0.2922 ]],
        [[ 0.12213, -0.715  ]]], dtype=float16)