What happens to the gradients if the output is multiplied by zero?

Suppose h is the output embedding vector of a network. Out of the D dimensions of h, I multiply the last 'dr' dimensions by 0 before passing h to the loss function, while keeping the remaining (D - dr) dimensions unchanged. That is, h = [h1, h2, h3, ..., h(D - dr), 0, 0, ..., 0].

This is done for all the output embeddings in a batch.
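For concreteness, here is a minimal sketch of the masking I mean (the sizes B, D, dr, the random h, and the quadratic loss are just arbitrary placeholders):

import torch

B, D, dr = 4, 10, 3                            # batch size, embedding dim, number of zeroed dims
h = torch.randn (B, D, requires_grad = True)   # stands in for the network's output embeddings

mask = torch.ones (D)
mask[D - dr:] = 0.0                            # zeros in the last dr positions
h_masked = h * mask                            # same mask applied to every embedding in the batch

loss = (h_masked**2).sum()                     # some loss computed on the masked embeddings
loss.backward()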

Now, the question is: when the gradients are back-propagated, do the gradients along those 'dr' dimensions become zero too?

Hi Siladittya!

Yes (the backward pass multiplies the upstream gradient by that same zero, so the gradient along those dimensions becomes zero).

You can check this with a simple script:

>>> import torch
>>> torch.__version__
'2.2.1'
>>> t = torch.arange (1.0, 11.0, requires_grad = True)
>>> m = torch.ones (10)
>>> m[7:] = 0.0               # zero out the last three dimensions
>>> m
tensor([1., 1., 1., 1., 1., 1., 1., 0., 0., 0.])
>>> (t**2 * m).sum().backward()
>>> t.grad                    # zero along the zeroed-out dimensions
tensor([ 2.,  4.,  6.,  8., 10., 12., 14.,  0.,  0.,  0.])
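The same check carries over to the batched case in your question. Here is a sketch with an arbitrary small Linear layer standing in for the network (retain_grad() is needed to inspect the gradient of the non-leaf h): the masked columns of h get zero gradient, and the weight rows that feed only those columns get zero gradient as well, while everything flowing through the surviving (D - dr) dimensions is unaffected.

>>> B, D, dr = 4, 6, 2
>>> lin = torch.nn.Linear (3, D)
>>> x = torch.randn (B, 3)
>>> h = lin (x)
>>> h.retain_grad()                         # keep the gradient of this non-leaf tensor
>>> mask = torch.ones (D)
>>> mask[D - dr:] = 0.0                     # zero out the last dr dimensions
>>> ((h * mask)**2).sum().backward()
>>> h.grad[:, D - dr:]                      # zero along the masked dimensions
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]])
>>> lin.weight.grad[D - dr:]                # the rows that feed those dimensions get no gradient
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> (lin.weight.grad[:D - dr] != 0).any()   # the remaining rows still receive gradients
tensor(True)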

Best.

K. Frank