Backpropagation is leaking when using einsum with a one-hot encoded vector

I have a stack of feature maps as given below (my batch size is 32, and I have 64 feature maps, each of size 64x64).

out0 = self.selection[0](x)  # output: [32, 64, 64, 64]
out1 = self.selection[1](x)  # output: [32, 64, 64, 64]
out2 = self.selection[2](x)  # output: [32, 64, 64, 64]

out = torch.stack([out0, out1, out2]).permute(1, 0, 2, 3, 4)  # output: [32, 3, 64, 64, 64]

and I am selecting feature maps along the second dimension (of size 3) using a one-hot encoded vector (y, of shape 32x3) with torch.einsum, as given below.

out = torch.einsum("x y, x y c d e -> x c d e", y, out)  # output: [32, 64, 64, 64]
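For reference, here is a minimal, self-contained sketch of the selection; the branch modules and the input shape are just stand-ins, not my actual layers:

import torch
import torch.nn as nn

# stand-in branches: three modules that each produce [32, 64, 64, 64]
# (assuming a 3-channel input here; replace with the actual selection modules)
selection = nn.ModuleList([nn.Conv2d(3, 64, 3, padding=1) for _ in range(3)])

x = torch.randn(32, 3, 64, 64)                   # assumed input shape
y = torch.zeros(32, 3)
y[:, 0] = 1.0                                    # one-hot: always pick branch 0

outs = [m(x) for m in selection]                 # each [32, 64, 64, 64]
out = torch.stack(outs).permute(1, 0, 2, 3, 4)   # [32, 3, 64, 64, 64]
out = torch.einsum("x y, x y c d e -> x c d e", y, out)   # [32, 64, 64, 64]
print(out.shape)                                 # torch.Size([32, 64, 64, 64])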

The model trains, but when I checked the backpropagation updates for out0, out1, and out2 by providing data only for out0 (i.e., the one-hot tensor is always (1, 0, 0)), optimizer.step() was still updating the weights in the layers corresponding to out1 and out2. This shouldn't be the case, since their outputs are multiplied by zero every time.

Has anyone ever had the same problem?

Hi Hiru!

Please note that optimizers do update tensors with zero gradients if they
use weight decay or momentum (with a momentum buffer that is already non-zero).

Try your experiment using plain-vanilla SGD without weight decay or
momentum. If you still see this result, please post a simplified, complete,
runnable script that illustrates your issue, together with its output.
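As a rough illustration of what such a script might look like (using a toy two-branch model with made-up shapes, not your actual network), the sketch below shows that with plain SGD the unselected branch does not move, while adding weight_decay changes its weights even though its gradient is exactly zero:

import torch
import torch.nn as nn

torch.manual_seed(0)

# toy stand-in: two linear branches; the one-hot vector always selects branch 0
branches = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])
x = torch.randn(8, 4)
y = torch.zeros(8, 2)
y[:, 0] = 1.0

def branch_weights_changed(weight_decay):
    before = [b.weight.detach().clone() for b in branches]
    opt = torch.optim.SGD(branches.parameters(), lr=0.1,
                          momentum=0.0, weight_decay=weight_decay)
    out = torch.stack([b(x) for b in branches]).permute(1, 0, 2)   # [8, 2, 4]
    loss = torch.einsum("x y, x y c -> x c", y, out).sum()
    opt.zero_grad()
    loss.backward()
    # branch 1's gradient is an all-zero tensor, but with weight_decay > 0
    # SGD steps it anyway: p <- p - lr * (grad + weight_decay * p)
    opt.step()
    return [not torch.allclose(b.weight, w) for b, w in zip(branches, before)]

print(branch_weights_changed(weight_decay=0.0))    # [True, False] -- branch 1 untouched
print(branch_weights_changed(weight_decay=0.01))   # [True, True]  -- decay moves branch 1 too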

Best.

K. Frank


It worked. I totally forgot about the weight decay.
Thanks!