Not sure if this could cause the issue, but could you call the out-of-place .zero() operation of weight.new() instead of the inplace .zero_()?
Inplace operations act on the tensor directly without creating a new result tensor, and their names have a trailing underscore.
This code snippet shows the usage of inplace operations:
x = torch.zeros(10)
x[0] = 1.
x.sigmoid_()
print(x)
As you can see, I don’t need to assign the result of x.sigmoid_() to a new variable, as it applies the sigmoid directly on x.
However, since intermediate activations are often needed to calculate the gradients, inplace operations might create errors during the backward call and should then be removed.
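To illustrate this failure mode, here is a minimal sketch (the tensor shapes and ops are just assumptions for the example): sigmoid saves its output to compute the gradient, so modifying that output inplace before calling backward raises a RuntimeError.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.sigmoid()   # autograd saves y, since sigmoid's gradient is y * (1 - y)
y.sigmoid_()      # inplace op overwrites the saved activation

try:
    y.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    print("backward failed:", e)
```

Replacing y.sigmoid_() with the out-of-place y.sigmoid() makes the backward call succeed, since the saved activation stays untouched.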
EDIT: Seems to have been solved here.