What is an inplace operation?

Not sure if this could cause the issue, but could you call the out-of-place .zero() operation on weight.new() instead of the inplace .zero_()?
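
As a rough sketch of the difference (weight here is just a random example tensor, and torch.zeros_like is used as one out-of-place way to get a zero-filled tensor of the same shape):

import torch

weight = torch.randn(3, 3)

# inplace: overwrites weight's values directly (trailing underscore)
weight.zero_()

# out of place: creates a new zero tensor and leaves the original untouched
new_weight = torch.zeros_like(weight)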

Inplace operations act on the tensor directly without creating a new result tensor and have a trailing underscore.
This code snippet shows the usage of inplace operations:

import torch

x = torch.zeros(10)
x[0] = 1.
x.sigmoid_()  # applies the sigmoid directly to x (note the trailing underscore)
print(x)

As you can see, I don’t need to assign the result of x.sigmoid_() to a new variable, as it applies the sigmoid directly to x.
However, since intermediate activations are often needed to calculate the gradients, inplace operations might raise errors during the backward call and should then be removed.
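
A small example of such a failure (the shapes and ops are arbitrary; the point is that sigmoid’s backward needs its output, which the inplace call overwrites):

import torch

x = torch.randn(5, requires_grad=True)
y = x.sigmoid()      # backward of this op needs y as saved output
y.sigmoid_()         # modifies y inplace, so the saved value is stale
y.sum().backward()   # RuntimeError: a variable needed for gradient computation
                     # has been modified by an inplace operation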

EDIT: Seems to have been solved here.