I am just playing around a bit with PyTorch and have a model with the following structure:
- Layer A: 100 trainable parameters
- Layer B: 0 trainable parameters
- Layer C: 5 trainable parameters
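In code, the skeleton looks roughly like this (the `Linear` layers and `LossHead` are placeholders I picked so the parameter counts line up; `LayerB` is the module defined further down):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LossHead(nn.Module):
    # Placeholder for layer_c: maps features to a prediction and
    # returns the MSE loss against the target y.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)  # 5 trainable parameters (4 weights + 1 bias)

    def forward(self, b, y):
        return F.mse_loss(self.linear(b), y)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_a = nn.Linear(25, 4, bias=False)  # 100 trainable parameters
        self.layer_b = LayerB(params=None)           # 0 trainable parameters (defined below)
        self.layer_c = LossHead()                    # 5 trainable parameters
```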
In my forward function, I have something like:
```python
def forward(self, x, y):
    a = self.layer_a(x)
    b = self.layer_b(a)
    loss = self.layer_c(b, y)
    return {"loss": loss}
```
The `layer_b` module is simply defined as:
```python
class LayerB(nn.Module):
    def __init__(self, params):
        super().__init__()
        self.params = params

    def forward(self, x):
        return x.clone()
```
Now when I train this model, it completes one step and then crashes with:
```
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead
```
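For context, my training loop is essentially the standard pattern, simplified here (`loader` stands in for my DataLoader and the optimizer settings are arbitrary):

```python
import torch

model = Model()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for x, y in loader:           # loader yields (input, target) batches
    optimizer.zero_grad()
    out = model(x, y)
    out["loss"].backward()    # this is where the error is raised for me
    optimizer.step()
```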
I am trying to understand why this happens. Do you think it is necessary to call `x.clone().detach()` here, given that the layer has no trainable parameters? I ask because training no longer crashes when I do that.
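For reference, this is the variant that does not crash:

```python
class LayerB(nn.Module):
    def __init__(self, params):
        super().__init__()
        self.params = params

    def forward(self, x):
        # detach() cuts the autograd graph at this point: gradients no
        # longer flow back into layer_a, so its 100 parameters would not
        # receive updates, even though training now runs without the error.
        return x.clone().detach()
```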