The situation is as follows:

I have three blocks of networks: A, B, and C, where B is an nn.Embedding. I want A's output to replace exactly one row of B's embedding table, i.e. B.weight[idx] = A_output, and I want only A to train (B's and C's parameters stay frozen).
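For context, the setup looks roughly like this (the module definitions and sizes are placeholders, not my real model; the 512 × 768 just matches the shape in the error below):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 512, 768

A = nn.Linear(128, emb_dim)        # placeholder: any block producing an emb_dim vector
B = nn.Embedding(vocab_size, emb_dim)
C = nn.Linear(emb_dim, 1)          # placeholder downstream block

# Freeze B and C so that only A receives gradient updates
for p in list(B.parameters()) + list(C.parameters()):
    p.requires_grad_(False)

optimizer = torch.optim.Adam(A.parameters(), lr=1e-3)
```

My training loop: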
```python
import torch.nn.functional as F

for step, batch in enumerate(train_dataloader):
    A_output = A(batch['input_1'])

    # Zero out row idx, then add A's output in place so that
    # the write is tracked by autograd and gradients reach A
    B.weight.data[idx] = torch.zeros(A_output.shape)
    B.weight[idx] = B.weight[idx] + A_output

    B_output = B(batch['input_2'])
    C_output = C(B_output)

    loss = F.mse_loss(C_output, batch['target'], reduction="mean")
    loss.backward(retain_graph=True)
    optimizer.step()
    optimizer.zero_grad()
```
This fails with:

```
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [512, 768]], which is output 0 of AsStridedBackward0, is at version 3; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
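I suspect the problem is that B.weight (the [512, 768] tensor in the message) is part of the autograd graph, and writing into it in place bumps its version counter before backward() runs. The only workaround I can think of is to build the lookup table out-of-place every step and feed it to F.embedding instead of mutating B.weight, roughly like this (assuming A_output is a single vector of shape (emb_dim,); the unsqueeze would change if A returns a batch):

```python
import torch
import torch.nn.functional as F

for step, batch in enumerate(train_dataloader):
    A_output = A(batch['input_1'])

    # Assemble a fresh weight tensor: frozen rows come from B (detached),
    # row idx comes from A, so gradients flow only into A
    weight = torch.cat([
        B.weight[:idx].detach(),
        A_output.unsqueeze(0),
        B.weight[idx + 1:].detach(),
    ], dim=0)

    B_output = F.embedding(batch['input_2'], weight)
    C_output = C(B_output)

    loss = F.mse_loss(C_output, batch['target'], reduction="mean")
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

but I'm not sure this is the intended pattern.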
Any ideas?
Thanks