# In-place operation keeps happening

Hi,

I don’t understand how to deal with in-place operations while guaranteeing that autograd works properly. In my code, I first define 3 MLPs with the same structure, then use them to modify tensors.
I’m using `tensor.reshape` to avoid creating new tensors, but in-place operations keep happening.
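
For reference, `reshape` on a contiguous tensor returns a view that shares storage with the source, so writing into the reshaped tensor mutates the original in place as well. A minimal sketch (with made-up shapes) to check this:

``````
import torch

M, K, D = 3, 4, 5
Mesg_ue = torch.rand(K, M, D)
Mesg_bs = Mesg_ue.reshape(M, K, D)
# Same storage: no copy was made, so slice-assigning into Mesg_bs
# also modifies Mesg_ue in place.
print(Mesg_bs.data_ptr() == Mesg_ue.data_ptr())  # True
``````

Note also that `reshape` only reinterprets the flat memory rather than swapping axes; if the intent is to swap the K and M dimensions, `permute(1, 0, 2)` does that (though it also returns a view, so the in-place issue would remain).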

``````
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [5, 2, 20]], which is output 0 of torch::autograd::CopySlices, is at version 20; expected version 10 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
``````
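
The same error can be reproduced in isolation: slice assignment records a `CopySlices` node and bumps the tensor's version counter, so any value that autograd saved earlier for backward becomes stale. A minimal sketch:

``````
import torch

a = torch.rand(3, requires_grad=True)
b = a.clone()
c = b * b           # autograd saves b for the backward of the multiply
b[0] = 0.0          # in-place write: b's version counter is incremented
c.sum().backward()  # RuntimeError: ... modified by an inplace operation
``````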

``````
def forward(self, F_ue, Mesg_ue, E, P, Noise):
    '''
    :param F_ue: of size [2MN,K], randomly generated at the first iteration
    :param Mesg_ue: of size [K,M,2MN], randomly generated at the first iteration
    :param E: of size [M,K,2MN]
    :param P: of size [M,1]
    :param Noise: of size [K,1]
    :return: F_ue_update, agg_ue
    '''
    M = P.size(0)
    K = Noise.size(0)
    N = E.size(dim=2) // 2
    agg_ue = torch.mean(Mesg_ue, dim=0)  # of size [M,2MN]
    Mesg_bs = Mesg_ue.reshape(M, K, 2*M*N)
    for m in range(M):
        for k in range(K):
            Mesg_bs[m, k, :] = self.mlp1(E[m, k, :], agg_ue[m, :], P[m], Noise[k])  # it seems like an in-place operation happens here
            # Mesg_bs <CopySlices object at 0x2b410bc78c10>

    agg_bs = torch.mean(Mesg_bs, dim=0)  # of size [K,2MN]
    Mesg_ue = Mesg_bs.reshape(K, M, 2*M*N)
    for k in range(K):
        for m in range(M):
            Mesg_ue[k, m, :] = self.mlp2(F_ue[:, k], E[m, k, :], agg_bs[k, :], P[m], Noise[k])
            # Mesg_ue <AsStridedBackward0 object at 0x2b410bc78c70>

    for k in range(K):
        E_cat = torch.cat([E[m, k, :] for m in range(M)], dim=0)
        F_ue[:, k] = self.mlp3(E_cat, F_ue[:, k], agg_bs[k, :], P, Noise[k])
        # F_ue <CopySlices object at 0x2b410bc78c10>

    return F_ue, Mesg_ue


F_ue0 = torch.rand(2*M*N, K, requires_grad=True)
# F_ue0 <ToCopyBackward0 object at 0x2b410bc78be0>
Mesg_ue0 = torch.rand(K, M, 2*M*N, requires_grad=True)
# Mesg_ue0 <ToCopyBackward0 object at 0x2b410bc78be0>
F_ue, Mesg_ue = model(F_ue0, Mesg_ue0, E, P, Noise)
loss = Loss(F_ue)
loss.backward()
``````
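
One common workaround, sketched below with the call signatures kept from the snippet above, is to collect each per-slice result in a Python list and build the tensors out of place with `torch.stack`, so nothing autograd has saved is ever overwritten:

``````
# Inside forward, replacing the three slice-assignment loops above.
Mesg_bs = torch.stack([
    torch.stack([self.mlp1(E[m, k, :], agg_ue[m, :], P[m], Noise[k])
                 for k in range(K)])
    for m in range(M)
])                                   # [M, K, 2MN]

agg_bs = torch.mean(Mesg_bs, dim=0)  # [K, 2MN]

Mesg_ue = torch.stack([
    torch.stack([self.mlp2(F_ue[:, k], E[m, k, :], agg_bs[k, :], P[m], Noise[k])
                 for m in range(M)])
    for k in range(K)
])                                   # [K, M, 2MN]

F_ue = torch.stack([
    self.mlp3(torch.cat([E[m, k, :] for m in range(M)], dim=0),
              F_ue[:, k], agg_bs[k, :], P, Noise[k])
    for k in range(K)
], dim=1)                            # [2MN, K]
``````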

Also, since only F_ue is used in calculating the loss, I’m curious whether the parameters of mlp1 and mlp2 will be updated or not.
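
A quick way to check, after `loss.backward()`, is to see which parameters actually received a gradient (assuming no earlier backward pass or `zero_grad` has already populated `.grad`):

``````
# On the first backward pass, a parameter whose .grad is still None
# never contributed to the loss through the autograd graph.
for name, p in model.named_parameters():
    print(name, "no grad" if p.grad is None else "has grad")
``````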
Thanks for any help.

Hi @sunnylyu,

Can you share the code of the `self.mlp1` object? (And the rest of the class too, so it’s a minimal reproducible example).

The stack trace with `torch.autograd.set_detect_anomaly` enabled would be useful too!
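
In case it's useful, anomaly detection can also be scoped with the context manager (reusing the names from your snippet):

``````
import torch

# With anomaly detection on, the backward error includes a second traceback
# pointing at the forward operation that produced the offending tensor.
with torch.autograd.detect_anomaly():
    F_ue, Mesg_ue = model(F_ue0, Mesg_ue0, E, P, Noise)
    Loss(F_ue).backward()
``````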

``````
import torch
from torch.nn import Linear


class NodeUpdateLayer(torch.nn.Module):
    def __init__(self, edge_feature_size, num_antenna, num_BS):
        '''
        :param edge_feature_size: int, the dimension of edge_feature, e_mk of size 2*N
        :param num_antenna: int, number of antennas at each BS
        :param num_BS: int, number of BS
        '''
        super(NodeUpdateLayer, self).__init__()

        # MLP for generating message at BS
        self.mlp1 = torch.nn.Sequential(
            Linear(edge_feature_size + 2*num_BS*num_antenna + 2, 512),  # power and noise should also be input
            torch.nn.ReLU(),
            Linear(512, 512),
            torch.nn.ReLU(),
            Linear(512, 512),
            torch.nn.ReLU(),
            Linear(512, 2*num_BS*num_antenna)
        )

        # MLP for generating message at UE
        self.mlp2 = torch.nn.Sequential(
            Linear(2*num_BS*num_antenna + edge_feature_size + 2*num_BS*num_antenna + 2, 512),
            torch.nn.ReLU(),
            Linear(512, 512),
            torch.nn.ReLU(),
            Linear(512, 512),
            torch.nn.ReLU(),
            Linear(512, 2*num_BS*num_antenna)
        )

        # MLP for updating the UE representation
        self.mlp3 = torch.nn.Sequential(
            Linear(2*num_BS*num_antenna + num_BS*edge_feature_size + 2*num_BS*num_antenna + 2, 512),
            torch.nn.ReLU(),
            Linear(512, 512),
            torch.nn.ReLU(),
            Linear(512, 512),
            torch.nn.ReLU(),
            Linear(512, 2*num_BS*num_antenna)
        )

        self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
        self.to(self.device)
        self._wrap_parameters()
        self._init_weights()

    def _wrap_parameters(self):
        for module in self.modules():
            if isinstance(module, Linear):
                module.weight = torch.nn.Parameter(module.weight)
                module.bias = torch.nn.Parameter(module.bias)

    def _init_weights(self):
        for m in self.modules():
            if isinstance(m, Linear):
                torch.nn.init.xavier_uniform_(m.weight, gain=1.0)  # set gain to a non-zero value
                torch.nn.init.constant_(m.bias, 0.4)

    def Message_bs(self, edge_feature, agg_ue, power, noise):
        message_bs = self.mlp1(torch.cat((edge_feature, agg_ue, power, noise), 0))
        if self.device.type == "cuda":
            message_bs = message_bs.to(self.device)
        return message_bs

    def Message_ue(self, f_ue, edge_feature, agg_bs, power, noise):
        message_ue = self.mlp2(torch.cat((f_ue, edge_feature, agg_bs, power, noise), 0))
        if self.device.type == "cuda":
            message_ue = message_ue.to(self.device)
        return message_ue

    def Update(self, E_cat, f_ue, agg_bs, power, noise):
        f_ue_update = self.mlp3(torch.cat((E_cat, f_ue, agg_bs, power, noise), 0))
        if self.device.type == "cuda":
            f_ue_update = f_ue_update.to(self.device)
        return f_ue_update

    def forward(self, F_ue, Mesg_ue, Mesg_bs, E, P, Noise):
        '''
        :param F_ue: of size [2MN,K], randomly generated at the first iteration
        :param Mesg_ue: of size [K,M,2MN], randomly generated at the first iteration
        :param E: of size [M,K,2MN]
        :param P: of size [M,1]
        :param Noise: of size [K,1]
        :return: F_ue_update, agg_ue
        '''
        M = P.size(0)
        K = Noise.size(0)
        N = E.size(dim=2) // 2
        agg_ue = torch.mean(Mesg_ue, dim=0)  # of size [M,2MN]
        #Mesg_bs = Mesg_ue.reshape(M,K,2*M*N)
        for m in range(M):
            for k in range(K):
                Mesg_bs[m, k, :] = self.Message_bs(E[m, k, :], agg_ue[m, :], P[m], Noise[k])

        agg_bs = torch.mean(Mesg_bs, dim=0)  # of size [K,2MN]
        #Mesg_ue = Mesg_bs.reshape(K,M,2*M*N)
        for k in range(K):
            for m in range(M):
                Mesg_ue[k, m, :] = self.Message_ue(F_ue[:, k], E[m, k, :], agg_bs[k, :], P[m], Noise[k])