I don’t understand how to deal with in-place operations while guaranteeing that autograd works properly. In my code, I first define 3 MLPs with the same structure, then use them to modify tensors.
I use tensor.reshape to avoid creating new tensors, but in-place operations keep happening, and I get:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [5, 2, 20]], which is output 0 of torch::autograd::CopySlices, is at version 20; expected version 10 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
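As a sanity check, I was able to reproduce the same version-counter error outside my model with a minimal sketch (stand-in tensors only, nothing from my actual code):

```python
import torch

x = torch.rand(3, 4, requires_grad=True)
buf = x * 1.0            # non-leaf copy of x, at version 0
y = (buf ** 2).sum()     # pow saves buf (at version 0) for its backward pass
buf[0] = torch.zeros(4)  # CopySlices: the in-place write bumps buf to version 1

err = None
try:
    y.backward()         # backward needs buf at version 0 -> RuntimeError
except RuntimeError as e:
    err = e
print(err)               # "... has been modified by an inplace operation ..."
```

So any slice assignment into a tensor that some saved backward node still needs seems to be enough to trigger it.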
```python
def forward(self, F_ue, Mesg_ue, E, P, Noise):
    '''
    :param F_ue: of size [2MN, K], randomly generated at the first iteration
    :param Mesg_ue: of size [K, M, 2MN], randomly generated at the first iteration
    :param E: of size [M, K, 2MN]
    :param P: of size [M, 1]
    :param Noise: of size [K, 1]
    :return: F_ue_update, agg_ue
    '''
    M = P.size(0)
    K = Noise.size(0)
    N = E.size(dim=2) // 2
    agg_ue = torch.mean(Mesg_ue, dim=0)        # of size [M, 2MN]
    Mesg_bs = Mesg_ue.reshape(M, K, 2 * M * N)
    for m in range(M):
        for k in range(K):
            # it seems like the in-place operation happens here
            # Mesg_bs <CopySlices object at 0x2b410bc78c10>
            Mesg_bs[m, k, :] = self.mlp1(E[m, k, :], agg_ue[m, :], P[m], Noise[k])
    agg_bs = torch.mean(Mesg_bs, dim=0)        # of size [K, 2MN]
    Mesg_ue = Mesg_bs.reshape(K, M, 2 * M * N)
    for k in range(K):
        for m in range(M):
            # Mesg_ue <AsStridedBackward0 object at 0x2b410bc78c70>
            Mesg_ue[k, m, :] = self.mlp2(F_ue[:, k], E[m, k, :], agg_bs[k, :], P[m], Noise[k])
    for k in range(K):
        E_cat = torch.cat([E[m, k, :] for m in range(M)], dim=0)
        # F_ue <CopySlices object at 0x2b410bc78c10>
        F_ue[:, k] = self.mlp3(E_cat, F_ue[:, k], agg_bs[k, :], P, Noise[k])
    return F_ue, Mesg_ue


F_ue0 = torch.rand(2 * M * N, K, requires_grad=True)        # <ToCopyBackward0 object at 0x2b410bc78be0>
Mesg_ue0 = torch.rand(K, M, 2 * M * N, requires_grad=True)  # <ToCopyBackward0 object at 0x2b410bc78be0>
F_ue, Mesg_ue = model(F_ue0, Mesg_ue0, E, P, Noise)
loss = Loss(F_ue)
loss.backward()
```
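One restructuring I'm considering is to collect the per-(m, k) outputs in Python lists and `torch.stack` them, instead of assigning into slices of an existing tensor. A minimal sketch with a stand-in single-input `mlp1` (my real MLPs take several arguments, so this is only the pattern, not my actual code):

```python
import torch
import torch.nn as nn

M, K, D = 2, 3, 8
mlp1 = nn.Linear(D, D)              # stand-in for the real mlp1
E = torch.rand(M, K, D)
agg_ue = torch.rand(M, D)

# Build Mesg_bs out-of-place: one fresh row per (m, k), then stack.
rows = [torch.stack([mlp1(E[m, k] + agg_ue[m]) for k in range(K)])
        for m in range(M)]
Mesg_bs = torch.stack(rows)         # shape [M, K, D]; no slice is overwritten

Mesg_bs.sum().backward()            # gradients reach mlp1's parameters
```

Since every element of `Mesg_bs` is a freshly created tensor, no saved value is ever mutated and the version counters stay consistent.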
Also, since only F_ue is used in calculating the loss, I’m curious whether the parameters of mlp1 and mlp2 will be updated or not.
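If it helps, this is the kind of check I've been using to see whether gradients reach a module at all, with two stand-in Linear layers (the names `used` and `unused` are hypothetical):

```python
import torch
import torch.nn as nn

used = nn.Linear(4, 4)    # its output feeds into the loss
unused = nn.Linear(4, 4)  # its output is discarded before the loss

x = torch.rand(2, 4)
out_used = used(x)
_ = unused(x)             # computed, but never reaches the loss

loss = out_used.sum()
loss.backward()

print(used.weight.grad is None)    # False: a path from loss to `used` exists
print(unused.weight.grad is None)  # True: no path from loss, so no grad
```

My understanding is that a module's parameters only get gradients (and thus optimizer updates) if its output lies on some path to the loss, so the question is whether mlp1/mlp2 still reach F_ue through agg_bs.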
Thanks for any help.