How to update some variables while guaranteeing they are autograded properly

I'm trying to modify self.message_bs and self.message_ue using two different MLPs, but I find that the gradients of these two MLPs are None in every epoch. However, when I modify the variable F_ue_update in the same way using mlp3, the gradients of its weights and biases are not None.
Here are the relevant parts of my code.
Thanks for any help.

    def message(self, F_ue, E, agg_ue, P, Noise):
        if self.device.type == "cuda":
            P = P.cuda()          # right-hand side restored; lost when posting
            Noise = Noise.cuda()  # restored
        for m in range(self.M):
            for k in range(self.K):
                message_input1 = torch.cat((E[m, k, :], agg_ue[m, :], P[m], Noise[k]), 0)
                message_input_var1 = torch.autograd.Variable(message_input1, requires_grad=True)
                self.message_bs[m, k, :] = self.mlp1(message_input_var1)

        agg_bs = torch.mean(self.message_bs, dim=0)
        agg_bs = agg_bs.view(self.K, -1)  # of size [K, 2MN]; op restored, original lost when posting
        for k in range(self.K):
            for m in range(self.M):
                message_input = torch.cat((F_ue[:, k], E[m, k, :], agg_bs[k, :], P[m], Noise[k]), 0)
                message_input_var = torch.autograd.Variable(message_input, requires_grad=True)
                self.message_ue[k, m, :] = self.mlp2(message_input_var)

    def update(self, E, F_ue, agg_bs, P, Noise):
        if self.device.type == "cuda":
            P = P.cuda()          # restored
            Noise = Noise.cuda()  # restored
        F_ue_update = torch.zeros(F_ue.shape)
        for k in range(self.K):
            edge_cat_ue = torch.cat([E[m, k, :] for m in range(self.M)], dim=0)
            message_input3 = torch.cat((edge_cat_ue, F_ue[:, k], agg_bs[k, :], P[0], Noise[k]), 0)
            message_input_var3 = torch.autograd.Variable(message_input3, requires_grad=True)
            F_ue_update[:, k] = self.mlp3(message_input_var3)
        F_ue_update = F_ue_update.to(self.device)  # right-hand side restored; original lost when posting
        return F_ue_update

    # training loop
    F_ue = model(F_ue0, edge_feature, P, Noise)
    W = getW(F_ue, P, K)
    loss = loss + Loss(W, H_complex, Noise)
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f'Parameter: {name}')  # only mlp3's weight and bias are printed

Is this a double post from here or could you explain what the difference is?
You are still recreating the deprecated Variables and are thus detaching the tensor from the computation graph. Don't use deprecated objects, and use differentiable operations (such as torch.cat or torch.stack) instead of recreating tensors, as already explained.

I modified the message function compared to the previous post; the rest is the same.
I'm sorry, but I still don't understand what you mean by using deprecated objects. Do you mean that I cannot create a new variable W in the getW function? But if I don't create a new complex variable, how can I convert the real outputs of the MLPs into complex values to compute the loss (as I mentioned in the previous post)?

For this post, I don't understand why only the parameters of mlp1 and mlp2 have no gradients. You mentioned that using .detach() detaches a tensor from its original graph, so I modified the message and update functions to avoid .clone().detach(); now the parameters of mlp3 are updated properly, but not those of mlp1 and mlp2. I don't understand why, because all the MLPs are used in the same way.

Sorry for my poor explanation; please let me know if anything is unclear. I really appreciate your help.

Variable is deprecated since PyTorch 0.4, so don't use it.
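For reference, a plain tensor created with requires_grad=True is the modern replacement:

```python
import torch

# Deprecated: x = torch.autograd.Variable(torch.randn(3), requires_grad=True)
x = torch.randn(3, requires_grad=True)  # modern equivalent, no Variable needed
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```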

Yes, recreating a new tensor from another one will detach it from the computation graph as seen here:

    x = torch.randn(1, requires_grad=True)
    y = x + 1
    print(y.grad_fn)
    # <AddBackward0 object at 0x0000025A0007C370>
    # a valid grad_fn indicates y is attached to a computation graph

    z = torch.tensor(y)
    # UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
    print(z.grad_fn)
    # None
    print(z.is_leaf)
    # True
    # Note the UserWarning! You can see that z is a leaf tensor without a gradient history!
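The deprecated Variable constructor detaches in the same way, which matches the symptom in your code: parameters used after the re-wrap still receive gradients, while parameters used before it don't. A minimal sketch, with stand-in linear layers in place of your MLPs:

```python
import torch
import torch.nn as nn

mlp1 = nn.Linear(3, 3)  # stand-in for an upstream MLP
mlp3 = nn.Linear(3, 1)  # stand-in for a downstream MLP

x = torch.randn(3)
h = mlp1(x)                                             # attached to the graph
h_var = torch.autograd.Variable(h, requires_grad=True)  # new leaf: history is cut here
mlp3(h_var).backward()

print(mlp3.weight.grad is not None)  # True: mlp3 sits after the re-wrap
print(mlp1.weight.grad is not None)  # False: the Variable cut mlp1 out of the graph
```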

Instead, create the output via differentiable operations. E.g., if you want to concatenate different tensors into one, use torch.cat or torch.stack.
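Applied to the message function above, that means collecting the MLP outputs in Python lists and stacking them, instead of writing into a pre-allocated self.message_bs and re-wrapping the inputs. A sketch with made-up sizes and a stand-in mlp1:

```python
import torch
import torch.nn as nn

M, K, D_in, D_out = 2, 3, 4, 5
mlp1 = nn.Linear(D_in, D_out)     # stand-in for self.mlp1
inputs = torch.randn(M, K, D_in)  # stand-in for the concatenated message inputs

# Build message_bs with differentiable ops only: no Variable, no in-place writes.
message_bs = torch.stack([
    torch.stack([mlp1(inputs[m, k]) for k in range(K)])
    for m in range(M)
])                                # shape [M, K, D_out], attached to the graph

message_bs.sum().backward()
print(mlp1.weight.grad is not None)  # True: gradients now reach mlp1
```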

Thanks for your example. I still have several questions.

  • Do you have any suggestions on how to fix the getW function? I really need to convert the real outputs into complex values.

  • Also, for the message function, I think I only modify the variables' values in each iteration instead of creating new tensors to store the results computed by mlp1 and mlp2, but it turns out that they still cannot be autograded properly.
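On the first question, assuming the goal is to build a complex tensor from real-valued MLP outputs: torch.complex is differentiable, so no new tensor has to be created by hand. A sketch with made-up shapes (the actual getW signature is unknown):

```python
import torch
import torch.nn as nn

mlp = nn.Linear(4, 8)  # stand-in MLP producing real and imaginary parts
x = torch.randn(4)
out = mlp(x)

re, im = out[:4], out[4:]  # split the real output into two halves
W = torch.complex(re, im)  # differentiable: W stays attached to the graph

loss = W.abs().pow(2).sum()  # any real-valued loss on the complex tensor
loss.backward()
print(mlp.weight.grad is not None)  # True
```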