Derivative of output w.r.t. input features returns 'None' after setting 'allow_unused=True'

I am using autograd to get the derivative of the output w.r.t. the input features, using the following code:

torch.autograd.grad(
    outputs=pred,
    inputs=input,
    allow_unused=True,
    retain_graph=True,
    create_graph=True)[0]

I had to set ‘allow_unused=True’ after getting the error:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

but then the gradients are always ‘None’.
What could be the reason for this behavior?

Hi,

As the original error says, “pred” was not computed in a differentiable way from autograd's point of view. Can you share the code that is used to compute pred based on input?
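
To illustrate what "unused" means here, a quick sketch (a generic toy example, not your model): allow_unused=True just replaces that error with a None gradient for any input that never took part in the graph that produced the output.

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3, requires_grad=True)
out = (w * 2).sum()          # x is never used to compute out

g_x = torch.autograd.grad(out, x, allow_unused=True)[0]
print(g_x)                   # prints None: x is "unused" from autograd's point of view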

Sure, this is the code for the model:

import torch
from torch import nn

# E_GCL_mask is defined elsewhere in the EGNN codebase.
class EGNN(nn.Module):
    def __init__(self, in_node_nf, in_edge_nf, hidden_nf, device='cpu', act_fn=nn.SiLU(), n_layers=4, coords_weight=1.0, attention=False, node_attr=1):
        super(EGNN, self).__init__()
        self.hidden_nf = hidden_nf
        self.device = device
        self.n_layers = n_layers

        ### Encoder
        self.embedding = nn.Linear(in_node_nf, hidden_nf)
        self.node_attr = node_attr
        if node_attr:
            n_node_attr = in_node_nf
        else:
            n_node_attr = 0
        for i in range(0, n_layers):
            self.add_module("gcl_%d" % i, E_GCL_mask(self.hidden_nf, self.hidden_nf, self.hidden_nf, edges_in_d=in_edge_nf, nodes_attr_dim=n_node_attr, act_fn=act_fn, recurrent=True, coords_weight=coords_weight, attention=attention))

        self.node_dec = nn.Sequential(nn.Linear(self.hidden_nf, self.hidden_nf),
                                      act_fn,
                                      nn.Linear(self.hidden_nf, self.hidden_nf))

        self.graph_dec = nn.Sequential(nn.Linear(self.hidden_nf, self.hidden_nf),
                                       act_fn,
                                       nn.Linear(self.hidden_nf, 1))
        self.to(self.device)

    def forward(self, h0, x, edges, edge_attr, node_mask, edge_mask, n_nodes):
        h = self.embedding(h0)
        for i in range(0, self.n_layers):
            if self.node_attr:
                h, _, _ = self._modules["gcl_%d" % i](h, edges, x, node_mask, edge_mask, edge_attr=edge_attr, node_attr=h0, n_nodes=n_nodes)
            else:
                h, _, _ = self._modules["gcl_%d" % i](h, edges, x, node_mask, edge_mask, edge_attr=edge_attr,
                                                      node_attr=None, n_nodes=n_nodes)

        h = self.node_dec(h)
        h = h * node_mask
        h = h.view(-1, n_nodes, self.hidden_nf)
        h = torch.sum(h, dim=1)
        pred = self.graph_dec(h)
        
        return pred.squeeze(1)

Then, in the training function, I used the following code:

model = EGNN(in_node_nf=6, in_edge_nf=0, hidden_nf=args.nf, device=device,n_layers=args.n_layers, coords_weight=1.0, attention=args.attention, node_attr=args.node_attr)
.
.
.
pred = model(h0=nodes, x=atom_positions, edges=edges, edge_attr=None, node_mask=atom_mask, edge_mask=edge_mask, n_nodes=n_nodes)
.
.
.
atom_positions.requires_grad_(True)
grad = []
grad.append(torch.autograd.grad(
                outputs=pred,
                inputs=atom_positions,
                retain_graph=True, 
                create_graph=True)[0])
print(grad)
atom_positions.requires_grad_(False)

Thanks!

The requires_grad_() call is not retroactive. You need to set the requires_grad field before you do any computation with the Tensor.
In this case, before you evaluate the model.
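
Applied to your snippet, the order would look roughly like this (just a sketch reusing your variable names; the grad_outputs argument is my addition and is only needed because pred is not a scalar):

atom_positions.requires_grad_(True)   # enable grad tracking BEFORE the forward pass

pred = model(h0=nodes, x=atom_positions, edges=edges, edge_attr=None,
             node_mask=atom_mask, edge_mask=edge_mask, n_nodes=n_nodes)

# atom_positions is now a leaf of the graph behind pred, so grad() can reach it
grad = torch.autograd.grad(
    outputs=pred,
    inputs=atom_positions,
    grad_outputs=torch.ones_like(pred),  # required when pred has more than one element
    retain_graph=True,
    create_graph=True)[0]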


Thank you!
It works now.