Hi folks, I’m quite new to PyTorch and am trying to train a model on top of graph representations. The model takes in my data, uses a graph neural network to learn node representations, then concatenates the representations of node pairs and passes them through a linear layer. However, when I call loss.backward(), the layers don’t seem to be updated at all. Am I doing anything wrong here? Here’s a short snippet from my model’s forward function for clarification.

# Three GCN layers produce the node representations
x = self.gcn1(graph_feats, edge_index=edge_index, edge_weight=edge_weight)
x = F.relu(x)
x = self.gcn2(x, edge_index=edge_index, edge_weight=edge_weight)
x = F.relu(x)
x = self.gcn3(x, edge_index=edge_index, edge_weight=edge_weight)
# Take the d and p vectors for each node pair, concat them, and put the classifier on top
stop_num = edge_index.shape[1] // 2
node_idx = edge_index.T[:stop_num]
d_rep = x[node_idx[:, 0]]
p_rep = x[node_idx[:, 1]]
# .float() keeps the tensor on its current device; .type(torch.FloatTensor) would
# move it to the CPU first, only for .cuda() to copy it back
edge_weight_half = edge_weight[:stop_num].float().view(-1, 1).to(x.device)
d_p_pair = torch.cat((d_rep, p_rep, edge_weight_half), dim=1)
x = self.lin(d_p_pair)
y = self.act_lin(x)

Hey! Thanks for responding. I thought so too, but it seems like the gradients aren’t flowing backwards. When I check model.parameters(), I do see all the layers that should be there. Is there a way to retrieve the weight matrices from each layer and watch them as they are trained?
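One way I found to watch them (a minimal sketch with a toy stand-in model, not my actual GCN — the inspection pattern works for any nn.Module) is to snapshot the parameters before an optimizer step and compare after:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in model; substitute your own.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Snapshot every weight matrix before the update
before = {name: p.detach().clone() for name, p in model.named_parameters()}

loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()
opt.step()

# named_parameters() yields (name, tensor) pairs; .grad holds the last backward's gradients
for name, p in model.named_parameters():
    grad_norm = p.grad.norm().item() if p.grad is not None else 0.0
    changed = not torch.equal(before[name], p.detach())
    print(f"{name}: grad_norm={grad_norm:.4f} changed={changed}")
```

If a layer’s grad_norm stays 0.0 (or its weights never change), that layer is not receiving gradient.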

It also looks like subsetting the features in x is what’s causing the problem. If I skip that step, the gradients appear to flow. I have no idea why; here’s the step in question.

# Take the d and p vectors for each node pair, concat them, and put the classifier on top
stop_num = edge_index.shape[1] // 2
node_idx = edge_index.T[:stop_num]
d_rep = x[node_idx[:, 0]]
p_rep = x[node_idx[:, 1]]
edge_weight_half = edge_weight[:stop_num].float().view(-1, 1).to(x.device)
d_p_pair = torch.cat((d_rep, p_rep, edge_weight_half), dim=1)
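Strangely, indexing on its own does seem to keep gradients flowing in a toy example (hypothetical shapes, not my actual tensors):

```python
import torch

x = torch.randn(5, 3, requires_grad=True)
idx = torch.tensor([0, 2, 4])

# Select a subset of rows (like x[node_idx[:, 0]] above) and backprop through it
x[idx].sum().backward()

print(x.grad)  # selected rows get gradient 1, unselected rows get 0
```

So advanced indexing doesn’t cut the graph; the unselected rows simply receive zero gradient.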

I think I found the issue. After tinkering around, I realized the combination of the nn.Sigmoid layer and BCELoss was the problem. If I swap to BCEWithLogitsLoss, the loss behaves and the weight matrices actually change now! I’m not really sure why BCELoss causes this, though, and whether it’s the numerical instability the PyTorch docs mention. Either way, I think that solves the mystery? Any idea how to tell if it’s due to numerical instability, or when to use BCEWithLogitsLoss over BCELoss?
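A toy sketch of what I think was happening (illustrative values, not my actual model): a saturated sigmoid feeding BCELoss zeroes the gradient, while BCEWithLogitsLoss on the raw logit keeps it.

```python
import torch
import torch.nn as nn

target = torch.tensor([0.0])

# An extreme logit: sigmoid(100) rounds to exactly 1.0 in float32
z1 = torch.tensor([100.0], requires_grad=True)
nn.BCELoss()(torch.sigmoid(z1), target).backward()

z2 = torch.tensor([100.0], requires_grad=True)
nn.BCEWithLogitsLoss()(z2, target).backward()

print(z1.grad)  # (near-)zero: the saturated sigmoid's derivative underflows to 0
print(z2.grad)  # non-zero: the gradient is sigmoid(z) - target, computed stably
```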

As mentioned in the docs, the difference between the two is that BCEWithLogitsLoss applies the sigmoid internally (fused with the log for numerical stability), while BCELoss expects probabilities that are already in [0, 1].
So you will get surprising values if you give BCELoss inputs it does not expect (the values will be way too large). And a saturated sigmoid can put you in a region of the loss function where the gradient is effectively 0.
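For example (a small illustrative sketch with made-up logits and targets), each loss agrees with the other when it receives the input it expects, but mixing them up silently changes the loss:

```python
import torch
import torch.nn as nn

z = torch.tensor([-2.0, 0.5, 3.0])   # raw logits (hypothetical values)
t = torch.tensor([0.0, 1.0, 1.0])    # binary targets

# Each loss given the input it expects: the values agree
a = nn.BCELoss()(torch.sigmoid(z), t)
b = nn.BCEWithLogitsLoss()(z, t)

# Feeding already-sigmoided values to BCEWithLogitsLoss applies sigmoid twice
# and silently computes a different (wrong) loss
c = nn.BCEWithLogitsLoss()(torch.sigmoid(z), t)
print(a.item(), b.item(), c.item())
```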