PyTorch Geometric custom layer parameters not updating

I am developing a graph neural network using PyTorch Geometric. The idea is to start from multivariate time series, build a graph based on the correlation between those time series, and then classify the graph. I have built a CorrelationLayer that computes the adjacency matrix of the graph using the Pearson coefficient and multiplies it element-wise by a matrix of trainable weights. This matrix is then passed, along with the time series as node features, to a graph convolution layer (I will add other layers for classification after the graph convolution, but I made a super-simplified version for this question). The problem is that when I try to train the model, the weights of the correlation layer do not update, while the parameters of the graph convolution layer update without any problem.

Here is the code for the correlation layer:

import torch
import torch.nn as nn
from scipy.stats import pearsonr


class CorrelationLayer(nn.Module):

    def __init__(self, num_time_series):
        super().__init__()
        self.num_time_series = num_time_series
        self.weights = nn.Parameter(torch.rand((num_time_series, num_time_series)))

    def forward(self, x):
        # Build the symmetric correlation matrix; pearsonr returns plain
        # Python floats, so these entries carry no gradient themselves
        correlations = torch.zeros((x.shape[0], x.shape[0]))
        for i in range(x.shape[0]):
            for j in range(i + 1, x.shape[0]):
                c, _ = pearsonr(x[i], x[j])
                correlations[i, j] = c
                correlations[j, i] = c
        # Element-wise product with the trainable weights: this is the
        # only place where gradients can reach self.weights
        correlations = correlations * self.weights
        return correlations

And here is the code for the GCN model:

import torch_geometric
from torch_geometric.nn import GCNConv


class GCN(nn.Module):

    def __init__(self, num_time_series, ts_length, hidden_channels):
        super(GCN, self).__init__()
        self.corr_layer = CorrelationLayer(num_time_series)
        self.graph_conv = GCNConv(ts_length, hidden_channels)

    def forward(self, x):
        adj = self.corr_layer(x)
        # Sparsify the adjacency and pass only the edge indices to the convolution
        out = self.graph_conv(x, torch_geometric.utils.dense_to_sparse(adj)[0])
        return out

This is the code that I wrote in order to train and test the model, with some sample data:

def train(model, X_train, Y_train):
    model.train()
    for x, y in zip(X_train, Y_train):
        out = model(x)
        # Debug prints to watch the parameters between optimization steps
        print(model.corr_layer.weights)
        print(model.graph_conv.state_dict().values())
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()


X = torch.tensor([
    [
        [0.,1.,2.,3.],
        [1.,2.,3.,4.],
        [0.,6.,3.,1.],
        [3.,2.,1.,0.]
    ],
    [
        [2.,4.,6.,8.],
        [1.,2.,3.,4.],
        [1.,8.,3.,7.],
        [3.,2.,1.,0.]
    ],
    [
        [0.,1.,2.,3.],
        [1.,2.,3.,4.],
        [0.,6.,3.,1.],
        [3.,2.,1.,0.]
    ]
])

Y = torch.tensor([
    [[1.],[1.],[1.],[1.]],
    [[0.],[0.],[0.],[0.]],
    [[1.],[1.],[1.],[1.]]
])

model = GCN(4,4,1)

optimizer = torch.optim.Adam(model.parameters(), lr=0.5)
criterion = torch.nn.MSELoss()

for epoch in range(1, 100):
    train(model, X,Y)

With the prints in the train function we can see that the parameters of the graph_conv layer are updating, while the weights of the correlation layer are not.

At the moment my guess is that the problem lies in the transition from the dense adjacency matrix to its sparse version via dense_to_sparse, but I am not sure.

Has anyone experienced something similar? Any ideas or suggestions?

dense_to_sparse does not break the gradient.
Is corr_layer not updated at all?

First of all, thank you for the clarification that dense_to_sparse does not break the gradient.
But yes, corr_layer is not updating at all; I also tried with much higher learning rates.
If, instead of the GCNConv layer, I use a linear layer like Linear(16, 4) and in the forward method do out = self.linear(torch.flatten(adj)), the weights update without any problems.
It is also worth noting that if I try to print the gradient of the correlation layer's weights I get None.
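
For completeness, this is how I checked the gradients (a minimal sketch; iterating over named_parameters is just one way to inspect them all):

# Inside train(), right after loss.backward():
for name, param in model.named_parameters():
    print(name, param.grad)  # corr_layer.weights prints None, the graph_conv params do not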

dense_to_sparse returns two tensors: the first holds the indices of the non-zero elements (the edge index) and the second holds their values.

The index tensor carries no gradient, while the value tensor does.
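
You can check this quickly (a minimal sketch; the random adj here just stands in for any adjacency built from tensors that require grad):

import torch
import torch_geometric

adj = torch.rand(4, 4) * torch.rand(4, 4, requires_grad=True)
edge_index, edge_values = torch_geometric.utils.dense_to_sparse(adj)
print(edge_index.grad_fn)   # None: integer indices have no autograd history
print(edge_values.grad_fn)  # a grad_fn: the values stay on the autograd graph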

So in order to do what you want,

edge_index, edge_values = torch_geometric.utils.dense_to_sparse(adj)
out = self.graph_conv(x, edge_index, edge_values)

should work.


Yes, adding the value tensor works perfectly (marked your answer as the solution).

In the meantime, I also tried the DenseGCNConv layer, which takes the adjacency matrix directly as input. The weights update that way as well, but in terms of the end result your solution still seems to work much better.
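
For reference, this is roughly what the DenseGCNConv variant looked like (a sketch; the DenseGCN class name and the final squeeze are my additions):

from torch_geometric.nn import DenseGCNConv

class DenseGCN(nn.Module):

    def __init__(self, num_time_series, ts_length, hidden_channels):
        super().__init__()
        self.corr_layer = CorrelationLayer(num_time_series)
        self.graph_conv = DenseGCNConv(ts_length, hidden_channels)

    def forward(self, x):
        adj = self.corr_layer(x)
        # DenseGCNConv takes the dense [N, N] adjacency directly, so the
        # trainable weights never leave the autograd graph
        out = self.graph_conv(x, adj)
        return out.squeeze(0)  # DenseGCNConv returns a batched [1, N, C] tensor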