nn.Embedding.weight has grads, but does not update (RGCNConv)


I'm training models to predict node entities in multi-relational graphs.
Therefore I want to train node embeddings with the RGCNConv layer.
For the node embeddings I use the nn.Embedding layer, and these embeddings are randomly initialized, so it would be nice to update them during backprop.
In the forward pass I pass nn.Embedding.weight, edge_index, and edge_type to the RGCNConv layer.
The nn.Embedding.weight has size (number of graph nodes, embedding dimension).
I specify requires_grad = True when initializing nn.Embedding.
When I print the gradients of nn.Embedding.weight I get non-zero gradients of size (number of graph nodes, embedding dimension), which looks alright. From the RGCNConv documentation I would expect the embeddings to update automatically, but maybe I am not using the layer correctly.

Maybe using nn.Embedding is unnecessary and I should use a matrix of shape (number of nodes, embedding dimension) instead.

Any suggestions or improvements on the code are appreciated!

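from typing import List, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from torch_geometric.data import Data
from torch_geometric.nn import RGCNConv

class Emb_Layers(nn.Module):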
    def __init__(self, num_relations: int, hidden_l: int, num_labels: int, emb_dim: int, _) -> None:
        super(Emb_Layers, self).__init__()
        self.rgcn1 = RGCNConv(in_channels=emb_dim, out_channels=hidden_l, num_relations=num_relations)
        self.rgcn2 = RGCNConv(hidden_l, num_labels, num_relations)
        nn.init.kaiming_uniform_(self.rgcn1.weight, mode='fan_in')
        nn.init.kaiming_uniform_(self.rgcn2.weight, mode='fan_in')

    def forward(self, training_data: Data) -> Tensor:
        x = self.rgcn1(training_data.embedding.weight, training_data.edge_index, training_data.edge_type)
        x = F.relu(x)
        x = self.rgcn2(x, training_data.edge_index, training_data.edge_type)
        x = torch.sigmoid(x)
        return x

This is the training loop that I use:

def train(self, model: nn.Module, graph: Graph, sum_graph=True) -> Tuple[List, List]:
        model = model.to(self.device)
        training_data = graph.training_data.to(self.device)
        loss_f = torch.nn.BCELoss().to(self.device)
        optimizer = torch.optim.Adam(model.parameters(), lr=self.lr, weight_decay=self.weight_d)

        accuracies = []
        losses = []

        for epoch in range(self.epochs):
            if not sum_graph:
                acc = self.evaluate(model, training_data)
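            optimizer.zero_grad()  # clear gradients from the previous epoch
            out = model(training_data)
            targets = training_data.y_train.to(torch.float32)
            output = loss_f(out[training_data.x_train], targets)
            output.backward()  # compute gradients
            optimizer.step()   # update only the parameters the optimizer was given
            # print(training_data.embedding.weight[0].clone())
            # print(training_data.embedding.weight)
            # print(training_data.embedding.weight.grad[0])
            l = output.item()
            losses.append(l)
            if not sum_graph:
                accuracies.append(acc)
                print(f'Accuracy on validation set = {acc}')
            if epoch % 10 == 0:
                print(f'Epoch: {epoch}, Loss: {l:.4f}')
        return accuracies, losses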

It seems you are using the embedding.weight here:

x = self.rgcn1(training_data.embedding.weight, training_data.edge_index, training_data.edge_type)

which doesn’t seem to be a registered parameter of the module (Emb_Layers in this case), but instead comes from training_data.
In the optimizer initialization you are passing the parameters of the model:

optimizer = torch.optim.Adam(model.parameters(), lr=self.lr, weight_decay=self.weight_d)

which is generally correct, but since the embedding layer comes from training_data, the optimizer never receives its parameters and so cannot update them.
Could you explain your use case a bit more and especially why a trainable parameter is set in the data instead of the model?
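
To illustrate the effect, here is a minimal sketch in plain PyTorch. It is not your code: TinyModel is a hypothetical stand-in for Emb_Layers, and nn.Linear replaces RGCNConv so that the snippet runs without torch_geometric. The embedding lives outside the model, gets a gradient from backward(), and still does not change after optimizer.step().

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for Emb_Layers; nn.Linear replaces RGCNConv."""

    def __init__(self, emb_dim: int, num_labels: int) -> None:
        super().__init__()
        self.lin = nn.Linear(emb_dim, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.lin(x))


torch.manual_seed(0)
# The embedding is created outside the model, like training_data.embedding.
embedding = nn.Embedding(5, 4)
model = TinyModel(emb_dim=4, num_labels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Check whether the embedding weight is among the model's parameters.
model_param_ids = {id(p) for p in model.parameters()}
embedding_is_registered = id(embedding.weight) in model_param_ids

before = embedding.weight.detach().clone()
optimizer.zero_grad()
loss = model(embedding.weight).sum()
loss.backward()
optimizer.step()

# Autograd fills in the gradient, but the optimizer never touches the weight.
has_grad = embedding.weight.grad is not None and bool(embedding.weight.grad.abs().sum() > 0)
unchanged = torch.equal(before, embedding.weight.detach())
print(embedding_is_registered, has_grad, unchanged)
```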

Ah, thanks! I understand what's happening.
I do indeed initialize the nn.Embedding outside of the model.
I'm training one model on multiple smaller graphs (derived from an original graph). Each smaller graph has its own node embedding, and I want to save those in the graph class to use them later on.
But it is actually not necessary to initialize the embedding in the graph class itself. I can save/copy nn.Embedding.weight from the model later.
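
A rough sketch of that idea, under some assumptions: ModelWithEmbedding is a hypothetical name, nn.Linear again stands in for the RGCNConv layers, and the loss is a placeholder. Because the embedding is assigned as an attribute inside __init__, it is registered with the module and model.parameters() includes it.

```python
import torch
from torch import nn


class ModelWithEmbedding(nn.Module):
    """Hypothetical variant that owns its node embedding."""

    def __init__(self, num_nodes: int, emb_dim: int, num_labels: int) -> None:
        super().__init__()
        # Assigning an nn.Module attribute registers its parameters.
        self.embedding = nn.Embedding(num_nodes, emb_dim)
        self.lin = nn.Linear(emb_dim, num_labels)  # stand-in for RGCNConv layers

    def forward(self) -> torch.Tensor:
        return torch.sigmoid(self.lin(self.embedding.weight))


torch.manual_seed(0)
model = ModelWithEmbedding(num_nodes=5, emb_dim=4, num_labels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

before = model.embedding.weight.detach().clone()
optimizer.zero_grad()
loss = model().sum()  # placeholder loss, only for illustration
loss.backward()
optimizer.step()

# Copy the trained weights out, e.g. to store them on the graph object.
saved_embedding = model.embedding.weight.detach().clone()
changed = not torch.equal(before, saved_embedding)
print(changed)
```

The detach().clone() call gives a copy that no longer shares memory with the model, so later training steps do not alter the saved embedding.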

Alternatively, if you want to keep the current structure, you could also pass the embedding layer from training_data to the optimizer as an additional parameter.
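
Something along these lines should work, as a sketch rather than a drop-in fix. Here `embedding` plays the role of training_data.embedding, and the nn.Linear model is only a hypothetical stand-in for Emb_Layers. The only real change is the parameter list handed to Adam.

```python
import torch
from torch import nn

torch.manual_seed(0)
embedding = nn.Embedding(5, 4)  # stand-in for training_data.embedding
model = nn.Linear(4, 2)         # hypothetical stand-in for Emb_Layers

# Hand both the model's and the embedding's parameters to the optimizer.
params = list(model.parameters()) + list(embedding.parameters())
optimizer = torch.optim.Adam(params, lr=0.1)

before = embedding.weight.detach().clone()
optimizer.zero_grad()
loss = torch.sigmoid(model(embedding.weight)).sum()  # placeholder loss
loss.backward()
optimizer.step()

# The embedding is now updated together with the model.
changed = not torch.equal(before, embedding.weight.detach())
print(changed)
```

Since each smaller graph has its own embedding, the optimizer would need to be created per graph this way, or the extra parameters added with optimizer.add_param_group({'params': embedding.parameters()}).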

That might also be a nice idea.
Thanks a lot!