Forward method call for weight training

My PyTorch model isn’t automatically calling the forward method.

I’m trying to embed my graph’s adjacency matrix by aggregating neighbours and combining them (similar to GraphSAGE).

The adjacency matrix is of size n×n and the embedding will be of size n×d, where d < n.

So, basically, in my code the adjacency matrix of a graph is fed in as input and the forward method should return its embedding.


import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

class Setting(nn.Module):
    
    def __init__(self, A):
        
        super(Setting, self).__init__()
        
        self.A = A
        self.X = np.array(np.sum(A, axis=1))
        self.feature_len = 10

        self.L = 3
        self.n = 10
        self.z_dim = 8
        
        self.h = torch.empty(self.L-1,self.n, self.z_dim)
        
        self.h0 = torch.from_numpy(self.X)
        
        W0 = nn.init.xavier_uniform_(torch.empty(self.feature_len, self.z_dim))
        
        self.h[0] = torch.empty(self.n, self.z_dim)

        for v in range(self.n):
            self.h[0][v] = F.relu(self.h0[v]*W0)
        
    def forward(self, A):
        
        d_u = np.empty([self.n, self.n])
        
        for v in range(self.n):
            d_u[v] = self.X[v]*self.A[v]

        d_u = torch.from_numpy(d_u)        
                
        h_n = torch.empty(self.L-1, self.n, self.z_dim)

        rnn = nn.GRUCell(self.z_dim, self.z_dim)

        H = F.normalize(self.h[0], p=2, dim=1)
        
        for l in range(0,self.L-1):
            
            h_n[l] = torch.mm(d_u, self.h[l-1].double())
            self.h[l] = rnn(self.h[l-1], h_n[l]).detach().float()
            self.h[l] = F.normalize(self.h[l], p=2, dim=1)
                        
            H = torch.max(H, self.h[l])
   
        return H

Now, based on the value of H, I want to train the weight W0 by taking the MSELoss between H and H1 (another, already known embedding).

net = Setting(A).double()
loss = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(10):
    optimizer.zero_grad()
    H = net(A)
    loss_calc = loss(H, H1)
    loss_calc.backward()
    optimizer.step()

print(H)

How can I train the weight W0 with this architecture? Any modifications to the code are highly appreciated.

Thanks in advance.

To make W0 trainable, you would have to define it as an nn.Parameter.
Also, it should be used in the computation of the output of your model.
Currently it seems you are only using W0 in your __init__ method to calculate self.h, which is then overwritten in the forward method, so I’m not sure what the computation graph should look like.
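For example, a minimal sketch (with a made-up module and shapes, not your exact architecture) of defining the weight as an nn.Parameter and using it in forward would look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self, feature_len=10, z_dim=8):
        super().__init__()
        # registering the weight as an nn.Parameter makes it show up in model.parameters()
        self.W0 = nn.Parameter(torch.empty(feature_len, z_dim))
        nn.init.xavier_uniform_(self.W0)

    def forward(self, x):
        # the parameter is used inside forward, so the output depends on it
        # and gradients can flow back into W0
        return F.relu(x @ self.W0)

model = MyModel()
x = torch.randn(4, 10)
out = model(x)                                 # out has shape [4, 8]
print([p.shape for p in model.parameters()])   # [torch.Size([10, 8])]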

So, changing the weight to nn.Parameter should work, right?

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

class Setting(nn.Module):
    
    def __init__(self, A):
        
        super(Setting, self).__init__()
        
        self.A = A
        self.X = np.array(np.sum(A, axis=1))
        self.feature_len = 10

        self.L = 4
        self.n = 10
        self.z_dim = 8
        
        self.h = torch.empty(self.L-1,self.n, self.z_dim)
        
        self.h0 = torch.from_numpy(self.X)
        
        W0 = nn.parameter.Parameter(torch.FloatTensor((self.feature_len, self.z_dim)))
        
        self.h[0] = torch.empty(self.n, self.z_dim)

        for v in range(self.n):
            self.h[0][v] = F.relu(self.h0[v]*W0)
        
    def forward(self, A):
        
        d_u = np.empty([self.n, self.n])
        
        for v in range(self.n):
            d_u[v] = self.X[v]*self.A[v]

        d_u = torch.from_numpy(d_u)        
                
        h_n = torch.empty(self.L-1, self.n, self.z_dim)

        rnn = nn.GRUCell(self.z_dim, self.z_dim)

        H = F.normalize(self.h[0], p=2, dim=1)
        
        for l in range(1,self.L-1):
            
            h_n[l] = torch.mm(d_u, self.h[l-1].double())
            self.h[l] = rnn(self.h[l-1], h_n[l]).detach().float()
            self.h[l] = F.normalize(self.h[l], p=2, dim=1)
            H = torch.max(H, self.h[l])
   
        return H

So, I’ve updated the weight to a Parameter, but I’m unclear on how to use W0 in the forward method. Also, note that the limits of the for loop have been changed so that there is no overwriting.

So, my question is as follows:
I first calculate h[0] using W0 and h0 (note that h0 is different from h[0]). Further, I use these values to calculate h[1], h[2], …, h[L-1]. The maximum value from each of these is folded into H with

H = torch.max(H, self.h[l])

This returned value H is compared with H1 (another already known embedding).

So, to make the embeddings H and H1 equal, W0 is basically the trainable parameter. How do I do that, and what modifications are required in the code?

Thanks a lot for your time and help.

I’m still unsure how the code should work.
You are currently creating W0 with two values. Is this your use case, or would you like to create a parameter with the shape [self.feature_len, self.z_dim]? In the latter case you would need to use a factory method such as torch.zeros, torch.randn, etc.

You should also register the parameter via self.W0 = nn.Parameter(...), so that it’ll be returned in model.parameters(), which is the usual way to forward the parameters to an optimizer.

Once the code is running, you should check whether model.W0.grad gives you a valid gradient, as I’m currently unsure if the reassignments detach the computation graph.
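As a small standalone sketch of both points (using the shapes from your code), the difference between the two constructors and the gradient check would look like this:

import torch
import torch.nn as nn

feature_len, z_dim = 10, 8

# torch.FloatTensor((10, 8)) interprets the tuple as data and creates a
# tensor holding just the two values 10. and 8.
bad = torch.FloatTensor((feature_len, z_dim))
print(bad.shape)         # torch.Size([2])

# a factory method creates the full [10, 8] weight
W0 = nn.Parameter(torch.randn(feature_len, z_dim))
print(W0.shape)          # torch.Size([10, 8])

# quick check that the parameter receives a gradient
x = torch.randn(4, feature_len)
out = x @ W0
out.sum().backward()
print(W0.grad.shape)     # torch.Size([10, 8])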

This is how the code is supposed to work:

  1. An adjacency matrix A of size n×n is passed into the constructor, and X is the degree matrix of A (of size n×1).

  2. The values L, n and z_dim are also initialized, the tensor h0 copies its values from X, and another tensor h (not h0; sorry for the similar, confusing names) of size (L-1) × n × z_dim is initialized in:

for v in range(self.n):
    self.h[0][v] = F.relu(self.h0[v]*W0)
  3. The trainable parameter is W0, which is of dimension 10×8.

  4. Now, in the forward method, all the values of h are set here:

for l in range(1, self.L-1):

    h_n[l] = torch.mm(d_u, self.h[l-1].double())
    self.h[l] = rnn(self.h[l-1], h_n[l]).detach().float()
    self.h[l] = F.normalize(self.h[l], p=2, dim=1)

    H = torch.max(H, self.h[l])

Note that when l=1, h[l-1] = h[0], which is set as:

for v in range(self.n):
    self.h[0][v] = F.relu(self.h0[v]*W0)

So ultimately, all values of the h and H tensors depend on the weight W0, which has to be trained. The forward method will return the H tensor, which will be compared with H1 (another pre-defined tensor), and the difference between the two is to be minimised.

Please ask if there’s any clarification needed. Thanks for your time and help.
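Putting the replies above together, here is a rough sketch of how the module could be restructured so that W0 actually receives a gradient. It is not a drop-in replacement: it assumes the degree is the only node feature (so W0 has shape [1, z_dim] instead of 10×8), it computes h[0] inside forward and creates the GRUCell in __init__, and it drops the .detach() calls so the computation graph stays connected to W0:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Setting(nn.Module):
    def __init__(self, z_dim=8, L=4):
        super().__init__()
        self.L = L
        # the degree is the only node feature here, so W0 is [1, z_dim];
        # registered as a Parameter so net.parameters() returns it
        self.W0 = nn.Parameter(nn.init.xavier_uniform_(torch.empty(1, z_dim)))
        # create the GRUCell once in __init__ so its weights are registered too
        self.rnn = nn.GRUCell(z_dim, z_dim)

    def forward(self, A):
        A = A.float()
        deg = A.sum(dim=1, keepdim=True)      # degree vector X, shape [n, 1]
        d_u = deg * A                         # row v of A scaled by its degree, [n, n]

        # initial embedding h[0] = relu(h0 * W0), computed inside forward and
        # without detach(), so it stays connected to W0
        h = F.relu(deg @ self.W0)             # [n, z_dim]
        H = F.normalize(h, p=2, dim=1)

        for _ in range(self.L - 1):
            h_n = torch.mm(d_u, h)            # neighbour aggregation
            h = self.rnn(h, h_n)              # combine with the GRU cell
            h = F.normalize(h, p=2, dim=1)
            H = torch.max(H, h)               # element-wise max over all layers

        return H


n = 10
A = torch.randint(0, 2, (n, n)).float()       # toy adjacency matrix
H1 = torch.randn(n, 8)                        # the known target embedding

net = Setting()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(10):
    optimizer.zero_grad()
    H = net(A)
    loss_calc = criterion(H, H1)
    loss_calc.backward()
    optimizer.step()

print(net.W0.grad)                            # should now be a valid [1, 8] gradient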