Are the parameters of my model (torch.nn.Module) specified correctly?

I would like to minimize the following cost function:

import torch.nn as nn

class SelfSupervLoss(nn.Module):
    def __init__(self):
        super(SelfSupervLoss, self).__init__()

    def forward(self, x, edge_index):
        # mean of the element-wise product of the source and destination node embeddings
        src, dest = edge_index
        return (x[src] * x[dest]).mean()

criterion = SelfSupervLoss()

and I want to use SGD for this purpose:

from torch.optim import SGD

optimizer = SGD(model.parameters(), lr = 0.01)

The only parameters of my model are:

self.hv = torch.nn.Parameter(data.x[data.edge_index[0]]) # argument is a Tensor with the shape [472, 6]
self.hu = torch.nn.Parameter(data.x[data.edge_index[1]]) # argument is a Tensor with the shape [472, 6]

which I declare in my subclass of torch.nn.Module (if needed, I can share the whole model).
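
For context, a minimal sketch of how a module with just these two parameters might be declared; the class name and the forward body are placeholders, since the full model is not posted here:

import torch.nn as nn

class MyModel(nn.Module):  # hypothetical name; the real model was not posted
    def __init__(self, data):
        super().__init__()
        # Trainable copies of the node features gathered at the source and
        # destination of each edge; both have shape [472, 6].
        self.hv = nn.Parameter(data.x[data.edge_index[0]])
        self.hu = nn.Parameter(data.x[data.edge_index[1]])

    def forward(self, x, edge_index, deg):
        # Placeholder: the actual forward pass is not shown in the post.
        # For SGD to update hv/hu, the returned output must depend on them.
        ...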

The training loop is as follows:

def train():
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, deg)
    loss = criterion(out, data.edge_index)
    loss.backward(retain_graph = True)
    optimizer.step()
    return loss
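
For reference, the outer epoch loop that produces the log below is assumed to look roughly like this (it is not shown in the post itself):

for epoch in range(200):
    loss = train()
    print(f'Epoch: {epoch}, Loss: {loss.item():.4f}')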

But the loss remains at the same level over 200+ epochs:

 Epoch: 0, Loss: 0.1043
 Epoch: 1, Loss: 0.1087
 Epoch: 2, Loss: 0.0914
 Epoch: 3, Loss: 0.1007
 Epoch: 4, Loss: 0.0994
...

I do not understand what I might be doing wrong. Maybe the declaration of the parameters? I am not sure… Does anybody know?
Thank you!

Could you post your model?

Also, print out the gradients during the training loop. They might be really small (in which case you would just need to increase your learning rate), or they might be near zero, which would indicate a problem within your model. You can print them by adding this to your train function:

for name, param in model.named_parameters():
  print(name, param.requires_grad, param.grad)
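
For example, the check can go right after the backward pass, since the gradients are only populated once loss.backward() has run (this sketch reuses the train() function from the post above):

def train():
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, deg)
    loss = criterion(out, data.edge_index)
    loss.backward(retain_graph = True)
    # gradients exist here; inspect them before the parameter update
    for name, param in model.named_parameters():
        print(name, param.requires_grad, param.grad)
    optimizer.step()
    return loss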

Try reducing your learning rate by different orders of magnitude. The loss may be “jumping around” a valley because the learning rate is too large. Try lr=1e-3, lr=1e-4, and so on; also try intermediate values such as lr=3e-5. Play around with the learning rate to see if you can hit a sweet spot.

If you see improvements in the loss when the learning rate is reduced, but you get tired of doing this by hand, you can also consider using learning-rate schedulers, of which PyTorch has many.
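
For example, a sketch using ReduceLROnPlateau, which cuts the learning rate whenever the loss stops improving (the factor and patience values here are only illustrative):

from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = SGD(model.parameters(), lr=1e-2)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)

for epoch in range(200):
    loss = train()
    scheduler.step(loss)  # reduces the lr after `patience` epochs without improvement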


Thank you for the answer! I had to completely rebuild my model, and now it seems to perform better. Thanks again for your advice!
