When I print out the .grad attribute of my parameters, I get None for all of them. This seems to be preventing my model from training, since my training loss stays at a constant value.
The .grad attributes are initialized to None, which is why you are seeing None. Could you try sampling some inputs, computing a loss, and calling loss.backward()? You should then see the .grad attributes populated.
If you want a function that explicitly represents the derivative of the network with respect to its parameters, have a look at the torch.func package.
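For example, a minimal sketch with a toy linear layer (not your model) shows the .grad attributes going from None to populated tensors after one backward pass:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
print(model.weight.grad)           # None: no backward pass has run yet

x = torch.rand(8, 4)
target = torch.rand(8, 1)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
print(model.weight.grad.shape)     # torch.Size([1, 4]): gradient is now populated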
Okay, I’m going over possible reasons why my model might not be learning (my training and validation loss curves are shown in the graph below). I’m not sure what else could be the cause.
I think the problem was that I had many negative values and relu() was clamping them to 0. I’m currently using LeakyReLU() and I get the graph below for training and validation loss. Is this a reasonable graph?
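Just to illustrate what I mean by relu() zeroing the negatives (a toy check, not my actual data):

import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 1.0])
print(nn.ReLU()(x))           # tensor([0., 0., 1.])               negatives are clamped to zero
print(nn.LeakyReLU(0.2)(x))   # tensor([-0.4000, -0.1000, 1.0000])  negatives keep a small slope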
I passed my test data through it, but I get really bad predictions that all fall in roughly the same range. This is my code structure for a single epoch. @ptrblck
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, in_channel, out_channels):
        super(MLP, self).__init__()
        self.hidden = 64
        self.input_dims = in_channel
        self.output_dims = out_channels
        self.mapping_output = 64

        self.fc1 = nn.Linear(self.input_dims, self.hidden)
        self.relu = nn.LeakyReLU(0.2)
        self.fc2 = nn.Linear(self.hidden, self.hidden)
        self.out = nn.Linear(self.hidden, self.output_dims)

        # maps the scalar conditioning value to a 64-dim feature vector
        self.mapping = nn.Sequential(
            nn.Linear(1, self.mapping_output),
            nn.LeakyReLU(0.2),
            nn.Linear(self.mapping_output, self.mapping_output),
        )

    def forward(self, img, val):
        # global average pool over the spatial dims -> (batch, channels)
        img = torch.mean(img.view(img.shape[0], img.shape[1], -1), dim=2)
        print(img.shape)  # debugging: torch.Size([1, 80]) for the input below
        val = self.mapping(val)
        # concatenate pooled image features with the mapped value
        img = torch.cat([img, val], dim=1)
        fc1 = self.relu(self.fc1(img))
        out = self.out(self.fc2(fc1))
        return out


mlp = MLP(in_channel=144, out_channels=35)
img = torch.rand(1, 80, 90, 58)
val = torch.rand(1).unsqueeze(0)
target = torch.rand(35).unsqueeze(0)

optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-6)
loss = nn.GaussianNLLLoss()
var = torch.ones_like(target)

out = mlp(img, val)
l = loss(out, target, var)
optimizer.zero_grad()
l.backward()
optimizer.step()
I don’t know if you are trying to overfit a tiny subset of your dataset as a debugging experiment, or if your actual dataset is this small. Based on the loss curves I would assume the sample size is quite small, which could easily cause overfitting. Besides that, you aren’t using an activation function between the last two linear layers, so they can be seen as a single linear mapping.
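E.g. something along these lines (just a sketch of the forward method, reusing the existing self.relu) would add a non-linearity between the last two layers:

def forward(self, img, val):
    # same pooling / mapping / concatenation as in the posted code
    img = torch.mean(img.view(img.shape[0], img.shape[1], -1), dim=2)
    val = self.mapping(val)
    x = torch.cat([img, val], dim=1)
    x = self.relu(self.fc1(x))
    x = self.relu(self.fc2(x))   # activation between fc2 and the output layer
    return self.out(x)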
In your previous post your loss values were also negative, which seems unexpected. Did you figure out why that was the case? I would guess the target values might not have been in the expected range.
I was using GaussianNLLLoss initially, which is why my loss values were negative (I assume negative values are normal with GaussianNLLLoss). I have since changed the loss to MSE, hence the positive values. I’ve also increased my training samples and added an activation function as you pointed out. However, I’m still getting a very low training and validation loss: they both drop within 2 epochs and then remain flat.
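A quick toy check of that assumption (arbitrary numbers, not my data): GaussianNLLLoss can go negative when the predicted variance is small, while MSE can’t:

import torch
import torch.nn as nn

pred = torch.tensor([[0.0]])
target = torch.tensor([[0.0]])
var = torch.full_like(pred, 0.01)               # a small variance makes the NLL term negative

print(nn.GaussianNLLLoss()(pred, target, var))  # roughly -2.3, a negative value is expected
print(nn.MSELoss()(pred, target))               # 0.0, MSE is always >= 0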
I think you need to give more detail about the dataset you’re using, because your model may be working just fine. The loss does go down; as @ptrblck mentioned, the model could be overfitting and will have essentially “learned” your data within 2 epochs if there isn’t much of it.
I think your model is working just fine and the gradients are propagating backwards (your learning rate is relatively small and it still overfits). I suspect something might be off in how you calculate the loss; can you show the new code you’re using with MSE?
Also, you set loss = nn.MSELoss() but never use it; instead you call mse_loss() from torch.nn.functional. If .backward() is being called on something derived from the unused declaration, that could be why there are no gradients. In any case, you should name the two differently to avoid confusion. Also, if train_loss is a running scalar, you should accumulate loss.item() into it.
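Roughly what I mean, as a sketch (model, optimizer and loader stand in for your own objects):

import torch.nn as nn

criterion = nn.MSELoss()                 # call this (or F.mse_loss), but only one of them

train_loss = 0.0
for data, target in loader:              # placeholder DataLoader
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)     # the same tensor is used for backward() ...
    loss.backward()
    optimizer.step()
    train_loss += loss.item()            # ... and accumulated as a plain Python float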
Sorry for the confusion, I was in a rush writing a script of what I’m doing in my main code and made a few mistakes. I’ve adjusted the script. In my actual code I’m backpropagating and I also use .item().
The only suggestion I have is not to declare val and img as global variables, and if you do, to set requires_grad=True and try that; PyTorch sets the requires_grad attribute to False by default for .rand() tensors.
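For reference, a quick check of that default:

import torch

img = torch.rand(1, 80, 90, 58)
print(img.requires_grad)    # False: .rand() tensors don't track gradients by default

img = torch.rand(1, 80, 90, 58, requires_grad=True)
print(img.requires_grad)    # True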