Param.grad is showing None for all parameters

When I print out .grad for my parameters I get None for every one of them. This seems to be preventing my model from training, since my training loss stays constant.

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_channel, out_channels):
        super(MLP, self).__init__()

        self.hidden = 64
        self.input_dims = in_channel
        self.output_dims = out_channels

        self.mapping_output = 64
        self.fc1 = nn.Linear(self.input_dims, self.hidden)
        self.relu = nn.ReLU()
        self.layer1 = nn.Linear(self.hidden, self.hidden)
        self.layer2 = nn.Linear(self.hidden, self.output_dims)

        self.mapping = nn.Sequential(
            nn.Linear(1, self.mapping_output),
            nn.ReLU(),
            nn.Linear(self.mapping_output, self.mapping_output),
        )

mlp = MLP(in_channel=80, out_channels=10)
mlp.to("cpu")
for param in mlp.parameters():
    print(param.grad)

for name, param in mlp.named_parameters():
    print(name, param.grad)

Output:

None
None
None
None
None
None
None
None
None
None
None
None
None
None

Hi @ays,

I’m pretty sure the .grad attributes are initialized to None (hence why you see None). Can you try sampling some inputs, computing a loss, and calling loss.backward()? You’ll then see the .grad attributes populated.

If you want a function that explicitly represents the derivative of the network with respect to its parameters, have a look at the torch.func package.
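
Something like this minimal sketch (using a stand-alone nn.Linear for illustration, since the class above doesn’t define a forward yet):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
print(model.weight.grad)        # None: no backward pass has run yet

x = torch.rand(8, 4)            # dummy inputs
target = torch.rand(8, 2)       # dummy targets
loss = nn.MSELoss()(model(x), target)
loss.backward()                 # populates .grad for every parameter
print(model.weight.grad.shape)  # torch.Size([2, 4])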

Okay, I’m going over possible reasons why my model might not be learning (because I do get a training and validation loss, as shown in the graph below). I’m not sure what else might be the cause.

You need to include a minimal reproducible example for those graphs.

I think the problem was that I had many negative values and relu() was zeroing them out. I’m currently using leakyrelu() and I get the graph below for the training and validation loss. Is this a reasonable graph?
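
For context, a quick illustration of the difference (made-up values): relu() discards negative activations entirely, while leakyrelu() keeps a scaled copy of them:

import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 1.0])
print(nn.ReLU()(x))           # tensor([0.0000, 0.0000, 1.0000])
print(nn.LeakyReLU(0.2)(x))   # tensor([-0.4000, -0.1000, 1.0000])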

I passed my test data through it, but I get really bad predicted values that all fall in roughly the same range. This is my code structure for a single epoch. @ptrblck

import torch
import torch.nn as nn
class MLP(nn.Module):
    def __init__(self, in_channel, out_channels):
        super(MLP, self).__init__()
        
        self.hidden = 64
        self.input_dims = in_channel
        self.output_dims = out_channels


        self.mapping_output = 64
        self.fc1 = nn.Linear(self.input_dims, self.hidden)
        self.relu = nn.LeakyReLU(0.2)
        
        self.fc2 = nn.Linear(self.hidden, self.hidden )
        self.out = nn.Linear(self.hidden, self.output_dims)


        self.mapping = nn.Sequential(
        nn.Linear(1, self.mapping_output),
        nn.LeakyReLU(0.2),
        nn.Linear(self.mapping_output, self.mapping_output),
        )

      
    def forward(self, img, val):
        # collapse the spatial dimensions of the image to per-channel means
        img = torch.mean(img.view(img.shape[0], img.shape[1], -1), dim=2)
        print(img.shape)
        val = self.mapping(val)

        img = torch.cat([img, val], dim=1)

        fc1 = self.relu(self.fc1(img))
        out = self.out(self.fc2(fc1))

        return out

mlp = MLP(in_channel=144, out_channels=35)
img = torch.rand(1, 80, 90, 58)
val = torch.rand(1).unsqueeze(0)
target = torch.rand(35).unsqueeze(0)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-6)

loss = nn.GaussianNLLLoss()
var = torch.ones_like(target)

out = mlp(img, val)
l = loss(out, target, var)
optimizer.zero_grad()
l.backward()
optimizer.step()

I don’t know if you are trying to overfit a tiny sample of your dataset as an experiment or debugging step, or if your actual dataset is small. Based on the loss curves I would assume the sample size might be quite small, which could easily cause overfitting. Besides that, you aren’t using an activation function between the last two linear layers, which can thus be seen as a single linear mapping (see the sketch below).
In your previous post your loss values were also negative, which seems unexpected. Did you figure out why that was the case? I would guess the target values might not have been in the expected range.
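
To illustrate the point about the missing activation, here is a rough sketch showing that two stacked nn.Linear layers with no nonlinearity in between behave like a single composed linear layer (layer sizes here are just for the example):

import torch
import torch.nn as nn

fc2 = nn.Linear(64, 64)
out = nn.Linear(64, 35)

# compose the two affine maps into one equivalent nn.Linear
combined = nn.Linear(64, 35)
with torch.no_grad():
    combined.weight.copy_(out.weight @ fc2.weight)
    combined.bias.copy_(out.weight @ fc2.bias + out.bias)

x = torch.rand(4, 64)
print(torch.allclose(out(fc2(x)), combined(x), atol=1e-5))  # True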

I was using the GaussianNLLLoss function initially, which was why my loss was negative (I assume getting a negative value is normal when using GaussianNLLLoss). However, I have since changed the loss to MSE (hence the positive values). I’ve increased my training samples and also added an activation function as you pointed out. However, the training and validation loss are still very low: they both go down after 2 epochs and then remain flat.
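
(For reference, a quick check suggesting that a negative value can indeed be normal for GaussianNLLLoss, since the log-variance term goes negative when the predicted variance is small:)

import torch
import torch.nn as nn

gnll = nn.GaussianNLLLoss()
pred = torch.zeros(1, 5)
target = torch.zeros(1, 5)       # perfect prediction
var = torch.full((1, 5), 0.01)   # small predicted variance
print(gnll(pred, target, var))   # roughly -2.3, i.e. negative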

I think you need to give more detail about the dataset you’re using, because maybe your model is working just fine? The loss does go down and, as @ptrblck mentioned, it could be overfitting the data and will pretty much have “learned” your data within 2 epochs if there isn’t a lot of it.

I’m using MRI images (I checked that they’re well normalised). Also, I’ve added more training data to the dataset.

I think your model is working just fine and the gradients are propagating backwards (because your learning rate is relatively small and it still overfits); I think you might be making a mistake when calculating the loss. Can you show the new code you’re using with MSE?

I’ve changed the learning rate to 1e-4; below is the updated version.

import torch
import torch.nn as nn
class MLP(nn.Module):
    def __init__(self, in_channel, out_channels):
        super(MLP, self).__init__()
        
        self.hidden = 64
        self.input_dims = in_channel
        self.output_dims = out_channels


        self.mapping_output = 64
        self.fc1 = nn.Linear(self.input_dims, self.hidden)
        self.relu = nn.LeakyReLU(0.2)
        
        self.fc2 = nn.Linear(self.hidden, self.hidden )
        self.out = nn.Linear(self.hidden, self.output_dims)


        self.mapping = nn.Sequential(
        nn.Linear(1, self.mapping_output),
        nn.LeakyReLU(0.2),
        nn.Linear(self.mapping_output, self.mapping_output),
        )

      
    def forward(self, img, val):
        img = torch.mean(img.view(img.shape[0], img.shape[1], -1), dim=2)
        print(img.shape)
        val = self.mapping(val)

        img = torch.cat([img, val], dim=1)

        fc1 = self.relu(self.fc1(img))
        out = self.out(self.relu(self.fc2(fc1)))

        return out

mlp = MLP(in_channel=144, out_channels=35)
img = torch.rand(1, 80, 90, 58)
val = torch.rand(1).unsqueeze(0)
target = torch.rand(35).unsqueeze(0)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
epochs = 500
mse_loss = nn.MSELoss()

for epoch in range(epochs):
    train_loss = val_loss = 0
    mlp.train()
    for batch, data in enumerate(train_data):
        x, y = data[0].to(device).float(), data[1].to(device).float()
       
        #some mathematical computation to obtain val
        val = pretrained_model(x)

        predicted = mlp(x, val)
        
        
        loss = mse_loss(predicted, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss
    
    mlp.eval()
    for batch, data in enumerate(val_data):
        x, y = data[0].to(device).float(), data[1].to(device).float()
       
        #some mathematical computation to obtain val
        val = pretrained_model(x)

        predicted = mlp(x, val)
        
        
        loss = mse_loss(predicted, y)
        
        val_loss += loss

    log({"training loss": train_loss/len(train_data), "validation loss":  val_loss/len(val_data)})

Is there a way you can implement your code without using the .float() method? That has caused issues in my code before; try removing it and running again.

Also, you set loss = nn.MSELoss() but never use it; instead you are using mse_loss() from the functional package. I think the loss declaration you never used is the one .backward() is being called on, so maybe that’s why there are no gradients? In any case, you should name the two differently to avoid confusion. Also, if train_loss is meant to be a scalar, you should accumulate loss.item() rather than the loss tensor itself.
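
Something along these lines, as a minimal self-contained sketch of the naming / .item() point (the tensors here are just dummies):

import torch
import torch.nn as nn

criterion = nn.MSELoss()                 # the loss module
pred = torch.rand(4, 3, requires_grad=True)
target = torch.rand(4, 3)

train_loss = 0.0
loss = criterion(pred, target)           # the computed loss tensor: keep the names distinct
loss.backward()
train_loss += loss.item()                # .item() returns a Python float, so no graph is retained
print(train_loss)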

Sorry for the confusion, I was in a rush trying to write a script of what I’m doing in my main code and I made a few mistakes. I’ve adjusted the script. In my actual code I do backprop and I also use .item().

What results do you obtain? I don’t think I see any other bugs.

The loss is still acting the same. It starts off very small (0.014) and remains around that range after 2 epochs (0.009).

The only suggestion I have left is to not declare val and img as global variables, and if you do, set requires_grad=True and try that; PyTorch sets the requires_grad attribute to False by default for tensors created with .rand().
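
For example (just a quick check of the defaults):

import torch

img = torch.rand(1, 80, 90, 58)
print(img.requires_grad)                              # False by default for factory functions

img = torch.rand(1, 80, 90, 58, requires_grad=True)   # opt in to gradient tracking on the input
print(img.requires_grad)                              # True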