No gradients flow for Custom Loss Function

I am using a custom loss function that I have defined as a class deriving from nn.Module. I have written only forward() in my loss function, since it uses only tensors and torch operations. But after loss.backward(), when I try to print the parameters' gradients, I get no gradients. This is my loss function:

import pdb
import torch

class my_Loss(torch.nn.Module):
    def __init__(self):
        super().__init__()  # initialize nn.Module; also gives __init__ a body
        #self.X = X
        #self.Y = Y
    def forward(self,Y1,X1):
      cuda = torch.device("cuda:1")
      tensor = torch.tensor((), dtype=torch.float64,device=cuda)
      #F = tensor.new_zeros((Y1.size(0),1),requires_grad=True)
      N = Y1.size(0)
      for i in range (N): 

         #Z = (Y/(2*torch.max(X,1-X)))+((torch.max(X,1-X)-X)/(2*torch.max(X,1-X)))
         Y = Y1.clone()[i,0]
         X = X1.clone()[i,0]
         X_comp = 1-X
         Z_num1 = torch.max(X,X_comp)
         Z_den = 2*torch.max(X,X_comp)
         Z_1 = Y/Z_den
         Z_2 = (Z_num1-X)/Z_den
         Z = Z_1+Z_2
         Z_comp = 1-Z
         #F = X*(Z*torch.log(Z)+(1-Z)*torch.log(1-Z))
         F = -X*(Z*torch.log(Z)+Z_comp*torch.log(Z_comp))
      loss = (1-(torch.sum(F)/N))
      return loss

Now I have some questions:

  1. Do I need to define backward() as well?
  2. If not, why are all the gradients ‘None’?
  3. How would I define backward() for this loss function?

Your current implementation of your loss function does not have any Parameters, so it’s basically just a function.
Which gradients are you trying to access?
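To make the point concrete, here is a minimal sketch. The toy model, the shapes, and the `ToyLoss` name are assumptions for illustration only, not taken from the posted code. It shows that a loss module built purely from torch operations owns no parameters and needs no hand-written backward(): autograd derives it, and the gradients end up on the network's parameters.

```python
import torch
import torch.nn as nn

class ToyLoss(nn.Module):  # hypothetical stand-in for a parameter-free loss
    def forward(self, output, target):
        # Only differentiable torch ops, so autograd builds the backward pass itself.
        return ((output - target) ** 2).mean()

torch.manual_seed(0)
model = nn.Linear(4, 1)      # hypothetical toy network
criterion = ToyLoss()

output = model(torch.randn(8, 4))
loss = criterion(output, torch.randn(8, 1))
loss.backward()

# The loss module itself has nothing to optimize ...
n_loss_params = len(list(criterion.parameters()))
# ... the gradients live on the network's parameters.
grads = {name: p.grad for name, p in model.named_parameters()}
for name, g in grads.items():
    print(name, None if g is None else tuple(g.shape))
```

If every entry printed here were None in the real training loop, the graph between the network output and the loss would have been cut somewhere in between.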

Actually, Y1 is the output of my network and X1 is the ground truth. During training I compute this loss by calling this function, and when I then call loss.backward() and print the gradients of my network's parameters, they are all None. I couldn’t find where the problem is.

Thanks for the info.
Your code seems to work.

import torch
import torch.nn as nn

def my_loss_fn(output, target):
    N = output.size(0)
    for i in range (N): 
        Y = output.clone()[i,0]
        X = target.clone()[i,0]
        X_comp = 1-X
        Z_num1 = torch.max(X,X_comp)
        Z_den = 2*torch.max(X,X_comp)
        Z_1 = Y/Z_den
        Z_2 = (Z_num1-X)/Z_den
        Z = Z_1+Z_2
        Z_comp = 1-Z
        F = -X*(Z*torch.log(Z)+Z_comp*torch.log(Z_comp))
    loss = (1-(torch.sum(F)/N))
    return loss

model = nn.Sequential(
    nn.Linear(20, 10),
    nn.Linear(10, 2),
)

x = torch.randn(1, 20)
target = torch.randn(1, 20)
output = model(x)
loss = my_loss_fn(output, target)
loss.backward()
print(model[0].weight.grad)  # gradients of the first layer's weights
> tensor([[ 0.0002,  0.0013,  0.0000, -0.0011,  0.0003, -0.0004,  0.0013,  0.0010,
         -0.0010, -0.0010, -0.0004, -0.0016, -0.0008,  0.0005,  0.0008,  0.0021,
          0.0006,  0.0019,  0.0005, -0.0015],...

I’m not sure why you are iterating over the batch dimension and then only using the last F.
Also, I don’t think you need to clone the output and target.
Does this code work for you?
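For reference, a loop-free version could look like the sketch below. It rests on several assumptions: that the intent was to average F over the whole batch (the loop above only keeps the last F), that only the first column matters, as in the `[i,0]` indexing, and that both output and target lie in (0, 1) so the logs stay finite. The name `my_loss_vectorized`, the sigmoid on the model output, and the batch size are additions for illustration.

```python
import torch
import torch.nn as nn

def my_loss_vectorized(output, target):
    # Work on the whole batch at once; no clone() and no Python loop needed.
    Y = output[:, 0]
    X = target[:, 0]
    M = torch.max(X, 1 - X)
    # Same as Y / (2*M) + (M - X) / (2*M) in the loop version.
    Z = (Y + M - X) / (2 * M)
    F = -X * (Z * torch.log(Z) + (1 - Z) * torch.log(1 - Z))
    # Assumption: average F over all samples rather than using only the last one.
    return 1 - F.sum() / Y.size(0)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 10), nn.Linear(10, 2))
output = torch.sigmoid(model(torch.randn(5, 20)))  # keeps Y in (0, 1)
target = torch.rand(5, 2)                          # ground truth assumed in [0, 1)
loss = my_loss_vectorized(output, target)
loss.backward()
print(model[0].weight.grad.shape)
```

For a batch of size 1 this gives the same value as the loop version, since the last F is then the only F.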


Thanks ptrblck.
Actually, I was confused about the error. Let me explain. In the training phase I pass the output of my network to a function, let’s say f1(), which takes the tensor as input and returns a scalar, and the output of f1() then goes into this loss function. So it is somewhat like:

 output = net(input)
 out1 = f1(output)
 Loss = my_Loss(out1,X)

And then all the gradients of the parameters are None. So maybe the problem is in the function f1(). I will attach it later.
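Since f1() is not shown here, the following is only a hypothetical sketch of one common way an intermediate function can produce this symptom: converting the tensor to a Python number (or NumPy array) and wrapping it in a new tensor detaches it from the network. The functions `f1_breaks_graph` and `f1_keeps_graph` and the toy `net` are made up for illustration and are not the real f1().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 3)      # hypothetical stand-in for the real network
inp = torch.randn(2, 4)

def f1_breaks_graph(t):
    # Hypothetical: .item() returns a plain Python float, and wrapping it in a
    # new tensor starts a fresh graph with no link back to `net`.
    return torch.tensor(t.sum().item(), requires_grad=True)

def f1_keeps_graph(t):
    # Hypothetical: stays inside torch ops, so autograd can trace through it.
    return t.sum()

bad = f1_breaks_graph(net(inp)) ** 2
bad.backward()                      # runs without error ...
grad_after_bad = net.weight.grad    # ... but nothing reached the network: None

good = f1_keeps_graph(net(inp)) ** 2
good.backward()
grad_after_good = net.weight.grad   # now a tensor with the weight's shape

print(grad_after_bad)
print(grad_after_good.shape)
```

A quick check in the real code would be to follow `grad_fn` on out1 and Loss: if the chain never leads back to the network's output, the graph was cut inside f1().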