No gradients flow for Custom Loss Function

I am using a custom loss function which I have defined as a class inheriting from nn.Module. I have written only the forward() method in my loss function, since I am only using tensors and torch operations. But after loss.backward(), when I try to print the parameters' gradients, I get no gradients. This is my loss function:

import pdb
import torch


class my_Loss(torch.nn.Module):

    def __init__(self):
        super(my_Loss, self).__init__()
        #self.X = X
        #self.Y = Y

    def forward(self, Y1, X1):
        cuda = torch.device("cuda:1")
        tensor = torch.tensor((), dtype=torch.float64, device=cuda)
        #F = tensor.new_zeros((Y1.size(0), 1), requires_grad=True)
        N = Y1.size(0)
        for i in range(N):
            #Z = (Y/(2*torch.max(X, 1-X))) + ((torch.max(X, 1-X)-X)/(2*torch.max(X, 1-X)))
            Y = Y1.clone()[i, 0]
            X = X1.clone()[i, 0]
            X_comp = 1 - X
            Z_num1 = torch.max(X, X_comp)
            Z_den = 2 * torch.max(X, X_comp)
            Z_1 = Y / Z_den
            Z_2 = (Z_num1 - X) / Z_den
            Z = Z_1 + Z_2
            Z_comp = 1 - Z
            #F = X*(Z*torch.log(Z) + (1-Z)*torch.log(1-Z))
            F = -X * (Z * torch.log(Z) + Z_comp * torch.log(Z_comp))
        loss = 1 - (torch.sum(F) / N)
        #pdb.set_trace()
        return loss

Now I have some questions:

  1. Do I need to define backward() as well?
  2. If not, why are all the gradients None?
  3. How do I define backward() for this loss function?

Your current implementation of your loss function does not have any Parameters, so it’s basically just a function.
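As a quick check (just a minimal sketch using the my_Loss class posted above), you can verify that the module registers no learnable parameters:

criterion = my_Loss()
# __init__ registers no nn.Parameter attributes, so this list is empty
# and the module behaves like a plain function of its inputs
print(list(criterion.parameters()))  # -> []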
Which gradients are you trying to access?

Actually, Y1 is the output of my network and X1 is the ground truth. During training I compute this loss by calling this function, and when I call loss.backward() and try to print the gradients of my network's parameters, all the gradients are None. I can't find where the problem is.
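I am checking the gradients roughly like this (net stands for my network here, just for illustration):

loss.backward()
# print the gradient of every parameter after the backward pass;
# in my case every p.grad comes out as None
for name, p in net.named_parameters():
    print(name, p.grad)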

Thanks for the info.
Your code seems to work.

import torch
import torch.nn as nn


def my_loss_fn(output, target):
    N = output.size(0)
    for i in range(N):
        Y = output.clone()[i, 0]
        X = target.clone()[i, 0]
        X_comp = 1 - X
        Z_num1 = torch.max(X, X_comp)
        Z_den = 2 * torch.max(X, X_comp)
        Z_1 = Y / Z_den
        Z_2 = (Z_num1 - X) / Z_den
        Z = Z_1 + Z_2
        Z_comp = 1 - Z
        F = -X * (Z * torch.log(Z) + Z_comp * torch.log(Z_comp))
    loss = 1 - (torch.sum(F) / N)

    return loss


model = nn.Sequential(
    nn.Linear(20, 10),
    nn.ReLU(),
    nn.Linear(10, 2),
    nn.Softmax(dim=1)
)

x = torch.randn(1, 20)
target = torch.randn(1, 20)
output = model(x)
loss = my_loss_fn(output, target)
loss.backward()
print(model[0].weight.grad)
> tensor([[ 0.0002,  0.0013,  0.0000, -0.0011,  0.0003, -0.0004,  0.0013,  0.0010,
         -0.0010, -0.0010, -0.0004, -0.0016, -0.0008,  0.0005,  0.0008,  0.0021,
          0.0006,  0.0019,  0.0005, -0.0015],...

I'm not sure why you are iterating over the batch dimension and only using the last F.
Also, I don’t think you need to clone the output and target.
Does this code work for you?
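If you do want to use every sample in the batch, a vectorized version along these lines (just a sketch of your formula, not tested on your data) would avoid the loop and the clone calls entirely:

def my_loss_fn_vectorized(output, target):
    # work on the first column of every sample, as in your loop
    Y = output[:, 0]
    X = target[:, 0]
    X_comp = 1 - X
    Z_den = 2 * torch.max(X, X_comp)
    Z = Y / Z_den + (torch.max(X, X_comp) - X) / Z_den
    Z_comp = 1 - Z
    # per-sample term, averaged over the batch instead of keeping only the last one
    F = -X * (Z * torch.log(Z) + Z_comp * torch.log(Z_comp))
    return 1 - F.mean()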


Thanks ptrblck.
Actually, I was confused about the error. Let me explain. In the training phase I pass the output of my network to a function, let's say f1(), which takes a tensor as input and returns a scalar, and the output of f1() then goes into this loss function. So it is somewhat like:

output = net(input)
out1 = f1(output)
criterion = my_Loss()
loss = criterion(out1, X)
loss.backward()

And then all the gradients of the parameters are None, so maybe the problem is in the function f1(). I will attach that later.
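For example, if f1() internally did something like this (purely hypothetical, just to illustrate my suspicion), the graph would be cut and no gradients could reach the network parameters:

def f1(output):
    # .item() (or .detach() / converting to NumPy) leaves autograd with no path
    # from the loss back to the network parameters
    value = output.max().item()
    # this is a new leaf tensor that is unrelated to the network's graph
    return torch.tensor(value, requires_grad=True)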