Custom loss functions

You can pass the extra parameter to the optimizer by appending it to the list of model parameters:

param = nn.Parameter(torch.randn(1))
model = MyModel()
optimizer = torch.optim.SGD(list(model.parameters()) + [param], lr=1e-3)
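
To check that the extra parameter is really being trained, something like the following could be used. This is only a sketch: nn.Linear stands in for MyModel, and the loss that multiplies by param is a made-up example, not anything from this thread.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                 # stand-in for MyModel (assumption)
param = nn.Parameter(torch.randn(1))    # extra learnable tensor outside the model
optimizer = torch.optim.SGD(list(model.parameters()) + [param], lr=1e-3)

x, y = torch.randn(5, 3), torch.randn(5, 1)

# Hypothetical loss that uses the extra parameter as a learnable weight.
loss = (param * (model(x) - y) ** 2).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

# param.grad is populated because param took part in the loss
# and was handed to the optimizer.
print(param.grad)
```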

I see, thank you. You are so nice!

Hi @ptrblck,

I’m trying to make a custom loss function but am getting “RuntimeError: grad can be implicitly created only for scalar outputs” on the backward pass.

def dir_loss(output, target, data_fin):
    delts = -torch.mul((output - data_fin), (target - data_fin))
    return F.relu(delts)

I want the loss to be zero when delts is negative and equal to delts otherwise. Any suggestions?

The error is raised if you call .backward() on a tensor with more than one element. You would either have to reduce the tensor to a scalar first or specify the gradient manually, as seen here:

loss = torch.randn(2, 2, requires_grad=True)
loss.backward()
> RuntimeError: grad can be implicitly created only for scalar outputs

# reduce
loss.mean().backward() # works

# manual gradient
loss.backward(torch.ones_like(loss)) # works
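
Applied to the dir_loss function from the question, the reduction could look like the sketch below. Using the mean is an assumption (a sum would also give a scalar), and the input values are made up for illustration.

```python
import torch
import torch.nn.functional as F

def dir_loss(output, target, data_fin):
    # Element-wise penalty: positive only when output and target lie on
    # opposite sides of data_fin, zero otherwise.
    delts = -torch.mul(output - data_fin, target - data_fin)
    # Reduce to a scalar so .backward() needs no explicit gradient argument.
    return F.relu(delts).mean()

output = torch.tensor([1.5, 0.5, 2.0], requires_grad=True)
target = torch.tensor([0.5, 0.8, 3.0])
data_fin = torch.tensor([1.0, 1.0, 1.0])

loss = dir_loss(output, target, data_fin)
loss.backward()  # works, because loss is a 0-dim tensor
```

Only the first element has output and target on opposite sides of data_fin, so only that element contributes to the loss and receives a non-zero gradient.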

Hi again @ptrblck,

I’ve figured I might be better off using an if statement as follows:

def dir_loss(outputs,targets,fin_val):
    mismatch = (outputs - fin_val) * (targets - fin_val) < 0
    sq_err = (outputs - targets) ** 2
    loss = torch.mean(torch.where(mismatch, 10 * sq_err, 0.01 * sq_err))
    return loss

I’m trying to get the loss to penalize the network when the predicted direction differs from the target direction and reward it when they match, similar to this paper: “Exploring the Impact of Magnitude- and Direction-based Loss Function on the Profitability using Predicted Prices from Deep Learning” by Chihcheng Hsu and Lichen Tai (SSRN). However, this seems to have no effect so far. What is the best way to make the loss function reward the same direction?

Never mind, I found a workaround by modifying the targets based on the if statement before connecting them to the graph.

Hi @ptrblck

I am trying to use PyTorch to estimate count regression model parameters. I have created a small GitHub repository documenting the project.

I am struggling to update the parameters of certain classes of regression models (Poisson Inverse Gaussian, Sichel and Delaporte). The loss functions (log probability density functions) in either NumPy or PyTorch seem to be specified correctly (and agree with previous R, C, etc. implementations). However, when I try to perform the NLL.backward() step to calculate gradients (using PyTorch autograd), I get an error that I cannot seem to debug.

For example: the Poisson Inverse Gaussian loss (negative log likelihood) is given below:

def pig_nll(x, mu, sigma): 
    ## Determine length of data vector and parameters 
    ly = int(torch.max(torch.Tensor([len(x), len(mu), len(sigma)])).item())
    #x = np.repeat(a=x, repeats=ly)      
    nsigma = sigma.repeat(ly)
    nmu = mu.repeat(ly)
    ## Initial vectors to store computed PIG density values
    ny = int(len(x))
    maxyp1 = x.max().item() + 1
    tofY = torch.zeros(int(maxyp1))
    sumlty = torch.zeros(ly)
    ## Big for loop to compute PIG density (or log-density)
    ## This is directly from Rigby et al: tofyPIG2.c code.
    for i in torch.arange(1, ny+1, dtype=torch.int32):
        iy = x[i.item()-1] + 1
        tofY[0] = nmu[i.item()-1] * ((1 + 2*nsigma[i.item()-1]*nmu[i.item()-1])**(-0.5))
        sumT = torch.Tensor([0]) 
        ## Start inner loop to compute rest of PIG density
        if (x[i.item()-1]==0):
            sumT = torch.Tensor([0])
        else:
            for j in torch.arange(1, iy, dtype=torch.int32):
                tofY[j.item()] = ((nsigma[i.item()-1] * ((2*(j.item())-1)/nmu[i.item()-1])) + (1/tofY[j.item()-1])) * ((tofY[0])**2)
                sumT = sumT + torch.log(tofY[j.item()-1])
        sumlty[i.item()-1] = sumT
    ## Add the kernel of the PIG density back to other constant component
    logfy = -torch.lgamma(x+1) + (1 - torch.sqrt(1 + 2*sigma*mu))/sigma + sumlty
    ## Return neg log lik to user
    nll = -torch.sum(logfy)
    return nll 

My PyTorch training loop looks as follows:

## Instantiate data tensor, and variable for (binomial) model parameters
x = torch.autograd.Variable(torch.from_numpy(dat.fish.to_numpy())).type(torch.FloatTensor)
l_mu = torch.autograd.Variable(torch.rand(1), requires_grad=True)
l_sigma = torch.autograd.Variable(torch.rand(1), requires_grad=True)


# torch.autograd.set_detect_anomaly(True)

## Learning rate
learning_rate_mu = 2e-5
learning_rate_sigma = 2e-5

## Training loop
for t in range(25000):
    ## Backprop on negative log likelihood loss
    NLLpig = pig_nll(x=x, mu=l_mu, sigma=l_sigma) 
    NLLpig.backward()
    ## Logging to console
    if t % 1000 == 0:
        print("Iteration = ", t, 
              "loglik  =", NLLpig.data.numpy(), 
              "lmu =", l_mu.data.numpy(), 
              "lsigma =", l_sigma.data.numpy(),  
              "dL/dlmu = ", l_mu.grad.data.numpy(), 
              "dL/dlsigma = ", l_sigma.grad.data.numpy())
    ## SGD update of parms
    l_mu.data -= learning_rate_mu * l_mu.grad.data
    l_sigma.data -= learning_rate_sigma * l_sigma.grad.data
    ## Zero the gradients
    l_mu.grad.data.zero_()
    l_sigma.grad.data.zero_()

And the error I get is below:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-15d640d81abc> in <module>
      9     ## Backprop on negative log likelihood loss
     10     NLLpig = pig_nll(x=x, mu=l_mu, sigma=l_sigma)
---> 11     NLLpig.backward()
     12     ## Logging to console
     13     if t % 1000 == 0:

~\anaconda3\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

~\anaconda3\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    145         retain_graph = create_graph
    146 
--> 147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 2992; expected version 2991 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Here is the error when torch.autograd.set_detect_anomaly(True) is enabled:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-c1fae9bda3ae> in <module>
      9     ## Backprop on negative log likelihood loss
     10     NLLpig = pig_nll(x=x, mu=l_mu, sigma=l_sigma)
---> 11     NLLpig.backward()
     12     ## Logging to console
     13     if t % 1000 == 0:

~\anaconda3\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

~\anaconda3\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    145         retain_graph = create_graph
    146 
--> 147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 2992; expected version 2991 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Do you have any suggestions on why the PIG negative log likelihood loss is causing issues? The DEL/Sichel log likelihoods are throwing similar errors, so I think it relates to the way I have ported the NLL from R/C to PyTorch in general. Thanks in advance for your help! This thread is an amazing resource!

I would start with removing the usage of all .data attributes (as its usage is deprecated and could yield unwanted side effects). If necessary, wrap the code in a with torch.no_grad() block.

To isolate the offending inplace operation you could replace all inplace ops (e.g. tofY[0] = nmu[i.item()-1] ...) with their out-of-place equivalent. E.g. instead of creating tofY as a zero tensor and assigning values to it, you could append the intermediate values in a list and use e.g. torch.stack on it afterwards to create the tensor.
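
As a rough sketch of that list-plus-torch.stack pattern: the recurrence below is a made-up stand-in, not the PIG recursion from the question, and the function name is hypothetical.

```python
import torch

def recurrence_out_of_place(a, n):
    # Hypothetical recurrence t[j] = a * t[j-1] + 1 with t[0] = a.
    # Each term is a new tensor appended to a Python list, so nothing
    # that autograd saved for the backward pass is overwritten.
    terms = [a]
    for _ in range(1, n):
        terms.append(a * terms[-1] + 1.0)
    # Build the final tensor once, after all terms are computed.
    return torch.stack(terms)

a = torch.tensor(0.5, requires_grad=True)
t = recurrence_out_of_place(a, 4)
loss = torch.log(t).sum()
loss.backward()  # no in-place modification error
```

The same idea would apply to tofY and sumlty in pig_nll: collect the per-step values in lists and stack them at the end instead of assigning into preallocated zero tensors.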

Hi all,
Can I just add the calculation of the loss inside the forward() method itself and call backward() directly on the output of model(x)?
My loss function requires a lot of internal variables.

Thanks

Yes, that should be possible. However, I would also recommend checking which utilities you plan to use and how they would interact with this “non-pure” forward method. E.g. I don’t know what impact this approach would have on quantization, scripting, checkpointing etc.


You should be careful when writing your loss inside the model itself, especially when it has internal variables. If you define the loss inside your model, then calling model.parameters() for the optimizer may also return some of the loss variables.
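
A small sketch of both points, assuming a toy model with one linear layer and a hypothetical learnable loss weight called loss_scale (neither is from the posts above):

```python
import torch
import torch.nn as nn

class ModelWithLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)
        # Hypothetical learnable loss weight. Because it is registered on
        # the model, it is returned by model.parameters() as well.
        self.loss_scale = nn.Parameter(torch.ones(1))

    def forward(self, x, target):
        pred = self.linear(x)
        # The loss is computed inside forward and returned as a scalar.
        return (self.loss_scale * (pred - target) ** 2).mean()

model = ModelWithLoss()
names = [name for name, _ in model.named_parameters()]
print(names)  # includes 'loss_scale' next to the linear layer's parameters

loss = model(torch.randn(8, 4), torch.randn(8, 1))
loss.backward()  # backward can be called directly on the model output
```

If the loss variables should not be optimized together with the network, one option is to filter them out by name before building the optimizer.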


I am also facing an error while trying to use a multi-class Dice score as a custom loss function.
Here is the loss function:

def dice(output, target):
    dice_tmp = 0
    for index in range(3):
        dice_tmp += (2 * (output[:,index,:,:] * target[:,index,:,:]).sum()) / ((output[:,index,:,:] + target[:,index,:,:]).sum() + 1e-8)
    dice = torch.mean(dice_tmp)  # taking the average
    
    return dice

Here is the training script:

loop = tqdm(loader)

for data, targets in loop:
    data = data.to(device=DEVICE)
    targets = targets.float().to(device=DEVICE)

    #dice = 0

    # forward
    with torch.cuda.amp.autocast():
        predictions = (torch.sigmoid(model(data)) > 0.5).float()
        loss = dice(predictions, targets)
        print(type(loss))

    # backward
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    # update tqdm loop
    loop.set_postfix(loss=loss.item())

And here is the error that I am facing:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
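
This error is consistent with the thresholding step in the training script: a comparison such as `> 0.5` returns a boolean tensor with no grad_fn, so everything computed from it is detached from the model. A possible sketch, assuming the sigmoid probabilities are used directly and that 1 - Dice is what should be minimized (both are assumptions, as is the name soft_dice_loss):

```python
import torch

def soft_dice_loss(probs, target, eps=1e-8):
    # Sum over batch and spatial dims, keeping one value per class.
    dims = (0, 2, 3)
    intersection = (probs * target).sum(dim=dims)
    denom = (probs + target).sum(dim=dims)
    dice_per_class = 2 * intersection / (denom + eps)
    # Average over classes and turn the score into a loss.
    return 1.0 - dice_per_class.mean()

logits = torch.randn(2, 3, 8, 8, requires_grad=True)   # stand-in for model(data)
target = (torch.rand(2, 3, 8, 8) > 0.5).float()

hard = (torch.sigmoid(logits) > 0.5).float()
print(hard.requires_grad)  # False: thresholding cut the graph

loss = soft_dice_loss(torch.sigmoid(logits), target)
loss.backward()            # works: the probabilities stay in the graph
```

The hard threshold can still be applied afterwards, outside the loss, when a metric on binary masks is needed.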

Hi, I would like to use a custom “case-defined” loss function. Something like this:

class myLoss(nn.Module):
	def __init__(self, device="cuda"):
		super(myLoss, self).__init__()

		self.device = device

	def forward(self, input):

		if input >= 0:
			return torch.log(2*input+1)
		elif input < 0:
			return torch.log(-2*input+1)

Is it ok? @ptrblck

Furthermore, I am currently assuming that input is a scalar. How can I adapt the loss function for batched tensors? I was thinking of a for loop, but maybe there are more efficient solutions.

Thanks a lot!

Using conditions is fine unless you are tracing your model via torch.jit.trace as it’s not able to capture data-dependent control flow.
To use a condition on a batch of samples you could use torch.where:

out = torch.where(input>=0, torch.log(2*input+1), torch.log(-2*input+1))
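
For completeness, a batched sketch of this. Note that torch.where evaluates both branches for every element, so the branch that is not selected can contain NaN (e.g. the log of a negative number) even though the selected output is fine. For this particular function the two cases collapse into log(2|x| + 1), which avoids the issue. The function names below are hypothetical.

```python
import torch

def case_loss_where(inp):
    # Both branches are computed for all elements; where() picks per element.
    return torch.where(inp >= 0, torch.log(2 * inp + 1), torch.log(-2 * inp + 1))

def case_loss_abs(inp):
    # Equivalent closed form: log(2 * |x| + 1), written with log1p.
    return torch.log1p(2 * inp.abs())

x = torch.tensor([-2.0, -0.25, 0.0, 0.25, 3.0], requires_grad=True)
out_where = case_loss_where(x)
out_abs = case_loss_abs(x)

# Reduce to a scalar before calling backward.
out_abs.mean().backward()
```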

Hi @ptrblck,
Thanks for all your answers on this thread. I learned a lot.

Still, I have 1 question. I have a stateful loss function as below:

class PositionLossNormalized(torch.nn.Module):
    def __init__(self, beta, epsilon=1e-8):
        super(PositionLossNormalized, self).__init__()
        self.beta = beta
        self.beta_t = beta
        self.running_avg = 0.0
        self.epsilon = epsilon
    
    def forward(self, targets, predictions):
        sqr_diff = torch.square(targets - predictions)
        Pxj = torch.mean(sqr_diff, dim=-1)
        Pxj2 = torch.square(Pxj)
        
        avg_Pxj = torch.mean(Pxj)
        avg_Pxj2 = torch.mean(Pxj2)
        
        self.running_avg = self.beta * self.running_avg + (1.0 - self.beta) * avg_Pxj2.item()
        loss_val = avg_Pxj / (np.sqrt(self.running_avg / (1.0 - self.beta_t)) + self.epsilon)
        
        self.beta_t = self.beta * self.beta_t
        
        return loss_val

Would this work, considering I have a NumPy operation (viz. np.sqrt)? My thinking is that I am just scaling the value of avg_Pxj (which is autodiff-able) with a Python scalar, and this should not break the autodiff chain. I ran the code and PyTorch proceeds without any error. Am I right in doing this? Or should I change the code to use only PyTorch functions, as below:

class PositionLossNormalized(torch.nn.Module):
    def __init__(self, beta, epsilon=1e-8):
        super(PositionLossNormalized, self).__init__()
        self.beta = torch.as_tensor(beta)
        self.beta_t = torch.as_tensor(beta)
        self.running_avg = torch.as_tensor(0.0)
        self.epsilon = torch.as_tensor(epsilon)
    
    def forward(self, targets, predictions):
        sqr_diff = torch.square(targets - predictions)
        Pxj = torch.mean(sqr_diff, dim=-1)
        Pxj2 = torch.square(Pxj)
        
        avg_Pxj = torch.mean(Pxj)
        avg_Pxj2 = torch.mean(Pxj2)
        
        self.running_avg = self.beta * self.running_avg + (1.0 - self.beta) * avg_Pxj2.clone().detach()
        loss_val = avg_Pxj / (torch.sqrt(self.running_avg / (1.0 - self.beta_t)) + self.epsilon)
        
        self.beta_t = self.beta * self.beta_t
        
        return loss_val

Running without errors does not necessarily mean the autograd graph is intact. However, note that standard Python operators can be used on tensors and apply elementwise:

loss = x/(y/(1-t)+z)**0.5

I’m not sure how NumPy defines the operation underneath. It is probably not a good idea, though, as it likely returns a NumPy array, which breaks the graph.

And if using division, you may want to get in the habit of replacing inf/nan values. For example:

loss[torch.isinf(loss)|torch.isnan(loss)]=1000

I agree with @J_Johnson and think it would be cleaner to either use PyTorch methods or pure Python operations. Using np.sqrt could be fine in this case but it also doesn’t seem to be necessary as torch.sqrt or the Python equivalent can be used.
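
One way the pure-Python variant could look is sketched below. It keeps the running statistic as a Python float via .item() and uses math.sqrt in place of np.sqrt; that substitution and the random inputs are my assumptions, the rest follows the first version posted above.

```python
import math
import torch

class PositionLossNormalized(torch.nn.Module):
    def __init__(self, beta, epsilon=1e-8):
        super().__init__()
        self.beta = beta
        self.beta_t = beta
        self.running_avg = 0.0
        self.epsilon = epsilon

    def forward(self, targets, predictions):
        Pxj = torch.mean(torch.square(targets - predictions), dim=-1)
        avg_Pxj = torch.mean(Pxj)
        avg_Pxj2 = torch.mean(torch.square(Pxj))

        # .item() returns a Python float, so the running statistic is
        # deliberately kept out of the autograd graph.
        self.running_avg = self.beta * self.running_avg + (1.0 - self.beta) * avg_Pxj2.item()
        scale = math.sqrt(self.running_avg / (1.0 - self.beta_t)) + self.epsilon
        self.beta_t = self.beta * self.beta_t

        # Dividing a tensor by a Python float keeps avg_Pxj in the graph.
        return avg_Pxj / scale

criterion = PositionLossNormalized(beta=0.9)
predictions = torch.randn(4, 3, requires_grad=True)
targets = torch.randn(4, 3)
loss = criterion(targets, predictions)
loss.backward()
```

Gradients reach predictions only through avg_Pxj; the normalizer acts as a constant in each step.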


Just had an opportunity to test this:

import torch
import numpy as np

x=torch.rand((2,3), requires_grad=True)

print(np.sqrt(x))
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

If the tensor requires gradients, you’ll get the above error.
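
For comparison, a short sketch of the two alternatives: detaching before going to NumPy, as the error message suggests, gives a plain array with no gradient history, while torch.sqrt stays in the graph.

```python
import numpy as np
import torch

x = torch.rand((2, 3), requires_grad=True)

# Detached path: works, but the result is a NumPy array that autograd
# knows nothing about.
np_result = np.sqrt(x.detach().numpy())

# PyTorch path: the result is tracked and can be backpropagated through.
torch_result = torch.sqrt(x)
torch_result.sum().backward()
```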


I want to use torch.argsort, but that breaks the gradients. Can you please tell me how to resolve this?

The output of my CNN is the log probability of the indices of the original array. Hence, I want to use the 10 best indices to get the best elements of the original array. I hope I was able to convey the idea.

My custom loss function includes torch.abs(). Do I need to inherit from nn.Module?
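
As far as I know, inheriting from nn.Module is not required here: torch.abs is differentiable, so a plain Python function that returns a scalar tensor can serve as the loss. A minimal sketch, where abs_loss and the input values are made up for illustration:

```python
import torch

def abs_loss(pred, target):
    # Mean absolute error written as a plain function.
    return torch.abs(pred - target).mean()

pred = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
target = torch.tensor([0.0, 0.0, 5.0])

loss = abs_loss(pred, target)
loss.backward()
print(pred.grad)  # sign(pred - target) / 3 for each element
```

Wrapping the loss in an nn.Module mainly becomes useful once it has its own state or parameters.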