Custom loss functions

Losses can have different ranges, but note that the gradient magnitudes will then also be in different ranges. This can mask the “smaller” loss, as its gradients would look like noise next to the gradients of the larger loss.
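A minimal sketch of this effect, assuming a toy setup that is not from the thread (the weight vector `w`, the two losses, and the `1e-4` scale are all made up for illustration). It computes the gradient of each loss separately and compares their norms:

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: one weight vector shared by two losses.
w = torch.randn(10, requires_grad=True)
x = torch.randn(32, 10)

pred = x @ w
loss_large = pred.pow(2).mean()          # "large" loss
loss_small = 1e-4 * pred.abs().mean()    # several orders of magnitude smaller

# Gradient of each loss on its own w.r.t. the shared weights.
(grad_large,) = torch.autograd.grad(loss_large, w, retain_graph=True)
(grad_small,) = torch.autograd.grad(loss_small, w)

# How much the small loss contributes relative to the large one.
ratio = (grad_small.norm() / grad_large.norm()).item()
print(f"loss_large={loss_large.item():.4f}  grad norm={grad_large.norm().item():.4e}")
print(f"loss_small={loss_small.item():.4e}  grad norm={grad_small.norm().item():.4e}")
print(f"grad_small / grad_large = {ratio:.2e}")
```

If the ratio is tiny, the summed gradient is dominated by the large loss. Rescaling one of the losses is one possible way to balance them.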

Do you think it is better to use

## --- ALL defined loss ----------------        
## ---- back propagate-----------------        
        # Update G



The result should be the same, but I would use the second approach, as the backward operation would only be called once and should thus be faster.
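A small sketch of that comparison, assuming the two options are "call backward on each loss separately" versus "sum the losses and call backward once". The model, data, and loss choices here are hypothetical stand-ins, not the code from the thread:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model and data.
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
target = torch.rand(8, 1)

bce = nn.BCELoss()
l1 = nn.L1Loss()

def grads_separate():
    model.zero_grad()
    out = torch.sigmoid(model(x))
    bce(out, target).backward(retain_graph=True)  # first backward pass
    l1(out, target).backward()                    # second pass accumulates into .grad
    return [p.grad.clone() for p in model.parameters()]

def grads_summed():
    model.zero_grad()
    out = torch.sigmoid(model(x))
    (bce(out, target) + l1(out, target)).backward()  # single backward pass
    return [p.grad.clone() for p in model.parameters()]

g_sep = grads_separate()
g_sum = grads_summed()
same = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(g_sep, g_sum))
print(same)
```

Both variants should leave the same values in `.grad`; the summed version traverses the shared part of the graph only once.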

Hi Ptrblck,

I tried adding a second, different loss function to the original one, as I said before, but no update occurs in the weights. I changed the second loss function, but nothing changed. Do you think there is anything wrong? I am running the code on the GPU. The first loss is nn.BCELoss() and the second is L1. The result is the same as using just BCELoss; L1 or other losses have no effect.

netG = Generator994(ngpu,nz,ngf).to(device)

optimizerG = optim.Adam(netG.parameters(), lr=lr2, betas=(beta1, 0.999))


output = netD(fake).view(-1)

# Calculate G's loss based on this output
errG1 = criterion(output, label)

xxx=torch.histc(GaussyMask.squeeze(1).view(-1).cpu(),100, min=0, max=1, out=None)

xxx1=torch.histc(fake.squeeze(1).view(-1).cpu(),100, min=0, max=1, out=None)


# Calculate gradients for G adding two losses

D_G_z2 = output.mean().item()
# Update G

@ptrblck :pray: Just wanted to appreciate your amazing patience and grace in answering even totally non-pytorch questions! :clap:

(have benefitted from reading your answers on several occasions)


Double post with answer from here.

Would I be able to use numpy operations at first and return tensor operations at the end?
I'm making a custom triplet loss function; here is my code:

class HardTripletLoss(nn.Module):
    def __init__(self, alpha=0.25):
        super().__init__()  # initialize nn.Module so the instance can be called
        self.alpha = alpha

    def forward(self, q1_vec, q2_vec):
        #..... some numpy operations with tensors
        l_full = torch.mean(l_1 + l_2)
        return l_full

Would I be able to call .backward()?
I could use torch operations in this function, but some operations such as np.max or np.maximum are difficult to do with torch operations, and some good torch functions are still unstable.

If possible, would you please give me an idea of how to use numpy operations in a custom loss?

Thanks :smiley:

I have tried using l_full=torch.mean(l_1+l_2, requires_grad=True) for the gradient, and when computing with q1_vec and q2_vec at first I used .detach().numpy().

I used this on a toy example. I don't know whether it worked or not, but it gave a gradient value for the .backward() method:

v1 = torch.tensor([[0.26726124, 0.53452248, 0.80178373],[0.5178918 , 0.57543534, 0.63297887]], requires_grad=True)
v2 = torch.tensor([[ 0.26726124,  0.53452248,  0.80178373],[-0.5178918 , -0.57543534, -0.63297887]], requires_grad=True)
HardTripletLoss()(v1, v2).backward()

tensor(0.5509, grad_fn=<DivBackward0>)

No, Autograd won’t be able to track the numpy operations, so you would need to implement the backward pass manually via a custom autograd.Function as described here.

I don’t understand the second code snippet, as l_full as well as the other tensors are not used in the last HardTripletLoss example.
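As a rough sketch of what a custom autograd.Function with a manual backward could look like, here is a hinge-style operation `max(x + alpha, 0)` computed with `np.maximum`. The class name `NumpyHinge` and the choice of operation are assumptions made for illustration; this is not the triplet loss from the question:

```python
import numpy as np
import torch

class NumpyHinge(torch.autograd.Function):
    """Hypothetical example: max(x + alpha, 0) computed in NumPy."""

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.save_for_backward(x)   # keep the input for the backward pass
        ctx.alpha = alpha          # plain Python float, stored on ctx
        x_np = x.detach().cpu().numpy()
        out_np = np.maximum(x_np + alpha, 0.0)   # Autograd cannot see this step
        return torch.from_numpy(out_np).to(x.device)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Hand-written derivative: 1 where the hinge is active, 0 elsewhere.
        mask = (x + ctx.alpha > 0).to(grad_output.dtype)
        # One return value per forward input; alpha gets no gradient.
        return grad_output * mask, None

x = torch.tensor([-1.0, -0.1, 0.5], requires_grad=True)
loss = NumpyHinge.apply(x, 0.25).mean()
loss.backward()
print(x.grad)
```

For this particular operation, `torch.clamp` or `torch.maximum` would avoid the manual backward entirely. The custom Function is only needed when no differentiable torch equivalent is available.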

Hi @ptrblck. I have a similar problem where I’m creating a custom loss function by extending nn.Module. However, the results are absurd.

def forward(self, x):
    with torch.set_grad_enabled(True):
        time_step =torch.tensor(0.01)
        out=self._rk4_step1(self.function, x, 0, time_step)
    return out
def function(self,x,t):
         self.n = n = x.shape[1]//2
         qqd = x.requires_grad_(True)
         L = self._lagrangian(qqd).sum()
         J = grad(L, qqd, create_graph=True)[0] ;
         DL_q, DL_qd = J[:,:n], J[:,n:]
         DDL_qd = []
         for i in range(n):
             J_qd_i = DL_qd[:,i][:,None]
             H_i = grad(J_qd_i.sum(), qqd, create_graph=True)[0][:,:,None]
             DDL_qd.append(H_i)
         DDL_qd = torch.cat(DDL_qd, 2)
         DDL_qqd, DDL_qdqd = DDL_qd[:,:n,:], DDL_qd[:,n:,:]
         T = torch.einsum('ijk, ij -> ik', DDL_qqd, qqd[:,n:])
         qdd = torch.einsum('ijk, ij -> ik', DDL_qdqd.pinverse(), DL_q - T)

         return torch.cat([qqd[:,self.n:], qdd], 1)
def _lagrangian(self, qqd):
    x = F.softplus(self.fc1(qqd))
    x = F.softplus(self.fc2(x))
    # x = F.softplus(self.fc3(x))
    L = self.fc_last(x)
    return L
def _rk4_step1(self, f, x, t, h):
    # one step of Runge-Kutta integration
        k1 = torch.mul(f(x, t),h)
        k2 = torch.mul(f(x + k1/2, t + h/2),h)
        k3 = torch.mul(f(x + k2/2, t + h/2),h)
        k4 = torch.mul(f(x + k3, t + h),h)
        return x + 1/6 * (k1 + 2 * k2 + 2 * k3 + k4)

Is autograd able to track all gradients even though I’m calling the forward multiple times in the rk4_step?

It might be a dumb question.

How does Autograd know how to compute the gradients of any arbitrary Loss function which is implemented using Tensor operations?

The derivatives.yaml file contains definitions for the backward passes for the implemented operations.
Autograd will track all operations in the forward pass and use these derivatives to compute the gradient of the loss w.r.t. the parameters.
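To make the tracking visible, here is a small sketch with a made-up loss built only from tensor operations. The loss formula and the values are assumptions for illustration. It inspects the recorded graph and checks the result of backward against a derivative worked out by hand:

```python
import torch

pred = torch.tensor([0.2, 0.7, 0.9], requires_grad=True)
target = torch.tensor([0.0, 1.0, 1.0])

# Hypothetical custom loss: mean of |pred - target| ** 1.5
loss = ((pred - target).abs() ** 1.5).mean()

# Each tensor operation recorded a node that knows its own backward formula.
print(loss.grad_fn)                  # node of the last operation (the mean)
print(loss.grad_fn.next_functions)   # nodes that feed into it

loss.backward()

# Derivative by hand: 1.5 * |d| ** 0.5 * sign(d) / N, with d = pred - target
d = (pred - target).detach()
analytic = 1.5 * d.abs() ** 0.5 * torch.sign(d) / d.numel()
matches = torch.allclose(pred.grad, analytic)
print(matches)
```

No backward code was written for the loss itself; Autograd chains the per-operation derivatives of subtraction, abs, pow, and mean.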

That helps, thanks a bunch!


I have a CNN architecture as follows:


Conv1: (3, 32, 5, 1, 0)
Conv2: (32, 64, 5, 1, 0)
Conv3: (64, 128, 5, 1, 0)
Conv4: (128, 256, 5, 1, 0)
And output layer as convolutional layer itself.
Conv5: (256, 10, *, 1, 0)

All convolutional layers are customized with torch.autograd.Function, i.e. they have forward and backward defined in them.

I am using two loss functions:

class my_Loss_func(torch.nn.Module):
           return loss1

In training loop:

Loss1 = my_Loss_func()(conv4_output, labels)  # conv4_output: output of conv4
Loss2 = torch.nn.CrossEntropyLoss()(final_output, labels)
loss = Loss1 + Loss2

In doing so, I think the backward pass would still execute, but wrongly, because backpropagation happens twice through conv4 (override?): once for Loss1 and once for Loss2, since they are added. (So will the update take place twice as well?)

What one wants is that conv4 is updated first (only once, after backpropagating once), then conv3, then … conv1.

Your code snippet is a bit unclear, so I’m not completely sure what your use case is.
However, the backward call will calculate gradients of both losses w.r.t. the parameters used to calculate these losses.
If some parameters were used in both loss calculations, the gradient will be accumulated for these parameters.

Actually, Loss1 is contrastive loss whose inputs are features from conv4 and labels. In addition to that, Loss2 is just the cross entropy loss whose inputs are output of the network after conv5 and labels.

Task: Image classification

So, is it correct if I say :

  1. The parameters of conv4 will get updated twice, once according to Loss1 and the second time according to Loss2 ?

  2. The parameters of layers other than conv4 will also get updated according to Loss1 ?

I am guessing 1. should happen and 2. shouldn’t.
What is your opinion?

  1. No, the parameters will get updated in the optimizer.step() call. The gradients of parameters of reused modules will get accumulated, if the corresponding computation graph uses them.

A small illustration of my last post:
Assuming your model architecture is:

input -> conv1 -> conv2 -> conv3 -> conv4 -> conv5 -> output -> loss2
                                           \-> conv4_output -> loss1

If this is the workflow of the loss calculations, then loss1.backward() will accumulate gradients for the parameters in conv1,2,3,4, while loss2.backward() will accumulate gradients for the parameters in conv1,2,3,4,5.
The same applies for the sum of both losses.
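The workflow above can be sketched with a small stand-in network. The linear layers, sizes, and the placeholder feature loss are assumptions; the thread's custom conv layers and contrastive loss are not reproduced here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: `trunk` plays conv1..conv4, `head` plays conv5.
trunk = nn.Sequential(nn.Linear(6, 8), nn.ReLU(), nn.Linear(8, 8))
head = nn.Linear(8, 3)

x = torch.randn(5, 6)
labels = torch.randint(0, 3, (5,))

features = trunk(x)        # corresponds to conv4_output
logits = head(features)    # corresponds to output

loss1 = features.pow(2).mean()                  # placeholder feature-level loss
loss2 = nn.CrossEntropyLoss()(logits, labels)   # loss on the final output

# loss1 only reaches the trunk parameters.
loss1.backward(retain_graph=True)
head_has_grad_after_loss1 = head.weight.grad is not None
trunk_has_grad_after_loss1 = trunk[0].weight.grad is not None

# loss2 reaches both the trunk and the head.
loss2.backward()
head_has_grad_after_loss2 = head.weight.grad is not None

print(trunk_has_grad_after_loss1, head_has_grad_after_loss1, head_has_grad_after_loss2)
```

The head stays without a gradient after `loss1.backward()` because it is not part of the graph that produced `loss1`.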

Hello ptrblck,

I am trying to create a custom loss function in a CNN for regression. The input is a binary image (600x600) in which the background is black and the foreground is white. The ground truth associated with each input is an image with a color range from 0 to 255, which is normalized between 0 and 1.

x =Input, ground truth=y and predicted output=y_hat

I tried to penalize the foreground by custom loss function below, but it didn’t improve the result. I am wondering whether my idea is right or not, if yes what’s wrong with my custom function?

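import torch
import torch.nn as nn

mse = nn.MSELoss(reduction='mean')

def criterion(y, y_hat, x, loss, weight=0.1):
    # Scale the prediction by `weight` wherever the input pixel is foreground (== 1)
    y_hat_modified = torch.where(x[:, 0:1] == 1, weight * y_hat, y_hat)  # x[:,0:1] is input
    return loss(y, y_hat_modified)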

I created a topic for it and you can see more detailed info there.
custom loss function for regression in cnn

Yes, that’s where I’m confused.

So, for all the parameters of conv1,2,3,4, will there be 2 values of gradients stored in .grad? or only one value will be there in .grad because of override?

optimizer.step() updates all the parameters based on parameter.grad. So, I doubt if both the gradients in .grad will be used for update or maybe they are added and then… I don’t know.

There will be one .grad value containing the sum of the gradients calculated during the backward passes.
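A minimal sketch of this accumulation with a single parameter. The values, the two toy losses, and the SGD learning rate are chosen arbitrarily for illustration:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss1 = (w ** 2).sum()   # d loss1 / dw = 2 * w = [2, 4]
loss2 = (3 * w).sum()    # d loss2 / dw = 3     = [3, 3]

opt.zero_grad()
loss1.backward()
grad_after_first = w.grad.clone()   # gradient of loss1 only
loss2.backward()
grad_after_both = w.grad.clone()    # same .grad tensor, now holding the sum

opt.step()                          # one update, using the summed gradient
print(grad_after_first)             # [2, 4]
print(grad_after_both)              # [5, 7]
print(w)                            # [1, 2] - 0.1 * [5, 7] = [0.5, 1.3]
```

There is never a second stored value: each backward call adds into the same `.grad`, and `optimizer.step()` reads whatever is there. This is also why `.grad` is usually zeroed at the start of each iteration.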

Alright, thanks for the explanation :hugs: