Creating a custom loss function


I’m having some trouble creating a custom loss function and would greatly appreciate guidance.

My loss function looks as follows:

from torch.nn.modules.loss import _WeightedLoss

class Custom_Loss(_WeightedLoss):

    def forward(self, input, target):
        return cust_Loss(input, target)

When I try to run the code with this loss function, Python complains that backward() is not defined. It’s clear that I haven’t defined a backward() function, but I didn’t think it would be necessary, since the loss function I was previously using (MultiLabelSoftMarginLoss) didn’t define a backward() function either.

Could anyone either point me to the backward function used by MultiLabelSoftMarginLoss, or recommend a backward function for the Custom_Loss class?


The cust_Loss function you implement must have backward() defined. In fact, cust_Loss can’t be just any Python function; it must be a class that inherits from torch.autograd.Function and implements the .forward() and .backward() methods.

So in the MultiLabelSoftMarginLoss, the backward function is the one implemented in F.binary_cross_entropy.

That seems quite limiting? Ideally, it’d be nice to be able to just string together a formula, I reckon, like:

loss = sqrt((target - label)^2)

… or whatever.
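For what it is worth, a formula like the one above can usually be written directly with torch operations. A minimal sketch, assuming a recent PyTorch release where plain tensors track gradients; the names `pred` and `target` are made up for illustration, and `**` is used because `^` is not a power operator in Python:

```python
import torch

pred = torch.tensor([3.0, 4.0], requires_grad=True)  # hypothetical model output
target = torch.zeros(2)                               # hypothetical labels

# Euclidean distance built only from differentiable torch operations
loss = torch.sqrt(((pred - target) ** 2).sum())
loss.backward()  # no hand-written backward() is involved

print(loss.item())  # 5.0
print(pred.grad)    # tensor([0.6000, 0.8000])
```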



Thanks for your reply!

I’m still a little confused though as I don’t see a backward method defined in binary_cross_entropy.

Based on your comments it looks like I should reformulate my loss function as follows:

class Custom_Loss(torch.autograd.Function):

    def forward(self, input, target):
        # Do some operations here
        return result

    def backward(self, grad_output):
        return 'Still not sure what to return here'


From my understanding of how pytorch works, there are two main categories:

  • Functions that aggregate together more operations (and typically inherit from nn.Module and have learnable parameters)
  • Functions that define a new “black box” operation (and do not have learnable parameters)

For the first type of function you only need to define the .forward() method, specifying which operations are being computed. The backward() method is computed implicitly by torch.autograd by “reversing” all the operations done on the Variable. Clearly, to be able to do this, every transformation applied to your Variable needs to be an autograd.Function instance that implements the backward method. So you cannot call math.sqrt(input) and expect it to work; you have to use the appropriate torch functions. Note that since torch overloads operators like +, *, / etc., it will work in a lot of cases, but it’s good to keep in mind what is going on.
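The math.sqrt point can be checked with a small experiment. This is only a sketch, assuming a recent PyTorch release where a tensor created with requires_grad=True plays the role of a Variable:

```python
import math
import torch

x = torch.tensor(4.0, requires_grad=True)

# torch.sqrt is recorded by autograd, so the gradient flows back to x
y = torch.sqrt(x)
y.backward()
print(x.grad)  # tensor(0.2500)

# math.sqrt only sees a plain Python number; its result has no link to x
z = math.sqrt(x.item())
print(type(z))  # <class 'float'>
```

The value 0.25 is just 1 / (2 * sqrt(4)), the derivative of the square root at 4.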

The second type needs to implement both forward and backward: autograd will treat it as a black box, and to compute the gradient it will simply call .backward().
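A “black box” operation of the second type might look roughly like the sketch below. It assumes a recent PyTorch release, where forward and backward are static methods that receive a context object and the function is invoked through .apply(); the `Square` class is a made-up example, not something from the library:

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical black-box op computing x**2 with a hand-written gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # chain rule: d(x^2)/dx = 2x, scaled by the incoming gradient
        return 2 * x * grad_output

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = Square.apply(x).sum()
loss.backward()
print(x.grad)  # tensor([2., 4., 6.])
```

backward returns one gradient per input of forward, which is the usual answer to “what should I return here”.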

Now, back to your example: my understanding is that you are after the first “type” of function. The problem is that autograd cannot compute the gradient of your customLoss, because you have not implemented it. The binary_cross_entropy function does not define a gradient because it only performs operations on its input that autograd knows and can differentiate. For example, if you do something like

def add1(input):
    return input + 1

then you don’t need to define backward(), because autograd can compute the gradient of add1 automatically: it understands what ‘+’ means. This understanding comes from the __add__ method defined in the Variable class, which tells autograd which operation is being performed. But if your custom function performs some other operation that autograd does not know about, then it can’t compute the gradient, and you need to provide a “black box” function that inherits from Function and implements a .backward() method.
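To make the operator-overloading idea concrete, here is a toy illustration in plain Python. It is not how PyTorch is implemented; `ToyVariable` is a hypothetical, heavily simplified stand-in that only shows how overloaded operators can record enough information to compute a gradient later:

```python
class ToyVariable:
    """Hypothetical, heavily simplified stand-in for an autograd variable."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents  # (parent, local_gradient) pairs

    def __add__(self, other):
        other = other if isinstance(other, ToyVariable) else ToyVariable(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return ToyVariable(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, ToyVariable) else ToyVariable(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return ToyVariable(self.value * other.value,
                           ((self, other.value), (other, self.value)))

    def backward(self, upstream=1.0):
        # accumulate the incoming gradient, then push it to the parents
        self.grad += upstream
        for parent, local in self._parents:
            parent.backward(upstream * local)

def add1(v):
    return v + 1

x = ToyVariable(2.0)
y = add1(x) * 3
y.backward()
print(y.value)  # 9.0
print(x.grad)   # 3.0
```

Neither add1 nor the caller defines a backward; the gradient comes entirely from what `__add__` and `__mul__` recorded.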


Hi antspy, just to check: do you mean that as long as we keep our state outside of the function and pass it in as a parameter, there is no need for backward?

So, e.g., I can do something like:

a = autograd.Variable(...)
def f(input, a):
   # stuff here
   return result

out = f(input, a)

… and all should be ok?

It depends on what #stuff here is!

If it uses only functions that autograd can understand (i.e. functions that have implemented .backward()) then yes, it should work. Try for example with

>>> x = torch.autograd.Variable(torch.Tensor(1), requires_grad=True)
>>> x
Variable containing:
[torch.FloatTensor of size 1]
>>> f = lambda x: x*3
>>> y = f(x)
>>> y
Variable containing:
[torch.FloatTensor of size 1]
>>> y.backward()
>>> y.grad #displays nothing since it is None
>>> x.grad
Variable containing:
[torch.FloatTensor of size 1]

In this case autograd can compute the .backward() method on y because it understands that * means multiplication and knows how to deal with it; the same goes for + and other common operators. If you use other functions that autograd doesn’t know about, then it won’t work.
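For readers on a newer PyTorch, roughly the same experiment can be written without the Variable wrapper. This is a sketch under that assumption, with an explicit starting value so the result is reproducible:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
f = lambda v: v * 3

y = f(x)
y.backward()   # allowed without arguments because y has a single element
print(x.grad)  # tensor([3.])
```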



Thanks again for your extremely helpful insight!

From what you’ve written it sounds like if my custom loss function is built on simple operations then I will not need to define a custom backward() operation.

To make things a little less abstract let’s assume that I have two tensors A, B where A and B are both nxm. Suppose my custom loss function is something like sum(A - B). Could I simply implement this as a standard python function, or would I need to do something more complicated?

For example, could I just do something like this?

def custom_Loss(A,B):
    return sum(A - B)

loss = custom_Loss(a,b)


Yes, you could. The only question is whether autograd will recognize the operations that you are using (which in this case it will). So, to give an example:

>>> A = torch.autograd.Variable(torch.Tensor(2,2), requires_grad=True)
>>> B = torch.autograd.Variable(torch.Tensor(2,2), requires_grad=True)
>>> f = lambda x,y: (x-y).sum()
>>> loss = f(A,B)
>>> loss
Variable containing:
[torch.FloatTensor of size 1]

>>> loss.backward()
>>> A.grad
Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

>>> B.grad
Variable containing:
-1 -1
-1 -1
[torch.FloatTensor of size 2x2]

So as you can see we did not need to specify a backward() method because autograd was able to infer how to compute the gradient automatically from the operation we did on our variables.
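Tying this back to the original question, the same loss can be wrapped in a class that only defines forward. A minimal sketch, assuming a recent PyTorch release and subclassing nn.Module directly; the `CustomLoss` name is illustrative:

```python
import torch
import torch.nn as nn

class CustomLoss(nn.Module):
    def forward(self, input, target):
        # only differentiable torch operations, so no backward() is written
        return (input - target).sum()

a = torch.ones(2, 2, requires_grad=True)
b = torch.zeros(2, 2, requires_grad=True)

loss = CustomLoss()(a, b)
loss.backward()
print(a.grad)  # every entry is 1
print(b.grad)  # every entry is -1
```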



Thanks so much for your help!! Everything worked as expected!


To summarize: as long as you can do all the operations of your loss function on a Variable (without any access to its underlying tensor), you don’t need to define any backward method.

This works with all function/modules from

But does not work with functions from

If you are creating a function that requires access to the underlying tensor (for example, if you are using a function available exclusively on torch.Tensor), then you need to extend autograd and compute the derivative by hand for the backward method.


I think it should not be return sum(A - B); it should be (A-B).sum() or torch.sum(A-B), because sum() does not belong to the PyTorch library and thus does not have a .backward() method.
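Whatever autograd does with each spelling, one practical difference is easy to check: on a 2-D tensor, Python’s built-in sum iterates over the rows, so it does not produce the scalar that a loss normally needs. A small sketch, assuming a recent PyTorch release and made-up values:

```python
import torch

A = torch.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
B = torch.zeros(2, 3)

builtin_result = sum(A - B)   # iterates over rows: 0 + row0 + row1
torch_result = (A - B).sum()  # reduces over every element

print(builtin_result)  # tensor([3., 5., 7.])
print(torch_result)    # tensor(15.)
```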