When attempting to run the code using this loss function, Python complains that backward() is not defined. While it’s clear that I haven’t defined a backward() function, I didn’t think one would be necessary, since the loss function I was previously using (MultiLabelSoftMarginLoss) doesn’t define a backward() function either.

Could anyone either point me to the backward function used by MultiLabelSoftMarginLoss, or recommend a backward function for the Custom_Loss class?

The cust_Loss function you implement must have backward() defined. In fact, your cust_Loss can’t be just any Python function: it must be a class that inherits from torch.autograd.Function and implements the .forward() and .backward() methods.

So in MultiLabelSoftMarginLoss, the backward function is the one implemented for F.binary_cross_entropy, which it calls internally.
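To make that concrete, here is a small sketch (using the current tensor API rather than the thread’s old Variable one; the variable names and manual formula are my own reconstruction, not code from the thread) that spells the same loss out with elementary autograd-friendly operations:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)  # raw scores
target = torch.empty(4, 3).random_(2)           # 0/1 multi-labels

# Library loss: its forward is built only from differentiable
# primitives, so autograd derives the backward pass on its own.
lib_loss = F.multilabel_soft_margin_loss(logits, target)

# The same formula written out by hand with elementary ops:
manual_loss = -(target * F.logsigmoid(logits)
                + (1 - target) * F.logsigmoid(-logits)).mean(dim=1).mean()

lib_loss.backward()
print(logits.grad is not None)  # gradient computed without any custom backward
```

Even though no backward is written anywhere here, calling .backward() on either version works, because every operation in the chain is one autograd knows.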

From my understanding of how PyTorch works, there are two main categories of functions:

Functions that aggregate together more operations (and typically inherit from nn.Module and have learnable parameters)

Functions that define a new “black box” operation (and do not have learnable parameters)

For the first type of functions you only need to define the .forward() method, specifying which operations are being computed. The backward() method is computed implicitly by torch.autograd by “reversing” all the operations done on the Variable. Clearly, for this to work, every transformation applied to your variable needs to be an autograd.Function instance that implements the backward method. So you cannot call math.sqrt(input) and expect it to work; you have to use the appropriate torch functions. Note that since torch overloads operators like +, *, /, etc., it will work in a lot of cases, but it’s good to keep in mind what is going on.
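For instance, a loss of this first type can be a plain nn.Module that only defines forward (a minimal sketch; the class name is made up for illustration):

```python
import torch
import torch.nn as nn

class SumDiffLoss(nn.Module):
    # Only forward is defined: autograd records the operations
    # and derives backward automatically.
    def forward(self, a, b):
        return (a - b).sum()

a = torch.ones(2, 2, requires_grad=True)
b = torch.zeros(2, 2, requires_grad=True)
loss = SumDiffLoss()(a, b)
loss.backward()
print(a.grad)  # all ones; b.grad is all minus ones
```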

The second type needs to implement both forward and backward: autograd will treat it as a black box, and to compute the gradient it will simply call .backward().
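A minimal sketch of this second type, in the modern static-method style of torch.autograd.Function (the Square name is just for illustration):

```python
import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # autograd treats this body as a black box
        ctx.save_for_backward(input)
        return input * input

    @staticmethod
    def backward(ctx, grad_output):
        # so the gradient must be supplied by hand: d/dx x^2 = 2x
        input, = ctx.saved_tensors
        return grad_output * 2 * input

x = torch.tensor([3.0], requires_grad=True)
Square.apply(x).backward()
print(x.grad)  # tensor([6.])
```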

Now, back to your example: my understanding is that you are after the first “type” of function. The problem is that autograd cannot compute the gradient of your customLoss because it contains an operation whose backward has not been implemented. The binary_cross_entropy function does not need to define a backward, because it only performs operations on its input that autograd knows and can differentiate. For example, if you do something like

def add1(input):
    return input + 1

then you don’t need to define backward(), because autograd can compute the gradient of add1 automatically: it understands what ‘+’ means. This understanding comes from the add method being defined in the Variable class, so that autograd knows what operation is being performed. But if your custom function performs some other operation that autograd does not know about, then it can’t compute the gradient, and you need to provide a “black box” function that inherits from Function and implements a .backward() method.
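To check this, assuming a recent PyTorch where Variable has been merged into Tensor:

```python
import torch

def add1(input):
    return input + 1

x = torch.tensor([5.0], requires_grad=True)
add1(x).backward()
print(x.grad)  # tensor([1.]), since d(x + 1)/dx = 1
```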

If it uses only functions that autograd can understand (i.e. functions that have .backward() implemented), then yes, it should work. Try for example:

>>> x = torch.autograd.Variable(torch.Tensor(1), requires_grad=True)
>>> x
Variable containing:
2.8327
[torch.FloatTensor of size 1]
>>> f = lambda x: x*3
>>> y = f(x)
>>> y
Variable containing:
8.4982
[torch.FloatTensor of size 1]
>>> y.backward()
>>> y.grad #displays nothing since it is None
>>> x.grad
Variable containing:
3
[torch.FloatTensor of size 1]

In this case autograd can compute the .backward() method on y because it understands that * means multiplication and knows how to deal with it; the same goes for + and other common operators. If you use other functions that autograd doesn’t know about, then it won’t work.
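For reference, the same experiment in current PyTorch, where Variable has been folded into Tensor, looks like this (a sketch assuming a recent release):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * 3          # '*' is an operator autograd understands
y.backward()       # compute the gradient of y with respect to x
print(x.grad)      # tensor([3.])
```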

From what you’ve written it sounds like if my custom loss function is built on simple operations then I will not need to define a custom backward() operation.

To make things a little less abstract let’s assume that I have two tensors A, B where A and B are both nxm. Suppose my custom loss function is something like sum(A - B). Could I simply implement this as a standard python function, or would I need to do something more complicated?

Yes, you could. The only question is whether autograd will recognize the operations that you are performing (which in this case it will). To make an example:

>>> A = torch.autograd.Variable(torch.Tensor(2,2), requires_grad=True)
>>> B = torch.autograd.Variable(torch.Tensor(2,2), requires_grad=True)
>>> f = lambda x,y: (x-y).sum()
>>> loss = f(A,B)
>>> loss
Variable containing:
8.5899e+09
[torch.FloatTensor of size 1]
>>> loss.backward()
>>> A.grad
Variable containing:
1 1
1 1
[torch.FloatTensor of size 2x2]
>>> B.grad
Variable containing:
-1 -1
-1 -1
[torch.FloatTensor of size 2x2]

So as you can see, we did not need to specify a backward() method, because autograd was able to infer how to compute the gradient automatically from the operations we performed on our variables.

To summarize: as long as you can do all the operations of your loss function on a Variable (without any access to Variable.data), you don’t need to define any backward method.

This works with all functions/modules from torch, torch.nn, and torch.nn.functional,

but it does not work with functions from torch.Tensor.
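A quick way to see the difference (a sketch in the current API, where .detach() is the modern spelling of raw .data access):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

y = x * 3            # stays in the autograd graph
print(y.requires_grad)   # True

z = x.detach() * 3   # raw-data access: autograd cannot trace this
print(z.requires_grad)   # False: no backward possible through z
```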

If you are creating a function that requires access to Variable.data (for example, if you are using a function exclusively from torch.Tensor), then you need to extend autograd and compute the derivative by hand in the backward method.
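For example, a function that leaves PyTorch entirely (here going through NumPy, purely for illustration; the NumpyExp name is made up) has to extend autograd.Function and supply the derivative by hand:

```python
import torch
import numpy as np

class NumpyExp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # the computation happens on raw data, outside autograd's view
        result = torch.from_numpy(np.exp(input.detach().numpy()))
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        # hand-derived gradient: d/dx exp(x) = exp(x)
        result, = ctx.saved_tensors
        return grad_output * result

x = torch.tensor([0.0, 1.0], requires_grad=True)
NumpyExp.apply(x).sum().backward()
print(x.grad)  # matches exp(x)
```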

I think it should not be return sum(A - B); it should be (A - B).sum() or torch.sum(A - B), because Python’s built-in sum() does not belong to the PyTorch library and therefore has no .backward() method.