Trying to understand writing my own loss function

Hi, I’m trying to write my own loss functions containing both forward() and backward(). After looking at many people’s implementations, there are two things that I’m confused about and that I think are important:

1. In what situation should I write my own backward():

I have seen some people write only

    def forward(....):
        return ......

and then just call loss.backward() without defining their own backward function for backpropagation. I would like to know in which situations I don’t have to write my own backward function.

2. Number and order of the gradients that backward() returns:
For example, if I want to return gradients for only A and B, and I write the forward function like this:

    def forward(ctx, A, X, B, C, D):
        return loss

Should I write the backward function like the one shown below?

    def backward(ctx, grad_output):
        return A_grad, None, B_grad, None, None

I would also like to know what grad_output means, since the value I get is always 1.0. I’m not sure when it is correct to return grad_output * weights.
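
For reference, here is a minimal sketch of the pattern described in point 2. The loss itself (`WeightedSquaredError`, a weighted squared error of a linear model) and the tensor shapes are assumptions made up for illustration; they are not from the original post. It shows that backward() returns one value per forward() input in the same order, with None for inputs that need no gradient, and that grad_output is the gradient flowing in from whatever was done to the loss afterwards. It is 1.0 when backward() is called directly on the loss, which is why that value shows up so often.

```python
import torch


class WeightedSquaredError(torch.autograd.Function):
    """Hypothetical loss for illustration: sum(D * (X @ A + B - C) ** 2)."""

    @staticmethod
    def forward(ctx, A, X, B, C, D):
        residual = X @ A + B - C               # shape (N,)
        ctx.save_for_backward(X, D, residual)  # keep what backward needs
        return (D * residual ** 2).sum()       # scalar loss

    @staticmethod
    def backward(ctx, grad_output):
        X, D, residual = ctx.saved_tensors
        dloss_dres = 2.0 * D * residual        # d(loss)/d(residual), shape (N,)
        # Chain rule: scale each local gradient by grad_output.
        A_grad = grad_output * (X.t() @ dloss_dres)  # same shape as A
        B_grad = grad_output * dloss_dres.sum()      # same shape as B (scalar)
        # One entry per forward input (A, X, B, C, D), in that order.
        return A_grad, None, B_grad, None, None


torch.manual_seed(0)
A = torch.randn(3, requires_grad=True)
B = torch.tensor(0.5, requires_grad=True)
X, C, D = torch.randn(4, 3), torch.randn(4), torch.ones(4)

# Called directly on the loss, backward() receives grad_output == 1.0.
WeightedSquaredError.apply(A, X, B, C, D).backward()
A_grad_plain, B_grad_plain = A.grad.clone(), B.grad.clone()

# If the loss is scaled afterwards, grad_output carries that factor (3.0 here).
A.grad, B.grad = None, None
(3.0 * WeightedSquaredError.apply(A, X, B, C, D)).backward()
A_grad_scaled, B_grad_scaled = A.grad.clone(), B.grad.clone()
```

Multiplying by grad_output in every returned gradient keeps the result correct in both cases, so it is a safe default even when the value happens to be 1.0.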

I’m a beginner who would really like to learn how to write loss functions correctly. It would be amazing if you could share any related links or answers :slight_smile:

Thanks in advance!

You should write your own backward only if your function is not differentiable by autograd or you don’t want to use the gradient that autograd computes. In that case, your backward is what gets called when you call loss.backward(). If forward uses only differentiable tensor operations, autograd derives the backward pass for you and you don’t need to write one. If you want the individual gradients, as in your second case, you can call loss.backward() and then read the .grad attributes of A and B.
Just make sure the .requires_grad attribute of every tensor you want a gradient for is set to True.
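
A minimal sketch of that autograd-only route, assuming a made-up loss (`weighted_squared_error`) and made-up shapes purely for illustration: the loss is built from ordinary differentiable tensor operations, so there is no backward to write, and the gradients are read from .grad afterwards.

```python
import torch


def weighted_squared_error(A, X, B, C, D):
    # Hypothetical loss built only from differentiable torch operations,
    # so autograd can derive the backward pass on its own.
    return (D * (X @ A + B - C) ** 2).sum()


torch.manual_seed(0)
# Only A and B ask for gradients.
A = torch.randn(3, requires_grad=True)
B = torch.tensor(0.5, requires_grad=True)
# requires_grad defaults to False for these tensors.
X, C, D = torch.randn(4, 3), torch.randn(4), torch.ones(4)

loss = weighted_squared_error(A, X, B, C, D)
loss.backward()

print(A.grad)  # gradient of the loss w.r.t. A, same shape as A
print(B.grad)  # gradient of the loss w.r.t. B, a scalar tensor
print(X.grad)  # None, because X does not require a gradient
```

Tensors left at requires_grad=False play the same role as the None entries in a hand-written backward: no gradient is computed or stored for them.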