Hi, I’m trying to write my own loss functions containing both forward() and backward(). After looking at many people’s functions, there are two things that I’m confused about, and I think they’re also important:
1. In what situation should I write my own backward():
I’ve seen some people write only
@staticmethod
def forward(....):
......
return ......
and just call loss.backward() without defining their own backward function for backpropagation. I would like to know in what kind of situation(s) I don’t have to write my own backward function.
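For instance, I believe something like this works without any hand-written backward, since the loss is built only from differentiable torch operations and autograd derives the backward pass automatically (the function name and weighting here are just my own example):

```python
import torch

# A loss written only with differentiable torch ops: autograd builds
# the backward graph automatically, so no custom backward() is needed.
def weighted_mse(pred, target, weight):
    return (weight * (pred - target) ** 2).mean()

pred = torch.randn(4, requires_grad=True)
target = torch.randn(4)

loss = weighted_mse(pred, target, 0.5)
loss.backward()  # works with no hand-written backward function

# pred.grad is now populated by autograd
```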
2. Number and position of the gradients returned to the model:
For example, if I want to return gradients for only A and B, and I write the forward function like this:
@staticmethod
def forward(ctx, A, X, B, C, D):
......
return loss
Should I write the backward function like the one shown below?
@staticmethod
def backward(ctx, grad_output):
......
return A_grad, None, B_grad, None, None
I would also like to know the meaning of grad_output, since I always see grad_output equal to a single number (1.0). I’m not sure when it is correct to return grad_output * weights.
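Here is a small runnable sketch of what I think the pattern looks like (the loss formula and names are my own, just for illustration): backward returns exactly one value per forward input, in the same order, with None for inputs that need no gradient, and grad_output carries the gradient of the final scalar with respect to this loss (1.0 when you call .backward() directly on it), so it multiplies every returned gradient:

```python
import torch

class MyLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, A, X, B, C, D):
        # Example loss: sum(D * (A*X + B - C)^2); only A and B get gradients.
        residual = A * X + B - C
        ctx.save_for_backward(X, residual)
        ctx.D = D  # plain Python scalar, stored directly on ctx
        return (D * residual ** 2).sum()

    @staticmethod
    def backward(ctx, grad_output):
        # grad_output = d(final scalar)/d(this loss); it is 1.0 when this
        # loss is the tensor you call .backward() on, so it scales all grads.
        X, residual = ctx.saved_tensors
        A_grad = grad_output * 2 * ctx.D * residual * X  # dloss/dA
        B_grad = grad_output * 2 * ctx.D * residual      # dloss/dB
        # One return value per forward input (A, X, B, C, D), same order.
        return A_grad, None, B_grad, None, None

A = torch.randn(3, requires_grad=True)
X = torch.randn(3)
B = torch.randn(3, requires_grad=True)
C = torch.randn(3)

loss = MyLoss.apply(A, X, B, C, 0.5)
loss.backward()  # A.grad and B.grad are now filled in
```

I checked this against plain autograd on the same formula and the gradients match, which is how I convinced myself the ordering works this way.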
I’m a beginner who would really like to learn how to write loss functions correctly. It would be amazing if I could get any related links or answers from you guys.
Thanks in advance!