Hi, I’m new to PyTorch.
I have a question about a custom loss function.
The code is below.
I use numpy to clone MSE_loss as MSE_SCORE.
The input is a batch of 128 images of size 1x200x200.
The output “mse” of MSE_SCORE is a float value converted back to a tensor.

However, when I tried to call backward on the custom loss function, I got the error “RuntimeError: Function MSE_SCOREBackward returned an invalid gradient at index 0 - got [] but expected shape compatible with [128, 1, 200, 200]”

The problem here is that your implementation of the backward is not correct.

You don’t need to use torch.no_grad() in a custom Function.

Why not run the exact same code (without the conversion to numpy) in your criterion_mse() function, so that it will be autodiffed and you get the backward for free?
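For example, a minimal sketch of what that could look like (`criterion_mse` and the shapes are taken from your post; the rest is illustrative):

```python
import torch

def criterion_mse(output, target):
    # Same math as the numpy version, but in pure PyTorch,
    # so autograd builds the backward for you automatically.
    return ((output - target) ** 2).mean()

output = torch.randn(128, 1, 200, 200, requires_grad=True)
target = torch.randn(128, 1, 200, 200)

loss = criterion_mse(output, target)
loss.backward()  # output.grad now has the same shape as output
```

No custom Function, no hand-written backward, and the gradient shapes are guaranteed to be right.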

If you don’t do that, you need to implement the backward yourself. In particular, the error you see here is because the gradient you return for input doesn’t have the same shape as `input`.
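To make the shape requirement concrete, here is a sketch of a custom Function for MSE whose backward returns a gradient shaped like the input (the class name matches your post, the body is illustrative):

```python
import torch
from torch.autograd import Function

class MSE_SCORE(Function):
    @staticmethod
    def forward(ctx, input, target):
        ctx.save_for_backward(input, target)
        # The forward returns a scalar loss.
        return ((input - target) ** 2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        # The gradient returned for `input` must have input's shape,
        # otherwise you get exactly the "invalid gradient at index 0" error.
        grad_input = grad_output * 2.0 * (input - target) / input.numel()
        return grad_input, None  # no gradient needed for target

input = torch.randn(128, 1, 200, 200, requires_grad=True)
target = torch.randn(128, 1, 200, 200)
loss = MSE_SCORE.apply(input, target)
loss.backward()
print(input.grad.shape)  # torch.Size([128, 1, 200, 200])
```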

PS: I updated your post to make the code more readable using triple backticks.

Because I want to do more complex work in this custom loss function, I have to use the conversion to numpy.
I’m not sure how grad_output works; do you have an example?
I modified the code as follows. Is it correct?

The backward() function should compute one step of the chain rule of your function y = f(x). grad_output is the gradient flowing from the layers below, dl/dy, and you should return dl/dx.
The way to compute it is: dl/dx = dl/dy * dy/dx.

grad_output is the gradient wrt the output of the forward, so in this case dl/dy.

And yes the formula looks good.
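As a small worked example: for y = mean((x - t)^2) we have dy/dx = 2 * (x - t) / N, so the backward should return grad_output * 2 * (x - t) / N. You can check the manual chain rule against autograd (shapes kept small just for illustration):

```python
import torch

x = torch.randn(4, 3, requires_grad=True)
t = torch.randn(4, 3)

y = ((x - t) ** 2).mean()          # y = f(x)
grad_output = torch.tensor(3.0)    # pretend this is dl/dy from the layers below
y.backward(grad_output)

# Manual chain rule: dl/dx = dl/dy * dy/dx
manual = grad_output * 2 * (x.detach() - t) / x.numel()
print(torch.allclose(x.grad, manual))  # True
```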

In the implementation,

for the forward, even though you can go through numpy, I don’t think you should here: all of these ops just work in PyTorch.

For the backward, you should write it in a differentiable manner in case you ever need gradients of gradients through it. If you don’t, add from torch.autograd.function import once_differentiable as another decorator on the backward.

Same point for the backward: I don’t see any reason to use numpy here, even if you make it once_differentiable.
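If you do keep the numpy round-trip anyway, a sketch of how the decorator is applied (class name and helper shapes are hypothetical; the numpy math in backward is why double backward must be disabled):

```python
import numpy as np
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable

class NumpyMSE(Function):  # hypothetical name for illustration
    @staticmethod
    def forward(ctx, input, target):
        ctx.save_for_backward(input, target)
        mse = np.mean((input.detach().numpy() - target.numpy()) ** 2)
        return torch.tensor(mse, dtype=input.dtype)

    @staticmethod
    @once_differentiable  # backward goes through numpy, so it is not differentiable itself
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        grad = 2 * (input.detach().numpy() - target.detach().numpy()) / input.numel()
        return grad_output * torch.from_numpy(grad), None

x = torch.randn(8, 1, 5, 5, requires_grad=True)
t = torch.randn(8, 1, 5, 5)
loss = NumpyMSE.apply(x, t)
loss.backward()  # works; trying a second-order grad through it would raise
```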