Error in the backward of custom loss function

Hi, I’m new to PyTorch.
I have a question about a custom loss function; the code is below.
I used numpy to replicate MSELoss as MSE_SCORE.
The input is 1x200x200 images, and the batch size is 128.
The output “mse” of MSE_SCORE is a float value converted to a tensor.

However, when I try to call backward on the custom loss function, I get the error “RuntimeError: Function MSE_SCOREBackward returned an invalid gradient at index 0 - got [] but expected shape compatible with [128, 1, 200, 200]”.

```python
import numpy as np
import torch
from torch.autograd import Function, Variable

class MSE_SCORE(Function):
    @staticmethod
    def forward(ctx, input, label):
        with torch.no_grad():
            numpy_input = input.detach().numpy()
            numpy_label = label.detach().numpy()
            # height and width are the input image dimensions (200 x 200 here)
            offset_height = int(height / 4)
            offset_width = int(width / 4)
            target_height = int(height / 2)
            target_width = int(width / 2)
            crop_input = numpy_input[:, :, offset_height:offset_height + target_height, offset_width:offset_width + target_width]
            crop_label = numpy_label[:, :, offset_height:offset_height + target_height, offset_width:offset_width + target_width]
            diff = (crop_input - crop_label)**2
            mean = np.mean(diff, axis=(1,2,3))
            mse = np.sum(mean)

        return torch.tensor(mse)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None

def criterion_mse(input, label):
    return MSE_SCORE.apply(input, label)

for epoch in range(num_epochs):
    for step, (img, label) in enumerate(dataloader):
        img = Variable(img).to(device=device)
        label = Variable(label).to(device=device)
        output = model(img)
        mse_loss = criterion_mse(output.cpu(), label.cpu()).to(device=device)
        optimizer.zero_grad()
        loss_seq = [mse_loss]
        grad_seq = [torch.tensor(1.0).to(device=device) for _ in range(len(loss_seq))]
        torch.autograd.backward(loss_seq, grad_seq)
```

Hi,

The problem here is that your implementation of the backward is not correct.

  • You don’t need to use torch.no_grad() in a custom Function.
  • Why not run the exact same code (without the conversion to numpy) in your criterion_mse() function so that it gets autodiffed and you get the backward for free? See the sketch after this list.
  • If you don’t do that, you need to implement the backward yourself. In particular, the error you see here is because the gradient you return for input doesn’t have the same shape as `input`.
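A minimal sketch of the second bullet, assuming `height` and `width` are the 200x200 image dimensions from the post: the same crop-and-MSE computation written with plain PyTorch ops, so autograd provides the backward automatically.

```python
def criterion_mse(input, label):
    # central crop, as in the numpy version
    offset_height, offset_width = height // 4, width // 4
    target_height, target_width = height // 2, width // 2
    crop_input = input[:, :, offset_height:offset_height + target_height,
                       offset_width:offset_width + target_width]
    crop_label = label[:, :, offset_height:offset_height + target_height,
                       offset_width:offset_width + target_width]
    # per-sample mean over (C, H, W), then sum over the batch
    return ((crop_input - crop_label) ** 2).mean(dim=(1, 2, 3)).sum()
```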

PS: I updated your post to make the code more readable using triple backticks.

Hi albanD,
Thanks a lot for the reply.

Because I want to do more complex work in this custom loss function, I have to use the conversion to numpy.
I’m not sure how grad_output works; do you have an example?
I modified the code as follows. Is it correct?

```python
class MSE_SCORE(Function):
    @staticmethod
    def forward(ctx, input, label):
        numpy_input = input.detach().numpy()
        numpy_label = label.detach().numpy()
        offset_height = int(height / 4)
        offset_width = int(width / 4)
        target_height = int(height / 2)
        target_width = int(width / 2)
        crop_input = numpy_input[:, :, offset_height:offset_height + target_height, offset_width:offset_width + target_width]
        crop_label = numpy_label[:, :, offset_height:offset_height + target_height, offset_width:offset_width + target_width]
        diff = (crop_input - crop_label)**2
        mean = np.mean(diff, axis=(1,2,3))
        ctx.size = len(mean)
        mse = np.sum(mean)
        ctx.save_for_backward(input, label)  # --- modified
        return torch.tensor(mse)

    @staticmethod
    def backward(ctx, grad_output):
        grad_output = grad_output.detach()
        input, label = ctx.saved_tensors  # --- modified
        return grad_output * input, None  # --- modified
```

The backward() function should compute one step of the chain rule for your function y = f(x).
grad_output is the gradient flowing from the lower layers, dl/dy, and you should return dl/dx.
The way to compute it is: dl/dx = dl/dy * dy/dx.
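As a toy illustration (not the loss from this thread), here is a custom Function for y = x**2: the backward multiplies the incoming grad_output (dl/dy) by the local derivative dy/dx = 2x and returns dl/dx.

```python
import torch
from torch.autograd import Function

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * 2 * x  # dl/dx = dl/dy * dy/dx

x = torch.randn(3, requires_grad=True)
Square.apply(x).sum().backward()
print(torch.allclose(x.grad, 2 * x))  # True
```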

Hi @albanD

I don’t know what “grad_output” means. According to “dl/dx = dl/dy * dy/dx”, does it mean dy/dx?

With my poor understanding:
l = sum(mean( (label - input)**2 ))
dl / dx = -2 * sum(mean( (label - input) )) * dy / dx
Is it correct?

```python
class MSE_SCORE(Function):
    @staticmethod
    def forward(ctx, input, label):
        numpy_input = input.detach().numpy()
        numpy_label = label.detach().numpy()
        diff = (numpy_label - numpy_input)**2
        mean = np.mean(diff, axis=(1,2,3))
        mse = np.sum(mean)
        ctx.save_for_backward(input, label)

        return torch.tensor(mse)

    @staticmethod
    def backward(ctx, grad_output):
        grad_output = grad_output.detach()
        input, label = ctx.saved_tensors
        numpy_input = input.detach().numpy()
        numpy_label = label.detach().numpy()
        grad = -2 * np.sum(np.mean((numpy_label - numpy_input), axis=(1,2,3))) * np.ones(input.shape)
        return grad_output * grad, None
```

Hi,

grad_output is the gradient w.r.t. the output of the forward, so in this case dl/dy.

And yes the formula looks good.

In the implementation,

  • For the forward, even though you can go through numpy, I don’t think you should here: all these ops just work in PyTorch.
  • For the backward, you should write it in a differentiable manner in case you need to access grad of grad in there. If you don’t, use `from torch.autograd.function import once_differentiable` as another decorator for the backward.
  • Same point for the backward: I don’t see any reason to use numpy, even if you make it once_differentiable. A sketch combining these points follows below.
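As a rough sketch (mine, not from the thread) of how those points could combine for the crop-free version above: forward and backward written with plain PyTorch ops, and the backward marked once_differentiable since it is not written to support double backward. The element-wise gradient assumed here is 2 * (input - label) / (C*H*W) per sample, i.e. the derivative of the summed per-sample MSE.

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable

class MSE_SCORE(Function):
    @staticmethod
    def forward(ctx, input, label):
        ctx.save_for_backward(input, label)
        # per-sample mean over (C, H, W), summed over the batch
        return ((label - input) ** 2).mean(dim=(1, 2, 3)).sum()

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        input, label = ctx.saved_tensors
        n_elem = input[0].numel()  # C * H * W elements per sample
        grad_input = grad_output * 2.0 * (input - label) / n_elem
        return grad_input, None
```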