Creating a Clipped Loss Function


According to the DeepMind DQN paper, the error term is clipped to between -1 and 1. I am using clamp for that, but then I can't use a built-in loss function like MSELoss.

How can I proceed to do this?

Code may be found in this repo.

I don’t understand. Why can’t you use error fns with clamp?

I don't know how to do it correctly. The built-in losses take two arguments (prediction, target) and by default return an already reduced value, so I can't clamp the error of each pair. On the other hand, if I use clamp(target - prediction, min, max), I end up with a single tensor of clipped errors, and then I can't pass it to a built-in loss.

Does that make sense?

Oh I see. You can pass reduce=False to the loss function (reduction='none' in newer PyTorch versions) so it returns a per-element tensor instead of a single value. Then you can do the clamping and the reduction yourself :)
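A minimal sketch of that suggestion, assuming a PyTorch version that accepts reduction='none' (the replacement for the deprecated reduce=False). The tensors are made-up stand-ins for Q-value predictions and targets:

```python
import torch
import torch.nn as nn

# Made-up predictions and targets (stand-ins for Q-values and TD targets)
pred = torch.tensor([0.5, 2.0, -3.0, 0.1])
target = torch.zeros(4)

# reduction='none' returns one squared error per element instead of a scalar
per_elem = nn.MSELoss(reduction='none')(pred, target)

# Any element-wise post-processing can happen here, followed by a manual reduction
clipped = per_elem.clamp(max=1.0)
loss = clipped.mean()

# Sanity check: reducing the unclipped tensor yourself matches the default 'mean'
assert torch.allclose(per_elem.mean(), nn.MSELoss()(pred, target))
```

This only shows the mechanics of getting an unreduced loss; whether clamping the loss value is the right way to implement DQN's error clipping is a separate question.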

Hi @SimonW, thanks for your help! I've just updated the loss function:

loss_func = torch.nn.MSELoss(reduction='none')  # per-element loss; older versions used size_average=False, reduce=False

And also coded the backward pass accordingly:

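# Compute the per-element squared error, clip it, reduce it, and run the backward pass
error = loss_func(q_phi, y)  # already (q_phi - y)**2, one value per element
error = torch.clamp(error, max=1)  # squared error is non-negative, so min=-1 had no effect; no second **2
error = error.sum()
error.backward()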

No errors appear, which suggests the backward pass is running correctly!

I will test it out in the Atari environment and let you know how it goes. The code is here in case anyone wants to check it out in the meantime.

Thanks again!

Consider the snippet below:

import torch

# w = 1, so l = w**2 = 1, which is above the clamp threshold of 0.8
w = torch.tensor([1.0], requires_grad=True)
l = w**2
l_clip = l.clamp(max=0.8)
torch.autograd.grad(l, w, retain_graph=True)       # returns (tensor([2.]),)
torch.autograd.grad(l_clip, w, retain_graph=True)  # returns (tensor([0.]),)

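It seems that the clamp operation stops gradient flow for values that get clipped: their gradient is exactly zero. So I think your code above might not work as you intended, since samples with a large error would contribute nothing to the update.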
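The same effect shows up on a batch. Here is a minimal sketch of a clamped per-element squared error, the pattern discussed in this thread, with made-up predictions (two small errors, two large ones):

```python
import torch
import torch.nn.functional as F

# Made-up batch: two small TD errors and two large ones
q_pred = torch.tensor([0.5, -0.2, 3.0, -4.0], requires_grad=True)
target = torch.zeros(4)

# Per-element squared error, clamped to at most 1, then summed
sq_err = F.mse_loss(q_pred, target, reduction='none')
loss = sq_err.clamp(max=1.0).sum()
loss.backward()

# Gradient is 2 * q_pred for the first two elements and exactly 0 for the
# last two, whose squared error (9 and 16) was clipped
print(q_pred.grad)
```

The samples with the largest errors, which are the ones the clipping was supposed to tame rather than ignore, end up with no gradient at all.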

@Hailin_Chen Could you propose a solution for this issue?

Hey @diegoalejogm, I was implementing the DQN algorithm myself and faced the same issue.

If I understand correctly, the paper says to clip the error term in the update, which probably refers to the gradient. So this becomes a case of gradient clipping rather than clipping the loss directly.
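One common way to read "clip the error term in the update" is in terms of the loss derivative: a Huber (smooth L1) loss with threshold 1 has a gradient equal to the TD error clamped to [-1, 1]. A minimal sketch checking this numerically, with made-up tensors and F.smooth_l1_loss at its default beta=1.0 (this is a loss-based alternative, not necessarily what any particular reference implementation does):

```python
import torch
import torch.nn.functional as F

# Made-up batch: two small TD errors and two large ones
q_pred = torch.tensor([0.5, -0.2, 3.0, -4.0], requires_grad=True)
target = torch.zeros(4)

# Smooth L1 / Huber loss with threshold 1 (beta=1.0 is the default):
# 0.5 * d**2 if |d| < 1, else |d| - 0.5, where d = q_pred - target
loss = F.smooth_l1_loss(q_pred, target, reduction='sum')
loss.backward()

# Its gradient w.r.t. each prediction is the TD error clipped to [-1, 1]
clipped_error = (q_pred - target).detach().clamp(-1.0, 1.0)
print(torch.allclose(q_pred.grad, clipped_error))  # True
```

Unlike clamping the loss value, the gradient here saturates at +1 or -1 instead of dropping to zero, so large errors still move the Q-value in the right direction, just with bounded magnitude.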

The official DQN tutorial on the PyTorch website clips gradients as well.
You can find the code here - Reinforcement Learning (DQN) Tutorial — PyTorch Tutorials 1.9.0+cu102 documentation

#Optimize the model

for param in policy_net.parameters():