What happens to `torch.clamp` in backpropagation

I am training a dynamics model in model-based RL. When I apply `torch.clamp` to the model's output to keep it within valid state values, the gradients very easily become NaN; the NaNs disappear when I remove the clamping. So my question is: how does `torch.clamp` actually behave in backpropagation?


This is how clamp’s backward is implemented. It doesn’t look like it can easily produce NaNs, so I’m not really sure how you’re getting those.
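
For illustration, here is a minimal sketch of what the backward pass does in practice. The input values and bounds are made up for the example and are not taken from the original model. The gradient passes through unchanged where the input lies strictly inside `[min, max]` and is zero where the input was clipped:

```python
import torch

# Hypothetical inputs: one below the range, one inside it, one above it.
x = torch.tensor([-2.0, 0.5, 3.0], requires_grad=True)

# Clamp to an example range of [-1, 1].
y = torch.clamp(x, min=-1.0, max=1.0)

# Summing gives every output element an upstream gradient of 1.
y.sum().backward()

grad = x.grad.tolist()
print(y.tolist())  # [-1.0, 0.5, 1.0]
print(grad)        # [0.0, 1.0, 0.0]
```

Since clamp only masks the incoming gradient, a NaN seen with clamping enabled most likely comes from somewhere else in the graph. If it helps, wrapping the training step in `torch.autograd.set_detect_anomaly(True)` should report which backward function first produced the NaN.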