I am training a dynamics model in model-based RL. It turns out that when I torch.clamp
the output of the dynamics model to keep it in the range of valid state values, the gradients
very easily become NaN; the problem disappears when I don't use clamping. So the question is:
how does torch.clamp actually work in backpropagation?
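One thing worth checking (an assumption about the cause, not a diagnosis): clamp bounds the output values, but it does not sanitize NaNs that are already present in the forward pass, so a NaN produced upstream in the dynamics model will survive the clamp:

```python
import math
import torch

# clamp does not remove NaNs already in the tensor: a NaN model output
# stays NaN after clamping, while finite values are bounded as expected.
x = torch.tensor([float('nan'), 5.0], requires_grad=True)
y = torch.clamp(x, min=-1.0, max=1.0)
print(y)  # first entry is still nan, second is clamped to 1.0
assert math.isnan(y[0].item()) and y[1].item() == 1.0
```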
This is how clamp’s backward is implemented. It doesn’t look like it can easily produce NaNs, so I’m not really sure how you’re getting those.
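To make that concrete, here is a minimal sketch of clamp's gradient behavior: the incoming gradient passes through unchanged wherever the input lies inside [min, max], and is zeroed wherever the value was actually clamped.

```python
import torch

# Gradient of clamp: 1 where min <= x <= max (value passed through),
# 0 where x was clamped to a bound.
x = torch.tensor([-2.0, 0.5, 3.0], requires_grad=True)
y = torch.clamp(x, min=-1.0, max=1.0)  # -> [-1.0, 0.5, 1.0]
y.sum().backward()
print(x.grad)  # tensor([0., 1., 0.]) -- only the unclamped entry gets gradient
```

Note that this means clamping can silently stop gradient flow for out-of-range outputs, but by itself it produces zeros, not NaNs.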