Hello, I think it's best to start by showing the problematic behavior I've encountered:
I've been training policy-gradient (reinforcement learning) methods with Tanh-transformed distributions. The built-in log_prob() method of these distributions is very sensitive to values close to 1 and -1 (it returns NaN for exactly 1.0 and -1.0, which often crashes the entire training run), so in float32 I sometimes need to torch.clamp the input by an epsilon. While debugging that, I noticed something strange.
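To make the workaround concrete, here is a minimal sketch of the kind of clamp I mean. The distribution setup (a Tanh-squashed Gaussian) and the choice of clamping margin are illustrative, not taken from my actual training code:

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

# Tanh-squashed Gaussian, as commonly used for bounded continuous actions.
base = Normal(torch.zeros(3), torch.ones(3))
dist = TransformedDistribution(base, [TanhTransform()])

eps = torch.finfo(torch.float32).eps
action = torch.tensor([1.0, -1.0, 0.5])  # the exact endpoints are the problem

# Clamp strictly inside (-1, 1) before evaluating log_prob.
safe_action = torch.clamp(action, -1.0 + eps, 1.0 - eps)
log_p = dist.log_prob(safe_action)
assert torch.isfinite(log_p).all()  # finite once the endpoints are excluded
```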
According to the type info documentation, eps is supposed to be the smallest number such that 1.0 + eps != 1.0. That made me believe that 1.0 + x * eps != 1.0 should evaluate to False for any x < 1. Yet the comparison first becomes equal at x = 0.5, and only on one side: 1.0 + 0.5 * eps != 1.0 evaluates to False, while 1.0 - 0.5 * eps != 1.0 somehow still evaluates to True.
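For reference, this is the exact check I ran (reproduced here in float32 with torch.finfo, which is an assumption about how eps was obtained):

```python
import torch

eps = torch.finfo(torch.float32).eps  # 2**-23 for float32
one = torch.tensor(1.0, dtype=torch.float32)

print(bool(one + eps != one))        # True, as documented
print(bool(one + 0.5 * eps != one))  # False: 1.0 + 0.5*eps rounds back to 1.0
print(bool(one - 0.5 * eps != one))  # True: 1.0 - 0.5*eps is a distinct float
```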
Am I just misunderstanding what eps is supposed to be here?