Hello! I think it's clearest to start with the problematic behavior I've encountered:

I've been training policy-gradient (reinforcement learning) methods with *TanhDistributions*. A tanh distribution's built-in `log_prob()` method is very sensitive to values close to 1 and -1: it returns NaN for exactly 1.0 and -1.0, which often crashes the entire program. With float32 this means I sometimes need to `torch.clamp` the input away from the boundary by some epsilon. While debugging that, I noticed something weird.
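For context, here is a minimal pure-Python sketch of the kind of clamping I mean. The helper name `safe_atanh` and the epsilon value are my own illustration, not torch API; the real code clamps the action before calling `log_prob()`:

```python
import math

def safe_atanh(y, eps=1e-6):
    # atanh diverges at +/-1.0 (math.atanh even raises a domain error there),
    # which is what produces the NaN/inf log-probs in the tanh log_prob.
    # Clamping the input into [-1 + eps, 1 - eps] keeps the result finite.
    y = max(min(y, 1.0 - eps), -1.0 + eps)
    return math.atanh(y)

# Without clamping, math.atanh(1.0) would raise; with it, we get a
# large but finite value.
print(safe_atanh(1.0))
```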

According to the type-info documentation, *eps* is supposed to be the smallest number such that `1.0 + eps != 1.0`. That makes me believe that `1.0 + x * eps != 1.0` should evaluate to `False` for any `x < 1`. Yet this only starts happening at `x = 0.5`, and somehow only on the addition side: `1.0 + 0.5 * eps != 1.0` evaluates to `False`, while `1.0 - 0.5 * eps != 1.0` evaluates to `True`.
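Here is a minimal repro of what I'm seeing, using Python's built-in float (that's float64, but `torch.finfo(torch.float32).eps` shows me the same asymmetry):

```python
import sys

eps = sys.float_info.epsilon  # smallest x with 1.0 + x != 1.0 (float64)

# Comfortably below half an eps: the sum rounds back down to 1.0.
print(1.0 + 0.4 * eps != 1.0)  # False

# Exactly half an eps added: still compares equal to 1.0.
print(1.0 + 0.5 * eps != 1.0)  # False

# Just above half an eps: suddenly the comparison flips.
print(1.0 + 0.6 * eps != 1.0)  # True

# ...but subtracting the same half-eps DOES change the value.
print(1.0 - 0.5 * eps != 1.0)  # True
```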

Am I just misunderstanding what `eps` is supposed to be here?