Hello! I think it's clearest to start with the problematic behavior I've encountered:

I've been training policy-gradient (reinforcement learning) methods with *TanhDistributions*. A tanh distribution's built-in `log_prob()` method is very sensitive to values close to 1 and -1: it returns NaN for exactly 1.0 and -1.0, which often crashes the entire program. With float32 this means I sometimes need to `torch.clamp` the input away from the boundary by some epsilon. While debugging that, I noticed something weird.
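For context, here is a minimal pure-Python sketch of the kind of clamping I mean. The helper name `safe_atanh` and the epsilon value are my own illustration, not torch API; the real code clamps the action before calling `log_prob()`:

```python
import math

def safe_atanh(y, eps=1e-6):
    # atanh diverges at +/-1.0 (math.atanh even raises a domain error there),
    # which is what produces the NaN/inf log-probs in the tanh log_prob.
    # Clamping the input into [-1 + eps, 1 - eps] keeps the result finite.
    y = max(min(y, 1.0 - eps), -1.0 + eps)
    return math.atanh(y)

# Without clamping, math.atanh(1.0) would raise; with it, we get a
# large but finite value.
print(safe_atanh(1.0))
```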

According to the type-info documentation, *eps* is supposed to be the smallest number such that `1.0 + eps != 1.0`. That makes me believe that `1.0 + x * eps != 1.0` should evaluate to `False` for any `x < 1`. Yet this only starts happening at `x = 0.5`, and somehow only on the addition side: `1.0 + 0.5 * eps != 1.0` evaluates to `False`, while `1.0 - 0.5 * eps != 1.0` evaluates to `True`.
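Here is a minimal repro of what I'm seeing, using Python's built-in float (that's float64, but `torch.finfo(torch.float32).eps` shows me the same asymmetry):

```python
import sys

eps = sys.float_info.epsilon  # smallest x with 1.0 + x != 1.0 (float64)

# Comfortably below half an eps: the sum rounds back down to 1.0.
print(1.0 + 0.4 * eps != 1.0)  # False

# Exactly half an eps added: still compares equal to 1.0.
print(1.0 + 0.5 * eps != 1.0)  # False

# Just above half an eps: suddenly the comparison flips.
print(1.0 + 0.6 * eps != 1.0)  # True

# ...but subtracting the same half-eps DOES change the value.
print(1.0 - 0.5 * eps != 1.0)  # True
```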

Am I just misunderstanding what `eps` is supposed to be here?