What is the purpose of eps in the REINFORCE example?

PyTorch’s GitHub repository provides an example implementation of the REINFORCE algorithm. When standardizing the rewards, it adds a term eps:

returns = (returns - returns.mean()) / (returns.std() + eps)

What is the purpose of this term? It can be found in the example here:

A small eps value is usually added to the denominator of a division to avoid dividing by zero, which would produce invalid outputs (inf or NaN) and invalid gradients. A common choice is e.g. eps = 1e-6.
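To make that concrete, here is a minimal sketch using NumPy rather than the PyTorch example itself. The constant returns and the value eps = 1e-6 are made up for illustration; they stand in for an episode where every return is identical, so the standard deviation is exactly zero.

```python
import numpy as np

# Hypothetical episode in which every return is identical, so std() is 0.
returns = np.array([1.0, 1.0, 1.0], dtype=np.float32)
eps = 1e-6  # illustrative value, not taken from the example script

# Without eps the standardization computes 0 / 0, which is NaN.
# errstate only silences the NumPy warnings; the result is unchanged.
with np.errstate(divide="ignore", invalid="ignore"):
    without_eps = (returns - returns.mean()) / returns.std()

# With eps the denominator is never exactly zero, so the result is finite.
with_eps = (returns - returns.mean()) / (returns.std() + eps)

print(without_eps)
print(with_eps)
```

In the first case every entry is NaN, and a NaN in the loss would propagate into the gradients. In the second case the entries are simply zero.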


Is there any intuition behind the very explicit definition given in the example? Setting it to something very small but nonzero makes sense, but such an explicit call (starting with np.finfo) seems prescriptive.

The call to np.finfo ties the value of eps to the dtype being used.
Single and double precision have different resolution, so for a given denominator the numerical instability will differ as it gets closer to 0.
By using finfo you’re making sure that your eps is as big as it should be given the data type you’re using.
