Hi all, I’m computing a bistochastic matrix, where each entry represents some action. When decoding my solution for a trajectory in the REINFORCE algorithm, I need log probabilities to compute the loss. Since each entry in my bistochastic matrix is a probability, is there a safe way to take logs of it? I’m not really sure how to do it, as taking `torch.log` directly causes instabilities.

Hi,

I think adding a small positive epsilon (e.g. 1e-10) to your matrix before applying the log will solve your problem. Also check that your matrix doesn’t have negative entries (normally the columns of your matrix will be the result of some softmax operation, so this shouldn’t be a problem, I guess).
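A minimal sketch of the epsilon approach, assuming the probabilities are stored in a float32 tensor. The helper name `safe_log` is hypothetical, and the identity matrix simply stands in for a bistochastic matrix that contains exact zeros.

```python
import torch

EPS = 1e-10  # value suggested above; float32 can represent it (smallest normal is ~1.2e-38)

def safe_log(probs: torch.Tensor, eps: float = EPS) -> torch.Tensor:
    """Hypothetical helper: log of a probability tensor, shifted by eps so exact zeros stay finite."""
    # Negative entries point to a bug upstream, not to a rounding problem.
    if (probs < 0).any():
        raise ValueError("probabilities must be non-negative")
    return torch.log(probs + eps)

# A permutation matrix is bistochastic and contains exact zeros.
P = torch.eye(3)
naive = torch.log(P)   # off-diagonal zeros become -inf
safe = safe_log(P)     # off-diagonal zeros become log(1e-10), roughly -23.03
```

Note that epsilon only bounds the value of the log. The derivative of log(p + eps) with respect to p is 1/(p + eps), which can still be as large as 1/eps = 1e10 at p = 0, so the backward pass can remain badly scaled.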

Hi Legoh and Othmane!

If your probabilities are, in fact, the result of `softmax()`, you can pretty much avoid any instabilities by using PyTorch’s `torch.log_softmax()` to combine the two steps together.

Best.

K. Frank
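The first part of the sketch below compares `torch.log(torch.softmax(...))` with `torch.log_softmax(...)` on logits large enough that one softmax probability underflows to zero in float32. The second part is an assumption: the question does not say how the bistochastic matrix is built, but if it comes from Sinkhorn-style alternating row and column normalization, the same idea applies by doing every normalization step with `log_softmax`. `log_sinkhorn` is a hypothetical helper, not a PyTorch function.

```python
import torch

# Part 1: log(softmax(x)) vs log_softmax(x) on logits whose softmax underflows.
logits = torch.tensor([0.0, 200.0])
two_step = torch.log(torch.softmax(logits, dim=0))  # exp(-200) underflows to 0, so the first entry is -inf
one_step = torch.log_softmax(logits, dim=0)         # stays finite: [-200, 0]

# Part 2 (assumption): a bistochastic matrix built by Sinkhorn-style
# alternating row/column normalization, done entirely in log space.
def log_sinkhorn(scores: torch.Tensor, n_iters: int = 50) -> torch.Tensor:
    """Hypothetical helper returning the log of an approximately bistochastic matrix.

    Each row step is a log_softmax over dim=1 and each column step a
    log_softmax over dim=0, so torch.log is never applied to a probability.
    """
    log_p = scores
    for _ in range(n_iters):
        log_p = torch.log_softmax(log_p, dim=1)  # rows sum to 1
        log_p = torch.log_softmax(log_p, dim=0)  # columns sum to 1
    return log_p

torch.manual_seed(0)
log_p = log_sinkhorn(torch.randn(4, 4))
extreme = log_sinkhorn(200.0 * torch.eye(3))  # finite, whereas torch.log(extreme.exp()) contains -inf
```

Because the loop ends on a column step, the columns of `log_p.exp()` sum to 1 up to rounding, while the rows are only approximately normalized; more iterations tighten the row sums.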

Just a minor, slightly off-topic warning:

One thing I stumbled upon using REINFORCE is that, if you run it often enough, you will at some point sample a highly unlikely action by chance. Since the loss uses the log of that action’s probability, the gradient explodes and might entirely mess up your agent, especially when using a momentum-based optimizer. So you should definitely use gradient clipping to avoid that (sadly, it took me quite a while to figure that out…)
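A minimal sketch of a REINFORCE update with gradient-norm clipping via `torch.nn.utils.clip_grad_norm_`. The toy linear policy, the batch shapes, and the threshold `MAX_GRAD_NORM = 1.0` are assumptions for illustration only; the function returns the gradient norm measured before clipping, which is handy for logging how often clipping kicks in.

```python
import torch
from torch import nn
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical toy policy: 4-dimensional state -> logits over 3 actions.
policy = nn.Linear(4, 3)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
MAX_GRAD_NORM = 1.0  # assumed clipping threshold; tune for your problem

def reinforce_step(states: torch.Tensor, returns: torch.Tensor) -> float:
    """One REINFORCE update on a batch of (state, return) pairs.

    Returns the total gradient norm measured *before* clipping.
    """
    # Categorical(logits=...) normalizes the logits with logsumexp,
    # so log_prob never calls torch.log on a probability.
    dist = Categorical(logits=policy(states))
    actions = dist.sample()
    log_probs = dist.log_prob(actions)
    loss = -(log_probs * returns).mean()

    optimizer.zero_grad()
    loss.backward()
    # Rescale all gradients in place so their combined L2 norm is at most MAX_GRAD_NORM.
    total_norm = nn.utils.clip_grad_norm_(policy.parameters(), MAX_GRAD_NORM)
    optimizer.step()
    return float(total_norm)
```

Working from logits also sidesteps the 1/p factor that differentiating `torch.log(p)` introduces for small p; clipping then guards against the remaining large updates, for example from very large returns.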