Unable to compute gradients due to a zero probability in the A2C actor network


Hi, I’m trying to build an A2C (Advantage Actor-Critic) network for my project.
The problem occurs in the actor network.

When the actor produces a deterministic probability distribution (like [0.00, 1.00]),
the actor’s loss function takes the log of that output,
which gives log(0) = -inf, and then I can’t update the actor network.
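
To make the failure concrete, here is a minimal sketch in plain Python (standard library only, not the actual network code). The logit values are made up for illustration; they only need to be far enough apart that one probability underflows to exactly 0.0. Note that `math.log(0.0)` raises an error, while a tensor log returns -inf, so the helper `log_prob_naive` (a hypothetical name) mimics the tensor behaviour.

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing,
    # but small terms can still underflow to exactly 0.0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_prob_naive(p):
    # Hypothetical helper: math.log(0.0) raises ValueError,
    # so return -inf to mimic what a tensor log would produce.
    try:
        return math.log(p)
    except ValueError:
        return float("-inf")

# Assumed saturated actor output for two actions (illustrative values).
logits = [-800.0, 0.0]
probs = softmax(logits)

# The log-probability of the first action is no longer finite,
# so any loss term built from it cannot give a usable gradient.
naive_log_probs = [log_prob_naive(p) for p in probs]
```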

I can’t find any solution to this problem.

Do I have to modify that probability arbitrarily?
(I’m using two separate networks, not a single model with separate output heads.)

P.S. There are two possible actions (ACTION_DIM = 2).

I solved it.
The problem was in how I computed the log-probability.
I used softmax as the final activation function of my actor network.
After that, the output (softmax_probability) was passed through a log (torch.log10).
I changed softmax to log_softmax, which returns the log-probabilities directly, and then it worked.
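
For anyone who hits the same issue, here is a rough sketch of the change in PyTorch. It is not my actual training code: the logits, the sampled action, and the advantage value are all made up for illustration, and it uses `torch.log` (natural log) for the comparison. The policy-gradient style loss shown is one common form and is an assumption about how the actor loss is defined.

```python
import torch
import torch.nn.functional as F

# Assumed saturated actor logits for one state, two actions (illustrative values).
logits = torch.tensor([[-200.0, 0.0]], requires_grad=True)
action = torch.tensor([0])       # assumed sampled action index
advantage = torch.tensor([1.5])  # assumed advantage from the critic

# Old approach: softmax first, then log.
# The softmax underflows to exactly 0 for action 0, so its log is -inf.
naive_log_probs = torch.log(F.softmax(logits, dim=-1))

# New approach: log_softmax computes the log-probabilities in one stable step.
log_probs = F.log_softmax(logits, dim=-1)

# Pick the log-probability of the chosen action and build the actor loss.
chosen = log_probs.gather(1, action.unsqueeze(1)).squeeze(1)
actor_loss = -(chosen * advantage).mean()
actor_loss.backward()

print(naive_log_probs)  # contains -inf for action 0
print(log_probs)        # all finite
print(logits.grad)      # finite gradient, so the actor can be updated
```

If sampling is also needed, the probabilities can be recovered with `log_probs.exp()`, so the network only has to output log-probabilities.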