Hi, I’m trying to build an A2C (Advantage Actor-Critic) network for my project.
The problem occurs in the actor network.
When the actor outputs a deterministic probability distribution (like [0.00, 1.00]),
the actor’s loss function evaluates log(0) = -inf,
so the loss is no longer finite and I can’t update the actor network.
I can’t find any solution to this problem.
Do I have to modify those probabilities arbitrarily?
(I’m using two separate networks, not a single model with separate output nodes.)
P.S. There are two possible actions (ACTION_DIM = 2).
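
For reference, here is a minimal sketch of the issue and of two workarounds that are commonly used for it. It assumes the actor produces raw scores (logits) that are passed through a softmax, which may not match the actual model. It is plain Python rather than framework code, and the names `log_softmax`, `clipped_log`, `actor_loss` and the example logits are hypothetical.

```python
import math

def log_softmax(logits):
    """Stable log-probabilities straight from logits: z_i - logsumexp(z)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def clipped_log(p, eps=1e-8):
    """Alternative: clamp the probability into [eps, 1] before taking the log."""
    return math.log(min(max(p, eps), 1.0))

def actor_loss(logits, action, advantage):
    """Policy-gradient loss for one step: -log pi(action) * advantage."""
    return -log_softmax(logits)[action] * advantage

# Hypothetical saturated actor output for ACTION_DIM = 2.
logits = [-800.0, 800.0]

# Naive route: softmax first, log afterwards. exp(-1600) underflows to 0.0,
# so the first probability is exactly 0 and its log is not finite
# (math.log(0.0) raises ValueError; tensor libraries return -inf instead).
m = max(logits)
exps = [math.exp(z - m) for z in logits]
probs = [e / sum(exps) for e in exps]

print(probs)                        # first entry is exactly 0.0
print(log_softmax(logits))          # finite log-probabilities
print(clipped_log(probs[0]))        # finite, but the value depends on eps
print(actor_loss(logits, 0, 1.0))   # finite loss even for the "impossible" action
```

Computing the log-probabilities from the logits avoids ever forming an exact 0, whereas clamping keeps the loss finite at the cost of an arbitrary `eps`. Neither stops the policy from saturating in the first place; an entropy bonus in the actor loss is one common way to discourage that.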