Entropy Regularization with RL?

How do you do entropy regularization with RL? Judging from e.g. https://github.com/andrewliao11/pytorch-a3c-mujoco/blob/master/train.py, it looks like we should not use .reinforce but instead calculate the loss ourselves and backprop that?
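In other words, something roughly like the sketch below: compute the surrogate loss by hand and fold the entropy bonus into it, so a single backward pass covers both terms. (policy_net, state, reward, entropy_lambda and opt are placeholder names here, not taken from that repo.)

import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-ins, just to make the sketch self-contained
policy_net = nn.Linear(4, 3)
opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
entropy_lambda = 0.01
state = torch.randn(8, 4)
reward = torch.randn(8, 1)                 # returns/advantages for the sampled actions

logits = policy_net(state)
probs = F.softmax(logits, dim=1)
log_probs = F.log_softmax(logits, dim=1)

a = torch.multinomial(probs, 1)            # sample one action per row
log_p = log_probs.gather(1, a)             # log-prob of each sampled action

entropy = -(probs * log_probs).sum(1).mean()

# single surrogate loss: REINFORCE term plus entropy bonus, one backward pass
loss = -(log_p * reward).mean() - entropy_lambda * entropy

opt.zero_grad()
loss.backward()
opt.step()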

Or I guess the entropy term can be backpropped separately, while keeping .reinforce, using retain_graph=True? Something like:

import torch
import torch.nn.functional as F
from torch import autograd

x = F.softmax(x)                        # action probabilities (x: policy logits)
a = torch.multinomial(x, 1)             # sample one action per row (stochastic node)
entropy = - x * x.log()
entropy = entropy.sum(1).mean()         # mean entropy over the batch

# later ...
opt.zero_grad()
# backprop the entropy bonus first; retain_graph=True keeps the graph alive
# for the second backward pass through the same network
(- entropy_lambda * entropy).backward(retain_graph=True)
# attach the reward to the stochastic node, then backprop the REINFORCE gradient
a.reinforce(reward)
autograd.backward([a], [None])
opt.step()

?

(Hmmm, this slows down learning a lot, by ~40% or so. I wonder if there is a better way?)

Ah, it can be backpropagated in one go. Entropy regularization still adds a lot of time, but not as much as backpropping twice:

opt.zero_grad()
entr_loss = - entropy_reg * entropy
a.reinforce(reward)
# backprop the REINFORCE gradient and the entropy loss in a single pass
# (None grads: required for the stochastic node a, implicit ones for the scalar entr_loss)
autograd.backward([a, entr_loss], [None, None])
opt.step()
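(For what it's worth, later PyTorch releases removed .reinforce in favour of torch.distributions, so the whole thing collapses into one ordinary loss anyway. A minimal sketch with placeholder names, not a drop-in replacement for the code above:)

import torch
import torch.nn as nn
from torch.distributions import Categorical

policy_net = nn.Linear(4, 3)               # placeholder policy
opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
entropy_reg = 0.01
state = torch.randn(8, 4)
reward = torch.randn(8)

dist = Categorical(logits=policy_net(state))
a = dist.sample()

# -log pi(a|s) * R plus the entropy bonus, all in one backward
loss = -(dist.log_prob(a) * reward).mean() - entropy_reg * dist.entropy().mean()

opt.zero_grad()
loss.backward()
opt.step()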