Maximize KL-divergence

Hello everyone,

I want to maximize the KL divergence between two classes. To do this, I take the feature vector of each class and optimize the following loss:

loss = -F.kl_div(A.log(), B, reduction='sum')
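
To make the setup concrete, here is a minimal sketch of how such a loss behaves. It assumes A and B are softmax-normalized feature vectors; the helper name `neg_kl_loss`, the toy logits, and the softmax step are illustrative assumptions, not my exact code. It shows the loss growing more negative without bound as the two distributions are pushed apart.

```python
import torch
import torch.nn.functional as F


def neg_kl_loss(logits_a, logits_b):
    """Negative KL divergence between two softmax-normalized vectors.

    F.kl_div(input, target) expects `input` as log-probabilities and
    `target` as probabilities, and computes
    sum(target * (log(target) - input)), i.e. KL(B || A).
    """
    log_a = F.log_softmax(logits_a, dim=-1)  # plays the role of A.log()
    b = F.softmax(logits_b, dim=-1)          # plays the role of B
    return -F.kl_div(log_a, b, reduction='sum')


# Toy 2-dimensional "feature vectors" pushed apart by a growing scale
# (hypothetical values chosen only for illustration).
scales = (1.0, 10.0, 100.0)
losses = []
for scale in scales:
    logits_a = torch.tensor([[scale, -scale]])
    logits_b = torch.tensor([[-scale, scale]])
    losses.append(neg_kl_loss(logits_a, logits_b).item())
    print(f"scale={scale:6.1f}  loss={losses[-1]:.3f}")
```

Since KL divergence has no upper bound, an optimizer minimizing its negative can keep lowering the loss simply by separating the two distributions further.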

There are some additional losses as well, such as a reconstruction loss. All of my other losses are positive and small relative to this one. Very quickly the total loss goes to negative infinity and the network doesn't actually learn anything. Any leads on how to fix this kind of problem?
Or is there another way to maximize the difference between two distributions?
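
One bounded candidate I am aware of is the Jensen-Shannon divergence, which never exceeds log 2 (with natural logarithms), so its negative cannot run off to negative infinity. A rough sketch, assuming A and B are probability vectors that sum to 1 along the last dimension; `js_divergence` is a hypothetical helper written for illustration, not a PyTorch built-in:

```python
import math
import torch


def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence along the last dimension.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = 0.5 * (p + q).
    Inputs are clamped to `eps` to avoid log(0); this slightly perturbs
    the distributions but keeps the result finite.
    """
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    m = 0.5 * (p + q)
    kl_pm = (p * (p.log() - m.log())).sum(dim=-1)
    kl_qm = (q * (q.log() - m.log())).sum(dim=-1)
    return 0.5 * (kl_pm + kl_qm)


# Two fully separated toy distributions (hypothetical values).
A = torch.tensor([[1.0, 0.0]])
B = torch.tensor([[0.0, 1.0]])

# Negate to maximize the divergence by gradient descent.
loss = -js_divergence(A, B).sum()
print(loss.item(), -math.log(2))
```

Because this term is bounded, it should stay on a scale comparable to the other loss terms, though I have not verified that it trains well in my setting.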