I want to compute the KL divergence between two batches of distributions. x is my tensor of predicted distributions and target contains the target distributions. Both x and target have shape (batch_size, max_dist_size). Each row of x and target holds a distribution whose support size n satisfies n <= max_dist_size, and I have a list dist_size with the support size of each row.
I am currently considering doing something like this:
import torch.nn as nn

criterion = nn.KLDivLoss(reduction='sum')  # size_average=False is deprecated
l = 0.
for i in range(x.size(0)):
    # KLDivLoss expects the first argument to be log-probabilities
    l += criterion(x[i, :dist_size[i]].unsqueeze(0),
                   target[i, :dist_size[i]].unsqueeze(0))
Is there a better way to use the dist_size
list to mask and obtain the KL divergence over the entire batch?
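Not an authoritative answer, but one way to drop the Python loop is to compute the pointwise KL terms yourself and zero out the padded positions with a mask built from dist_size. The sketch below assumes x holds log-probabilities (as nn.KLDivLoss expects) and that the padded tail of each target row is zero; masked_kl is a hypothetical helper name:

```python
import torch

def masked_kl(x, target, dist_size):
    """Summed KL(target || softmax-implied dist of x) over a padded batch.

    x:         (batch, max_dist_size) log-probabilities, padding arbitrary
    target:    (batch, max_dist_size) probabilities, zero-padded
    dist_size: per-row support sizes (list or 1-D tensor)
    """
    sizes = torch.as_tensor(dist_size, device=x.device)
    # mask[i, j] is True exactly when j < dist_size[i]
    mask = torch.arange(x.size(1), device=x.device) < sizes.unsqueeze(1)
    # pointwise KL terms t * (log t - x), with the convention 0 * log 0 = 0
    log_t = torch.where(target > 0, target.log(), torch.zeros_like(target))
    kl = target * (log_t - x)
    # zero out positions past each row's support, then reduce over the batch
    return (kl * mask).sum()
```

This gives the same total as the per-row loop with reduction='sum', while letting autograd flow through a single vectorized expression; whether it is faster in practice depends on how much of each row is padding.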