KL Divergence calculation

I want to calculate the KL divergence between two probability distributions, but one is a tensor of size (64, 936, 32, 32) and the other is (64, 939, 32, 32). As you can see, the difference is small. How can I make them the same size without ruining the data and the KL divergence value?
I tried padding with zeros, but it doesn’t work.

Could you explain what each of the dimensions represents?

Also, if I’m not mistaken, the KL divergence term contains the log of the probability distribution, so if you pad with zeros you’re effectively adding negative infinity to your data, which is going to cause a problem when backpropagating your loss.
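For example (a toy 1-D sketch, not your actual tensors, just to show what the padding does):

import torch
import torch.nn.functional as F

p = torch.tensor([0.5, 0.5, 0.0], requires_grad=True)  # "distribution" padded with a zero
q = torch.tensor([0.4, 0.4, 0.2])

log_p = torch.log(p)                        # log(0) -> -inf
loss = F.kl_div(log_p, q, reduction="sum")  # kl_div expects log-probabilities as input
loss.backward()
print(loss)    # tensor(inf, ...)
print(p.grad)  # the gradient contains -inf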

Well, originally (1, 64, 32, 32) was the activation of one layer of my ResNet model: 1 = batch size, 64 = neurons in the layer. I saved it for each of the 936 pics that I gave to my model, then reshaped and permuted it for another purpose, so now I have a tensor of size (64, 939, 32, 32). It’s the same with the other tensor.
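To be concrete, it was roughly something like this (a simplified sketch, not my exact code, with torch.randn standing in for the saved activations):

import torch

acts = [torch.randn(1, 64, 32, 32) for _ in range(936)]  # one saved activation per picture
stacked = torch.cat(acts, dim=0)                          # (936, 64, 32, 32)
stacked = stacked.permute(1, 0, 2, 3)                     # (64, 936, 32, 32)
print(stacked.shape)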
How can I reshape one of them and which one should I reshape?
What if I were to remove the excess data and make them both of size (64, 936, 32, 32)? How should I do it? I think in my case it wouldn’t hurt the purpose.

So, from my understanding, you have 2 different Tensors with different batch sizes? Is that correct?

Yes, you could say that.

Hmmm, it’s odd that your batch dimension gets changed by that function. Are you sure it’s not meant to change a different dimension of your Tensor?

Would you mind explaining what that function does?

Ideally, your loss function should take 2 Tensors with the same batch dimension, as you’re effectively measuring the difference between 2 probability distributions with your data.

It’s a bit complicated. I’m experimenting with something!
What can be done about the tensor reshaping? How do I delete data? I can permute it so it’ll be (64, 32, 32, 939) and then delete data to make it 936, but how do I delete an index of a tensor in PyTorch?

With regards to deleting particular indices of a tensor, you could just redefine the tensor as a slice of the old one over the indices you want to keep, i.e.,

import torch

data = torch.randn(64, 939, 32, 32)
data.shape #returns torch.Size([64, 939, 32, 32])
data = data[:, 0:936, :, :]  # keep the first 936 entries along dim 1
data.shape #returns torch.Size([64, 936, 32, 32])
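If the entries you want to drop aren’t at the end, you could also pick out the indices you want to keep with torch.index_select (the keep indices below are just an example):

data = torch.randn(64, 939, 32, 32)
keep = torch.arange(936)                            # any 936 indices along dim 1, not necessarily the first ones
data = torch.index_select(data, dim=1, index=keep)
data.shape #returns torch.Size([64, 936, 32, 32])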

One thing that came to mind: could you not apply this custom function (which changes the batch size) after you calculate the KL divergence? I assume you have 2 Tensors which represent predictions and targets? Could you calculate the KL divergence between the two, then apply your custom function afterwards, and include the extra 3 batch entries in some other way?
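If it helps, a rough sketch of that order of operations might look like the below (torch.randn stands in for your real tensors, and the log_softmax/softmax over dim 0 is just a placeholder for however you actually turn the activations into probabilities):

import torch
import torch.nn.functional as F

pred = torch.randn(64, 936, 32, 32)    # both tensors before the batch-changing function
target = torch.randn(64, 936, 32, 32)

log_p = F.log_softmax(pred, dim=0)     # kl_div wants log-probabilities for the input
q = F.softmax(target, dim=0)           # and plain probabilities for the target

kl = F.kl_div(log_p, q, reduction="batchmean")
# ...then apply the custom function that changes the batch dimension afterwards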