KL loss, feeding with tensor or sampled distribution?

Hi, I know that this question is silly, but I have a big hole in my understanding.

KL loss helps me make an input distribution similar to a target distribution. I have recently been seeing it more and more in papers, where there is some feature-map tensor that should look like another (target) tensor (transfer learning, knowledge distillation, and so on). I wonder: can KL loss take a tensor as input directly, or do I need to do some “categorical sampling” with the Gumbel-softmax trick?

Maybe the question makes no sense, but I have to know this important detail.


What do you mean by this?

Does the KL loss in PyTorch take two distributions as input, or two tensors, with the loss itself doing some sampling?
If I am far off in my understanding, please guide me.

Two tensor inputs: https://pytorch.org/docs/stable/nn.html#kl-div
Two distribution inputs: https://pytorch.org/docs/stable/distributions.html#module-torch.distributions.kl
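
To make the difference between the two links concrete, here is a minimal sketch. The logits, shapes, and variable names are made up for illustration. It assumes the tensors are turned into categorical distributions with a softmax over the last dimension. Neither call does any sampling: the tensor version expects log-probabilities as input and probabilities as target, and the distribution version uses the closed-form KL between two Categorical objects.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical, kl_divergence

torch.manual_seed(0)

# Hypothetical logits: a batch of 4 samples with 10 classes each.
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)

# Tensor API: the input must hold log-probabilities, the target probabilities.
log_q_student = F.log_softmax(student_logits, dim=-1)
p_teacher = F.softmax(teacher_logits, dim=-1)
kl_from_tensors = F.kl_div(log_q_student, p_teacher, reduction="batchmean")

# Distribution API: KL(teacher || student), computed in closed form per sample.
teacher = Categorical(logits=teacher_logits)
student = Categorical(logits=student_logits)
kl_from_dists = kl_divergence(teacher, student).mean()

print(kl_from_tensors.item(), kl_from_dists.item())
```

Under these assumptions both routes give the same number, because F.kl_div(input, target) measures KL(target || input) and "batchmean" divides the summed loss by the batch size.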

Great! Can I ask one last question?
Let’s say I want to feed the KL loss with an image and a target image. Does it compute kl_div on each pixel (and then reduce with sum, mean, etc.; that part I understood)?
Just to be sure that I understood correctly…
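
A rough sketch of the image case, under some assumptions: the shapes and names below are hypothetical, and the choice to normalize over the channel dimension is just one option (a softmax over spatial positions is another). Raw pixel values are not probabilities, so some normalization is needed before the call. With reduction="none", F.kl_div returns one term per element, with the same shape as the input, and the reduction is applied afterwards.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical feature maps: batch of 2, 3 channels, 4x4 spatial grid.
pred = torch.randn(2, 3, 4, 4)
target = torch.randn(2, 3, 4, 4)

# Assumption: each pixel holds a distribution over the channel dimension.
log_q = F.log_softmax(pred, dim=1)
p = F.softmax(target, dim=1)

# Elementwise terms p * (log p - log q), same shape as the input.
pointwise = F.kl_div(log_q, p, reduction="none")

# Summing over channels gives one KL value per pixel.
per_pixel = pointwise.sum(dim=1)

# Manual equivalent of reduction="batchmean": total sum divided by batch size.
loss = per_pixel.sum() / pred.size(0)

print(pointwise.shape, per_pixel.shape, loss.item())
```

So the elementwise terms are computed for every position, and sum, mean, or batchmean only decides how they are collapsed into a scalar.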