.cat() / .stack() distributions (e.g. Normal distribution)

Hey :slight_smile:

Is there a way to stack / cat torch.distributions?

Example:

import torch
from torch.distributions import Normal

mean1 = torch.zeros(5, dtype=torch.float)
std1 = torch.ones(5, dtype=torch.float)
d1 = Normal(mean1, std1)

mean2 = torch.ones(5, dtype=torch.float) * 10
std2 = torch.ones(5, dtype=torch.float) * 10
d2 = Normal(mean2, std2)

mean_result = torch.stack([mean1, mean2])
std_result = torch.stack([std1, std2])
d_result = Normal(mean_result, std_result)

Is there a way to get d_result from d1 and d2 without using/stacking the parameters (in this case mean/std)? I am looking for a way that is agnostic to the type of distributions!

I am still wondering :slight_smile:

There still seems to be no solution.

The Normal distribution only works for 1-dimensional distributions; what you need is MultivariateNormal, which is documented here: Probability distributions - torch.distributions — PyTorch 1.10.0 documentation

Sorry, maybe I should have made this clearer, but the question is not about Normal distributions specifically, but about torch.distributions.Distribution in general.

So you are looking for a “product distribution” (i.e. variables independently sampled from their respective distributions).
I don’t think this is currently implemented, but it should be possible to implement such a distribution without too much effort. Of course, the next person will want the distributions of the components linked by given copulas instead of being independent, but hey.
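Such a product distribution could be sketched roughly like this (ProductDistribution and its interface are my own invention for illustration, not part of torch.distributions): sampling draws from each component independently, and log_prob sums the per-component log_probs.

```python
import torch
from torch.distributions import Normal

# Hypothetical sketch: NOT part of torch.distributions.
# Components are sampled independently, so the joint log_prob is the
# sum of the per-component log_probs.
class ProductDistribution:
    def __init__(self, distributions):
        self.distributions = distributions

    def sample(self, sample_shape=torch.Size()):
        # One sample per component, stacked along a new leading dimension.
        return torch.stack([d.sample(sample_shape) for d in self.distributions])

    def log_prob(self, value):
        # value[i] is the event for the i-th component distribution.
        return sum(d.log_prob(v) for d, v in zip(self.distributions, value))

d1 = Normal(torch.zeros(5), torch.ones(5))
d2 = Normal(torch.ones(5) * 10, torch.ones(5) * 10)
d = ProductDistribution([d1, d2])
x = d.sample()           # shape (2, 5)
lp = d.log_prob(x)       # elementwise d1.log_prob(x[0]) + d2.log_prob(x[1])
```

Note this still issues one log_prob call per component under the hood; it only hides the loop behind one interface.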

Best regards

Thomas

Thank you for answering. To be honest, I don’t understand your answer, but maybe I have not been clear enough:

import torch
from torch.distributions import Normal

mean1 = torch.zeros(5, dtype=torch.float)
std1 = torch.ones(5, dtype=torch.float)
d1 = Normal(mean1, std1)

mean2 = torch.ones(5, dtype=torch.float) * 10
std2 = torch.ones(5, dtype=torch.float) * 10
d2 = Normal(mean2, std2)

# Instead of this
mean_result = torch.stack([mean1, mean2])
std_result = torch.stack([std1, std2])
d_result = Normal(mean_result, std_result)

# I want something like this
d_result = torch.stack_along_a_batch_dimension([d1, d2])

# CURRENT use case
event1 = torch.tensor(...)
event2 = torch.tensor(...)

# One CUDA kernel call per distribution-event-pair
log_prob_sum = d1.log_prob(event1) + d2.log_prob(event2)

# WISH: How I wished I could do it
event = torch.stack([event1, event2])
# Only a single CUDA kernel call
log_prob_sum = d_result.log_prob(event)


with d1 and d2 being arbitrary torch.distributions.Distribution objects.

But what you describe sounds as if it were equivalent to using d1 and d2 separately, right?
So what you are doing mathematically is to define a distribution on the product space of the domains of the two distributions. (So e.g. sample would sample from d1 and d2 and return the concatenated vector. log_prob would split the vector, take the log_prob of d1 and d2 and return the sum.)

Best regards

Thomas

That’s funny, as there are more like 5-10 CUDA calls, because all log_prob implementations are in Python (so one math op is done at a time). Well, unless the JIT is used, but it is cumbersome to use on Distribution objects.

More on-topic, I think distributions.Independent exists for this use case.
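For the stacked-Normal example from above, the Independent wrapper can be used like this: it reinterprets the trailing batch dimension as part of the event, so a single log_prob call returns one summed log_prob per stacked distribution.

```python
import torch
from torch.distributions import Normal, Independent

# Build the stacked Normal once, then reinterpret the last batch
# dimension (size 5) as an event dimension.
mean = torch.stack([torch.zeros(5), torch.ones(5) * 10])
std = torch.stack([torch.ones(5), torch.ones(5) * 10])
d = Independent(Normal(mean, std), 1)  # batch_shape (2,), event_shape (5,)

event = torch.stack([torch.zeros(5), torch.ones(5) * 10])
log_prob = d.log_prob(event)  # shape (2,): one log_prob per stacked distribution
```

Each entry of log_prob is the sum over the 5 event dimensions, i.e. exactly what d1.log_prob(event1).sum() and d2.log_prob(event2).sum() would give, but computed in one call on the stacked parameters.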

Hi,
thank you for asking. What I want to accomplish has nothing to do with distributions as such. Similar as I can for example input a batch of images in a CNN via looping over the images or via just concatenating the images in the batch dimension and inputing the whole batch, I want to concatenate my distributions and calculate the log prob. So coming back to the example I gave

This is wrong. I want two log_probs, one for each event, since I am only talking about concatenation in the batch dimension!

Thank you for answering. I think you are talking about something different.

I have modules that return a torch.distributions.Distribution. I do not need to know which distribution; I just know it is always the same type (e.g. always Normal, or always Categorical). I have a video stream, and each of the N images in the stream is fed into this module in a for loop, since the architecture is RNN-like. Afterwards I want to calculate the log_prob for each of the N distributions I got. Usually, when I get tensors back instead of distributions, I would just stack the tensors and then do some processing. However, with distributions I do not see a way to stack them along the batch dimension, so I have to loop over them again.

Sounds like you just need to stack the parameters and construct a merged distribution. But there is no interface to extract distribution parameters, except in an ad-hoc way:

{name: getattr(distr, name) for name in distr.arg_constraints.keys()}

Stack these by key and construct a merged distribution object in a generic way, e.g. distrs[0].__class__(**kwargs).
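Putting the two pieces together, a generic helper could look like this (stack_distributions is my own name for it, and the approach assumes every key in arg_constraints is a tensor attribute the constructor accepts; classes like Categorical, which expose both probs and logits but accept only one, would need special-casing):

```python
import torch
from torch.distributions import Normal

# Sketch of the ad-hoc approach: read each constructor argument via
# arg_constraints, stack the values by key, and rebuild a distribution
# of the same class with the stacked parameters.
def stack_distributions(distrs, dim=0):
    cls = distrs[0].__class__
    kwargs = {
        name: torch.stack([getattr(d, name) for d in distrs], dim=dim)
        for name in distrs[0].arg_constraints.keys()
    }
    return cls(**kwargs)

d1 = Normal(torch.zeros(5), torch.ones(5))
d2 = Normal(torch.ones(5) * 10, torch.ones(5) * 10)
d = stack_distributions([d1, d2])  # batch_shape (2, 5)
```

For Normal, arg_constraints has the keys loc and scale, so this rebuilds Normal(loc=stacked_means, scale=stacked_stds), which is exactly the d_result from the original example.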