Conditional VAE - concatenate


If I have a one-hot vector of shape [25, 6] and a data input of [25, 1, 260, 132], how do I concatenate them into a single tensor to feed into the encoder of a convolutional VAE?

Likewise, the latent tensor is [25, 100]; how do I concatenate it with the one-hot vector to feed into the decoder of the convolutional VAE?


It might depend on your use case and architecture, and there is no straightforward way to concatenate two tensors with different shapes and numbers of elements.

You could repeat the values of the smaller tensor to match the number of elements of the bigger one, but you would of course have to check whether this approach works and is valid for your model.
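For the conditional-VAE shapes from the question, a common form of this "repeat" trick (a sketch, not from the thread) is to broadcast each one-hot value over the spatial dimensions so it becomes extra input channels:

```python
import torch

# Assumed shapes from the question; random values stand in for real data
data = torch.randn(25, 1, 260, 132)   # image-like encoder input
onehot = torch.randn(25, 6)           # one-hot condition vector

# Broadcast [25, 6] -> [25, 6, 260, 132] by adding spatial dims and expanding
cond_maps = onehot[:, :, None, None].expand(-1, -1, 260, 132)

# Concatenate along the channel dimension -> [25, 7, 260, 132]
x = torch.cat([data, cond_maps], dim=1)
print(x.shape)  # torch.Size([25, 7, 260, 132])
```

The first conv layer of the encoder would then need `in_channels=7` instead of 1.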

Alternatively, you could try to reduce the number of elements of the bigger tensor.
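As one possible way to do that (a hedged sketch, not something from the thread), the bigger tensor could be flattened and projected down with a linear layer before concatenating:

```python
import torch
import torch.nn as nn

data = torch.randn(25, 1, 260, 132)   # big input tensor
onehot = torch.randn(25, 6)           # small condition vector

# Project the flattened input down to a small feature vector; the size 32
# is an arbitrary illustrative choice
proj = nn.Linear(260 * 132, 32)
feat = proj(data.view(data.size(0), -1))   # [25, 32]

# Both tensors are now 2D with matching batch size -> [25, 38]
x = torch.cat([feat, onehot], dim=1)
print(x.shape)  # torch.Size([25, 38])
```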

What is your use case, and what should the result tensor look like?

hi Ptrblck,

sorry for not responding, I was getting bogged down with debugging the VAE, which now works, yay!!

anyway, the input tensor for training is [24, 1, 260, 132] and I would like the output to be a one-hot vector of [24, 6], the idea being that once trained it can identify the randomly generated samples…

Can I also expand on this topic? Just today I posted a very similar question here at the AI stack:

And what if I were to concatenate additional information that is continuous? How should I proceed? Would the network be able to capture the additional information through concatenation? (E.g. 300 hidden neurons post-convolution + 1 for the added info.) I tried both this and expanding the 1D additional info to, let's say, 10. I wasn't very successful with either. The VAE just ignores the added info.

Any thoughts on that?

I don’t understand. Should the one-hot encoded tensor be the output?
My question is still, how and where should these tensors be concatenated?
Since the number of elements is different, please refer to my previous post.

Hi PtrBlck,

apologies, as I understand the needs of a conditional VAE, I need to concatenate both the input into the encoder [24, 1, 260, 132] and the Z input into the decoder [24, 100] with the one-hot vector [24, 6].
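On the decoder side the shapes already match in every dimension except the feature one, so (a minimal sketch with the shapes above, using random stand-in values) plain `torch.cat` works without changing the latent dimension:

```python
import torch

z = torch.randn(24, 100)      # latent sample fed to the decoder
labels = torch.randn(24, 6)   # one-hot labels (random here for illustration)

# Concatenate along the feature dimension -> [24, 106]
dec_in = torch.cat([z, labels], dim=1)
print(dec_in.shape)  # torch.Size([24, 106])
```

The first layer of the decoder would then take 106 input features instead of 100.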

I can adjust the Z input by changing the latent dimension from 100 to 6, that's easy; however, the input shape is fixed, and this is the problem… because otherwise the input becomes [10053, 6] (I realise it's not an integer and this needs addressing if this is the only way forward…)

What is the current input to the encoder?
Wouldn’t it work, if you increase the number of inputs in a similar way as for the decoder for the new concatenated input?

Hi Ptrblck,

the input into the encoder is a 2D tensor which is fixed in size (260, 132).

I don't understand what you mean by "increase the number of inputs". I have a batch size of 24, so the input would be [24, 1, 260, 132] and the one-hot vector is [24, 6] (as there are 6 different types available)…

If I use:
inputs = torch.cat([data_in, labels], 1)

then data_in and labels have to be the same size, don't they???

I am feeling really thick at the moment; I'm sure I'm missing a trick, I just cannot see it…

Sorry for not being clear enough.
You could pass both tensors to the forward method and concatenate the activations as seen in this dummy code:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv = nn.Conv2d(1, 1, 3, 1, 1)
        self.lin = nn.Linear(106, 2)

    def forward(self, x1, x2):
        x1 = self.conv(x1)
        x1 = x1.view(x1.size(0), -1)    # flatten conv output to [batch, 100]
        x = torch.cat((x1, x2), dim=1)  # concatenate along features -> [batch, 106]
        x = self.lin(x)
        return x

model = MyModel()
x1 = torch.randn(24, 1, 10, 10)
x2 = torch.randn(24, 6)

out = model(x1, x2)

so concatenate at the end of the convolutions, before the fully connected layer?

that's obvious, why didn't I think of that?

that's another beer I owe you; I hope you're keeping track because I've lost count :smiley:
