Expected stride to be a single integer value or a list of N values to match the convolution dimensions, but got stride=[2,2]

Thanks for the code.
Change the following and your code should run again:

  • Change the linear layer in your encoder to: nn.Linear(256*3*3, z_dim*2)
  • Remove the Upsample layer in your decoder
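
As a quick sanity check of the first change, here is a minimal sketch. It assumes the last conv block of the encoder outputs feature maps of shape (batch, 256, 3, 3), which is what 256*3*3 implies. The values z_dim = 10 and batch_size = 4 are placeholders, not taken from your code.

```python
import torch
import torch.nn as nn

z_dim = 10       # placeholder, use your own latent size
batch_size = 4   # placeholder

# Stand-in for the output of the last conv block (assumed shape).
features = torch.randn(batch_size, 256, 3, 3)

# Flatten everything except the batch dimension: 256 * 3 * 3 = 2304 features.
flat = features.view(batch_size, -1)

# The linear layer must expect exactly that many input features.
fc = nn.Linear(256 * 3 * 3, z_dim * 2)
out = fc(flat)

print(flat.shape)  # torch.Size([4, 2304])
print(out.shape)   # torch.Size([4, 20])
```

If the first print shows a different number of features for your real conv stack, that number is what the linear layer's in_features should be.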

Sorry to bother, but maybe I'm missing something because I still get a mismatch error:

size mismatch, m1: [1 x 147456], m2: [2304 x 20] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

Ah sorry, my bad! I had simply hard-coded my own batch size of 1 in the View() layer.
You should change it to:

encoder:
View((batch_size, -1))

decoder:
View((batch_size, -1, 3, 3))
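
The numbers in the error message are consistent with this. A small arithmetic check in plain Python, using only the two shapes printed in the error and assuming each sample flattens to 256*3*3 values:

```python
# Shapes taken from the error message: m1: [1 x 147456], m2: [2304 x 20]
m1_cols = 147456
m2_rows = 2304  # in_features of the linear layer

# 2304 is exactly 256 * 3 * 3.
assert m2_rows == 256 * 3 * 3

# With a batch dimension of 1, all samples end up flattened into one row.
implied_batch = m1_cols // m2_rows
remainder = m1_cols % m2_rows
print(implied_batch, remainder)  # 64 0
```

So the input most likely held 64 samples, which a View with a batch size of 1 squashed into a single row of 147456 values.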

Unfortunately, you cannot pass x.size(0) to View() here, since the layer is constructed inside an nn.Sequential where x does not exist yet, so you have to know your batch size beforehand.
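
If hard-coding the batch size turns out to be a problem (for example, the last batch of an epoch is often smaller), one possible workaround is to read the batch size inside forward() rather than in the constructor. This is only a sketch: FlexView is a hypothetical name, not a PyTorch class and not the View layer from your code, and the two small Sequential blocks, including the decoder's nn.Linear(z_dim, 256*3*3), are assumed stand-ins for the relevant parts of your encoder and decoder.

```python
import torch
import torch.nn as nn

class FlexView(nn.Module):
    """Hypothetical helper: reshapes to (batch, *shape), batch read at call time."""

    def __init__(self, *shape):
        super().__init__()
        self.shape = shape

    def forward(self, x):
        # x.size(0) is available here because forward() receives the input.
        return x.view(x.size(0), *self.shape)

z_dim = 10  # placeholder, use your own latent size

# Assumed stand-ins for the end of the encoder and the start of the decoder.
encoder_head = nn.Sequential(FlexView(-1), nn.Linear(256 * 3 * 3, z_dim * 2))
decoder_stem = nn.Sequential(nn.Linear(z_dim, 256 * 3 * 3), FlexView(-1, 3, 3))

# The same modules handle different batch sizes without being rebuilt.
for n in (1, 64):
    enc_out = encoder_head(torch.randn(n, 256, 3, 3))
    dec_out = decoder_stem(torch.randn(n, z_dim))
    print(enc_out.shape, dec_out.shape)
```

For the encoder side, PyTorch also ships nn.Flatten, which by default flattens everything except the batch dimension and can be dropped into a Sequential directly.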