Autoencoder on CIFAR: wrong dimensions

Hello everyone,
I am fairly new to deep learning and neural networks, and I am currently following the 60 Minute Blitz tutorial. I want to recreate it, extending the CNN to use an autoencoder. The following is my autoencoder:

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),           # 3x32x32 -> 6x28x28
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(6, 3, kernel_size=5),  # 6x28x28 -> 3x32x32
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

However, the part which computes the accuracy throws the following error:
The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 2

Any help will be greatly appreciated, thanks in advance

Hi, could you give the full stack trace of where this error happens please?

RuntimeError                              Traceback (most recent call last)
<ipython-input-214-52ea346a67f2> in <module>
      7         _, predicted_auto = torch.max(outputs_auto, 1)
      8         total += labels.size(0)
----> 9         correct += (predicted_auto == labels).sum().item()
     11 print('Accuracy of the network on the 10000 test images: %d %%' % (

RuntimeError: The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 2

This is the full code that computes the accuracy of the model:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs_auto = autoencoder(images)
        _, predicted_auto = torch.max(outputs_auto, 1)
        total += labels.size(0)
        correct += (predicted_auto == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Given the line, I would guess that predicted_auto and labels don’t have the same size: one is of size 32 and the other of size 16.
You can add some printing to your code to make sure that’s the case.
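For example, a minimal stand-alone sketch (random tensors standing in for the real batch, with the shapes this thread turns out to involve) of what such prints would show:

```python
import torch

# Stand-ins for one batch of 16 CIFAR images and their labels (hypothetical values)
outputs_auto = torch.randn(16, 3, 32, 32)       # autoencoder output
labels = torch.randint(0, 10, (16,))            # one class label per image

_, predicted_auto = torch.max(outputs_auto, 1)  # max over dim 1 (channels)

print('predicted_auto:', predicted_auto.shape)  # torch.Size([16, 32, 32])
print('labels:', labels.shape)                  # torch.Size([16])
```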

Yeah, I printed their sizes and they are indeed different, but I don’t really understand why, since in the tutorial example they are the same size.

Which one were you expecting? The one from the network or the one from the labels?

I was expecting the predictions from the autoencoder to be the same size as the predictions from the CNN in the example, but I’m not sure why they’re different, since the autoencoder builds on that CNN.

Can you print the exact sizes of predicted_auto and labels, please?
Also, is it possible that you’re missing some padding in the decoder?

predicted_auto size - torch.Size([16, 32, 32])
labels size - torch.Size([16])

So your autoencoder outputs something that is the same size as its input, which is expected for an autoencoder.
But why is your label just of size 16? Do you actually want to do classification?
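To make that concrete, here is a small stand-alone sketch (random tensors with the shapes reported above) that reproduces the exact error:

```python
import torch

outputs_auto = torch.randn(16, 3, 32, 32)       # reconstruction: same size as the input batch
_, predicted_auto = torch.max(outputs_auto, 1)  # argmax over dim 1 (channels) -> [16, 32, 32]
labels = torch.randint(0, 10, (16,))            # one class label per image -> [16]

# Broadcasting aligns shapes from the right, so comparing [16, 32, 32] with [16]
# pits 32 against 16 in the last (non-singleton) dimension and raises a RuntimeError:
try:
    (predicted_auto == labels).sum().item()
except RuntimeError as e:
    print(e)
```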

Yes, I want to do classification.
I am not even sure it makes sense to do classification with this particular autoencoder that I have.
Sorry, I am very new to this.


An autoencoder is used to reconstruct something that is the same size as its input.
If you just want to do classification, you should have a network that outputs one value per class. You can look at the MNIST example for such a model if you want.
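For example, a minimal sketch (not code from this thread) of how the same encoder could be reused for classification, assuming CIFAR-10’s 10 classes; the head here is a hypothetical single linear layer:

```python
import torch
import torch.nn as nn

class EncoderClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Same encoder as the autoencoder above: 3x32x32 -> 6x28x28
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),
            nn.ReLU(),
        )
        # Classification head: flatten the feature map and emit one score per class
        self.classifier = nn.Linear(6 * 28 * 28, num_classes)

    def forward(self, x):
        x = self.encoder(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = EncoderClassifier()
out = model(torch.randn(16, 3, 32, 32))
print(out.shape)  # torch.Size([16, 10])
```

With this output shape, `torch.max(out, 1)` yields predictions of size [16], matching the labels, so the accuracy loop from the tutorial works unchanged.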
