Autoencoder on CIFAR wrong dimensions

Hello everyone,
I am fairly new to deep learning and neural networks, and I am currently following the 60 Minute Blitz tutorial:
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
I want to recreate this tutorial, but with the CNN extended into an autoencoder.
This is my autoencoder:

import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()

        # encoder: 3 -> 6 -> 16 channels, 5x5 convolutions
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),
            nn.ReLU(True),
            nn.Conv2d(6, 16, kernel_size=5),
            nn.ReLU(True),
            nn.Sigmoid()
        )

        # decoder: 16 -> 6 -> 3 channels, mirroring the encoder with transposed convolutions
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 6, kernel_size=5),
            nn.ReLU(True),
            nn.ConvTranspose2d(6, 3, kernel_size=5),
            nn.ReLU(True),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
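For reference, this is roughly how I run it on a batch from the test set (a minimal sketch; testloader and device are set up as in the tutorial):

autoencoder = Autoencoder().to(device)
images, labels = next(iter(testloader))
reconstructions = autoencoder(images.to(device))  # reconstructed images, same size as the input batch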

However, the part that computes the accuracy throws the following error:
The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 2

Any help would be greatly appreciated. Thanks in advance!

Hi, could you give the full stack trace of where this error happens please?

RuntimeError                              Traceback (most recent call last)
<ipython-input-214-52ea346a67f2> in <module>
      7         _, predicted_auto = torch.max(outputs_auto.data, 1)
      8         total += labels.size(0)
----> 9         correct += (predicted_auto == labels).sum().item()
     10 
     11 print('Accuracy of the network on the 10000 test images: %d %%' % (

RuntimeError: The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 2

This is the full code that computes the accuracy of the model:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs_auto = autoencoder(images.to(device))
        _, predicted_auto = torch.max(outputs_auto.data, 1)
        total += labels.size(0)
        correct += (predicted_auto == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Given that line, I would guess that predicted_auto and labels don’t have the same size: one is of size 32 and the other of size 16 at dimension 2.
You can add some print statements in your code to confirm that this is the case.
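For example, something like this inside the test loop (a minimal sketch using the variable names from the posted code):

_, predicted_auto = torch.max(outputs_auto.data, 1)
print(predicted_auto.shape, labels.shape)  # compare the two sizes that are being checked for equality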

Yeah, I printed their sizes and they are indeed different, but I don’t really understand why, since in the tutorial example they are the same size.

Which one is the size you were expecting: the one from the network or the one from the labels?

I was expecting the predictions from the autoencoder to be the same size as the predictions from the CNN in the example, but I’m not sure why they’re different, since the autoencoder builds on that CNN.

Can you print the exact sizes of predicted_auto and labels, please?
Also, is it possible that you’re missing some padding in the decoder?

predicted_auto size - torch.Size([16, 32, 32])
labels size - torch.Size([16])

So your autoencoder outputs something that is the same size as the input, which is expected for an autoencoder.
But why is your label only of size 16? Do you actually want to do classification?
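To make the shapes concrete (a sketch, assuming the batch size of 16 implied by the printed sizes):

outputs_auto = autoencoder(images)               # [16, 3, 32, 32]: one reconstructed image per input
_, predicted_auto = torch.max(outputs_auto, 1)   # [16, 32, 32]: argmax over the channel dimension, i.e. per pixel
# labels is [16]: one class index per image; broadcasting aligns it with the last
# dimension of predicted_auto (size 32), so the comparison fails at dimension 2
correct = (predicted_auto == labels).sum().item()  # this is the line that raises the RuntimeError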

Yes, I want to do classification.
I am not even sure it makes sense to do classification with this particular autoencoder.
Sorry, I am very new to this.

Hi,

An autoencoder is used to reconstruct something that is the same size as its input.
If you just want to do classification, you should have a network that outputs one value per class. You can look at the MNIST example for such a model if you want.
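For instance, one way to do that here is to keep the convolutional part of the encoder and add a classification head that outputs 10 values, one per CIFAR-10 class (a sketch, not from the tutorial; the class name is hypothetical, and the 16 * 24 * 24 flatten size follows from two 5x5 convolutions on a 32x32 input):

import torch
import torch.nn as nn

class EncoderClassifier(nn.Module):  # hypothetical name, not from the tutorial
    def __init__(self):
        super(EncoderClassifier, self).__init__()
        # same convolutional layers as the encoder above: 3 -> 6 -> 16 channels, 5x5 kernels
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),
            nn.ReLU(True),
            nn.Conv2d(6, 16, kernel_size=5),
            nn.ReLU(True),
        )
        # classification head: one output value per CIFAR-10 class
        self.fc = nn.Linear(16 * 24 * 24, 10)

    def forward(self, x):
        x = self.features(x)       # [B, 16, 24, 24] for 32x32 CIFAR images
        x = torch.flatten(x, 1)    # [B, 16 * 24 * 24]
        return self.fc(x)          # [B, 10] class scores

Trained with nn.CrossEntropyLoss, torch.max(outputs, 1) then gives one predicted class per image, so the accuracy loop from the tutorial works unchanged.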
