Trying to get the loss of a CNN, but getting three (contradictory?) RuntimeErrors

Hello!

I’ve managed to get a small CNN model running, and am now trying to extract its outputs (the loss, mainly). My code for running the data through the CNN looks as follows:

# Get the model information.
model, opt = get_model()
# Define the loss function.
loss_func = F.cross_entropy

# Run the data through the neural network:
for epoch in range(epochs):
    for train_x, train_y in train_dl:
        # Create a prediction.
        pred = model(train_x.permute(0, 3, 1, 2))
        # Reshaping the prediction so cross entropy can be applied.
        pred = pred.permute(2, 3, 0, 1)[0][0]
        # Calculate the loss.
        loss = loss_func(pred, train_y)

        # Backwards propagation.
        loss.backward()
        opt.step()
        opt.zero_grad()

As you can see, I had to do a lot of permuting to make the data fit. That works inside the loop, but when I try to print the loss directly:

print(loss_func(model(train_x), train_y))

I get the error: RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[10, 256, 256, 3] to have 3 channels, but got 256 channels instead. When I change it to the following:

print(loss_func(model(train_x.permute(0, 3, 1, 2)), train_y))

I get: RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 1. And then when I try:

print(loss_func(model(train_x.permute(0, 3, 1, 2)[0][0]), train_y))

It says: RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 3, 3, 3], but got 2-dimensional input of size [256, 256] instead. This suggests it wants me to remove the [0]s, but when I do, it complains about the previous error again, which would require adding them back.

Can anyone help me figure out what it wants me to do here?
(I hope I’ve provided all relevant code, but if anything else is needed, let me know and I’ll add it!)

Could you also post the code of the model? Also, what is the task that you are using this CNN for? If it’s image classification, the output of the CNN should not have 4 dimensions.
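To illustrate what I mean (the shapes below are made up, not taken from your code): for plain image classification, F.cross_entropy expects 2D logits of shape (batch, n_classes) and 1D integer class targets of shape (batch,).

import torch
import torch.nn.functional as F

logits = torch.randn(10, 21)           # (batch_size, n_classes) raw scores from the model
targets = torch.randint(0, 21, (10,))  # (batch_size,) integer class labels
print(F.cross_entropy(logits, targets))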

The model code is as follows:

def get_model():
    model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 21, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(1)
    )
    opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    return model, opt

It’s just a placeholder model for now, so it probably won’t yield very good results. The task is indeed image classification, for trying to identify 21 different types of land use.

Okay, that sounds good! You are, however, missing a part of your model.

CNNs usually consist of a feature extraction part (convolutions, pooling, etc.) and a classification part (a few fully connected layers that learn how the extracted features contribute to the classification, followed at the end by a softmax, which gives you a probability distribution over your classes).

In PyTorch, it is recommended that you extend the nn.Module class when writing a model.

class ConvNet(nn.Module):

    def __init__(self, n_classes):
        super().__init__()
        self.featurizer = nn.Sequential(...)  # what you have in your model variable
        self.classifier = nn.Sequential(...)  # 2-3 linear layers; the first should have in_features = n_kernels * h_kernel * w_kernel from the last conv layer, the last should have out_features = n_classes

    def forward(self, x):
        # x (b, c, h, w)

        x = self.featurizer(x)  # x (b, n_kernels, h_kernel, w_kernel)

        x = torch.flatten(x, start_dim=1)  # x (b, n_kernels * h_kernel * w_kernel)

        x = self.classifier(x)  # x (b, n_classes)
        x = torch.nn.functional.softmax(x, dim=1)
        return x

On this output, you should be able to apply the cross-entropy loss. You can also instantiate the optimizer separately, for example as in the sketch below. Let me know if you get stuck :smiley:
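A minimal sketch of how the pieces could fit together (reusing the optimizer settings from your get_model and n_classes=21 from your description; the layer sizes inside the Sequentials are still for you to fill in):

model = ConvNet(n_classes=21)
opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for train_x, train_y in train_dl:
    pred = model(train_x.permute(0, 3, 1, 2))  # (b, h, w, c) -> (b, c, h, w), if your batches are channels-last as in your first post
    loss = F.cross_entropy(pred, train_y)      # pred (b, n_classes), train_y (b,) integer labels
    loss.backward()
    opt.step()
    opt.zero_grad()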

Thank you very much for your quick reply! I’ve tried adding this code, filling in the blanks as follows:

class ConvNet(nn.Module):

  def __init__(self, n_classes):
    super().__init__()
    self.featurizer = nn.Sequential(
              nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
              nn.ReLU(),
              nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
              nn.ReLU(),
              nn.Conv2d(64, 21, kernel_size=3, stride=2, padding=1),
              nn.ReLU(),
              nn.MaxPool2d(1))  # what you have in your model variable
    self.classifier = nn.Sequential(
              nn.Linear(21*3*3, 120),
              nn.Sigmoid(),
              nn.Linear(120, 84),
              nn.Sigmoid(),
              nn.Linear(84, 21)
    )  # 2-3 linear layers; first has in_features = n_kernels * h_kernel * w_kernel from the last conv layer, last has out_features = n_classes

  def forward(self, x):
    # x (b, c, h, w)
    x = self.featurizer(x)             # x (b, n_kernels, h_kernel, w_kernel)
    x = torch.flatten(x, start_dim=1)  # x (b, n_kernels * h_kernel * w_kernel)
    x = self.classifier(x)             # x (b, n_classes)
    x = torch.nn.functional.softmax(x, dim=1)
    return x

I’m afraid to speak too soon, but the permute no longer seems to be necessary! I do, however, still get the following error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x21504 and 189x120)

The 189 x 120 is clearly the shape of the first linear layer I defined, and I assume this error refers to a mismatch between those parameters and the data the layer actually receives. That does, however, leave me quite confused about where the 10 x 21504 comes from.
Is something in my convolutional layers shaping the data in these dimensions?

Edit: Or the “flatten” layer, I just realized. But there doesn’t really seem to be a way to control its output dimensions other than start_dim and end_dim.

Is your batch size 10?

21504 is 21 (the number of output kernels/channels from your last conv layer) * 32 (h_kernel) * 32 (w_kernel). I think that is what the in_features of your first linear layer should be. You can calculate the width and height of the output feature map using formulas you can find online (e.g. in the PyTorch Conv2d docs). Also, that 21 from the last conv does not need to be the number of classes you have; you could put 128, for example.
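A quick way to double-check that number (just a sketch, assuming model = ConvNet(n_classes=21) as you posted it and 256x256 RGB inputs, which is what your first error message suggests):

with torch.no_grad():
    out = model.featurizer(torch.zeros(1, 3, 256, 256))  # one dummy image, (b, c, h, w)
print(out.shape)                    # torch.Size([1, 21, 32, 32])
print(torch.flatten(out, 1).shape)  # torch.Size([1, 21504]) -> in_features of the first Linear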

Let me know how it works :slight_smile:

Yes, my batch size was indeed 10!

I unfortunately couldn’t find any online formulas (to be quite honest, I’m not sure what to look for; when I search “cnn kernel size calculator” or similar, I instead find threads that calculate output image size or more in-depth discussions that go beyond my understanding).

I have also tried making an Excel sheet to figure out working combinations of numbers by myself, but the numbers don’t match with the actual results from the model, so I think I’m still missing something.

Could you offer me a hint as to which formula I need?

Okay, let’s see. You have an image of size (h, w, c), which in your case is a square RGB image, so h = w (in the examples below I will assume H_in = 32) and c = 3. Your input volume is of shape (n, n, 3). First, you apply 32 filters of size (3, 3), with stride 2 and padding 1.

The dimension of the new output is given by the following formula, from the PyTorch Conv2d docs:

H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)

- In your case, H_out = W_out and padding/kernel size are the same for H and W. Dilation is 1 by default.

So, n_out = floor((32 + 2*1 - 1*(3 - 1) - 1)/2 + 1) = 16. Your output volume after the first conv layer is (16, 16, 32).

The second layer applies 64 filters of shape (3, 3), with stride 2 and padding 1.

So, n_out = floor((16 + 2*1 - 1*(3 - 1) - 1)/2 + 1) = 8. Your output volume after the second conv layer is (8, 8, 64).

The last layer applies 21 (or actually 128, after my last edit) filters of shape (3, 3), with stride 2 and padding 1.

So, n_out = floor((8 + 2*1 - 1*(3 - 1) - 1)/2 + 1) = 4. Your output after the last conv is (4, 4, 128).

Now for MaxPool2d: with kernel size = 1, it won’t do anything at all, since it just takes the maximum over a (1, 1) square and puts it in the result. You could go for a kernel size of 2 instead, but don’t forget to set the stride to 1 (see the PyTorch nn.MaxPool2d docs for its output-size formula).

Depending on your pooling kernel size, you end up with 4x4x128 or 3x3x128 total activations/values/neurons, which you flatten into a single 2048- or 1152-dimensional vector. That number needs to match the in_features of your first linear layer.
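If it helps, here is a small snippet to check these numbers empirically (it uses the 32x32 input from my example and the 128-channel last conv I suggested; with MaxPool kernel 2 and stride 1 you get the 3x3x128 case):

import torch
import torch.nn as nn

featurizer = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),    # (3, 32, 32)  -> (32, 16, 16)
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),   # (32, 16, 16) -> (64, 8, 8)
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # (64, 8, 8)   -> (128, 4, 4)
    nn.ReLU(),
    nn.MaxPool2d(2, stride=1),                                # (128, 4, 4)  -> (128, 3, 3)
)

out = featurizer(torch.zeros(1, 3, 32, 32))  # dummy 32x32 RGB image
print(out.shape)                              # torch.Size([1, 128, 3, 3])
print(torch.flatten(out, 1).shape)            # torch.Size([1, 1152]) -> first Linear in_features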

I hope that this is the info you need.

Thank you so very much for taking the time to explain this! I see indeed that my MaxPool didn’t do anything, so I upped it to 2 as you suggested. It took me a little while, but I was also able to use the dimension formulas to ensure the linear input size was correct. And lo and behold, it finally ran! :smiley:

The initial results were really poor, so I adjusted the model a little bit, and this is what it looks like after some tweaking:

def __init__(self, n_classes):
    super().__init__()
    self.featurizer = nn.Sequential(
            # Convolutional part.
            nn.Conv2d(3, 25, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(25, 50, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 100, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
    )
    self.classifier = nn.Sequential(
            # Linear part.
            nn.Linear(100*16*16, 224),
            nn.Sigmoid(),
            nn.Linear(224, 112),
            nn.Sigmoid(),
            nn.Linear(112, 21)
    )

And of course, I made sure that the linear input was updated accordingly.
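In case it helps anyone else, this is how I checked the new linear input size (assuming my images are 256x256 RGB, as the very first error message indicated): the three stride-2 convs take 256 -> 128 -> 64 -> 32, and MaxPool2d(2) halves that to 16, hence 100*16*16.

model = ConvNet(n_classes=21)
with torch.no_grad():
    out = model.featurizer(torch.zeros(1, 3, 256, 256))  # one dummy 256x256 RGB image
print(out.shape)                    # torch.Size([1, 100, 16, 16])
print(torch.flatten(out, 1).shape)  # torch.Size([1, 25600]) == 100*16*16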

With this model, I was able to get a training loss of 2.681906. Still not ideal, probably, but I’m really happy with these results for my very first ever CNN. :slight_smile: I can now continue to tweak the parameters however I need.

I genuinely wouldn’t have been able to get this off the ground without your help. Thanks a lot again!

My pleasure! Make sure to also check out torchvision.models for some state-of-the-art CNN models, if you are looking for some solid results.
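For example, something along these lines (just a sketch; resnet18 is one of many available architectures, and the final fully connected layer is replaced to output your 21 land-use classes):

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=None)           # or load pretrained weights if you prefer
model.fc = nn.Linear(model.fc.in_features, 21)  # swap the classification head for 21 classes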

Cheers!