ResNet last layer modification

Hello guys, I'm trying to add a dropout layer before the FC layer at the "bottom" of my ResNet. To do that, I remove the original FC layer from resnet18 with the following code:

    resnetk = models.resnet18(pretrained=True)
    num_ftrs = resnetk.fc.in_features
    resnetk = torch.nn.Sequential(*list(resnetk.children())[:-1])

Then, I add the dropout and the FC layer using the num_ftrs I obtained from the previous (original) FC layer of my resnet18:

    resnetk.add_module("dropout", nn.Dropout(p=0.5))
    resnetk.add_module("fc", nn.Linear(num_ftrs, n_classes))

But I receive the following error: `RuntimeError: size mismatch, m1: [8192 x 1], m2: [512 x 2] at /pytorch/aten/src/THC/generic/`

I’m also confused where the softmax gets in, after the linear layer, since in Keras we need to specify the activation function as softmax.


Currently you are rewrapping your pretrained resnet into a new nn.Sequential module, which will lose the forward definition. As you can see in this line of code in the original resnet implementation, the activation x will be flattened before being passed to the last linear layer. Since this is missing now, you’ll get the size mismatch error.

You could just manipulate the model.fc attribute and add your dropout and linear layer:

    model = models.resnet18(pretrained=True)
    num_ftrs = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(p=0.5),
        nn.Linear(num_ftrs, 10)
    )

This approach will keep your forward method.

Whether you should add a final non-linearity to your model depends on the criterion you are using.
If you are dealing with a classification use case and would like to use nn.CrossEntropyLoss, you should pass the raw logits to the loss function (i.e. no final non-linearity), since nn.LogSoftmax and nn.NLLLoss will be called internally.
However, if you would like to use nn.NLLLoss, you should add the nn.LogSoftmax manually.
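As a small illustration of that equivalence (a sketch with random logits, not code from the thread): passing raw logits to nn.CrossEntropyLoss gives the same loss as applying nn.LogSoftmax manually and using nn.NLLLoss.

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # raw model outputs, no softmax applied
targets = torch.tensor([0, 2, 1, 0])

# Option 1: raw logits + nn.CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: explicit nn.LogSoftmax + nn.NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(loss_ce, loss_nll))  # True
```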


Thanks @ptrblck, your comments and observations helped me understand my problem. And I appreciate that you pointed out the line of code in the original resnet implementation; it helped even more!


Isn’t it like adding dropout to the last layer? Why do we want to drop out some outputs?


No, as the dropout layer is added before the output layer, not after it.


I am very new in this field. Would you agree if I say: since the model is built sequentially, whatever is written first is applied first? And functional dropout is applied after a layer because, in the forward method, we call the layers one after another, so the dropout is applied after the layer has been activated?

The modules in an nn.Sequential container are executed in the order they were passed in, i.e. their forward methods are called one after the other.
Here is a small example, which shows that the first case yields a dense output, while the second one zeroes out some output units:

    # Case 1: dropout before the linear layer -> dense output
    model = nn.Sequential(
        nn.Dropout(p=0.5),
        nn.Linear(10, 10)
    )
    x = torch.randn(1, 10)
    out = model(x)

    # Case 2: dropout after the linear layer -> some output units are zeroed
    model = nn.Sequential(
        nn.Linear(10, 10),
        nn.Dropout(p=0.5)
    )
    x = torch.randn(1, 10)
    out = model(x)

Thank you so much for this example.
Could you tell me what we are dropping in a ResNet before the last layer, since the layer before the last one is not a linear layer but a convolutional one?


In this example, you would drop some of the features, which are fed into the linear layer.
I just provided the code snippet for @Paulo_Mann, so he might know better what his use case is. :wink:


I always thought that Dropout is meant to drop some neurons in a Linear layer to reduce the outgoing features. I did not know that it is used to drop incoming features as well.

Thank you very much for this information. Also, if you have a source handy where I can read more about this behavior of dropout, could you please share it? :slight_smile:
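One way to see this (a small sketch, not from the thread): nn.Dropout has no notion of neurons or layers. It simply zeroes random elements of whatever tensor it receives and scales the survivors by 1/(1-p) during training, whether that tensor came from a linear layer, a conv layer, or anywhere else.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)  # active by default, since modules start in train mode

x = torch.ones(1, 10)  # pretend these are incoming features
y = drop(x)
# Each entry is either 0.0 (dropped) or 2.0 (kept and scaled by 1/(1-p)).
print(y)

drop.eval()
print(drop(x))  # identity in eval mode: all ones
```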


Can I ask: when I tried to add another linear layer, it came out with

RuntimeError: Error(s) in loading state_dict for ResNet: size mismatch for fc.2.weight: copying a param with shape torch.Size([20, 100]) from checkpoint, the shape in current model is torch.Size([60, 100]). size mismatch for fc.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([60]). size mismatch for fc.5.weight: copying a param with shape torch.Size([4, 20]) from checkpoint, the shape in current model is torch.Size([20]). size mismatch for fc.5.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([20]).

I've already set strict=False when I load my model. What else could I be doing wrong?

    def initModel4():
        resnetbase = models.resnet34(pretrained=True).to(device)

        # Freeze all the layers and train only the last layer
        for param in resnetbase.parameters():
            param.requires_grad = False

        # add in dropout layer
        fc_layers = nn.Sequential(
            nn.Linear(100, 20),
            nn.Linear(20, 10),
            nn.Linear(10, 4),
        )

        # weights for background imgs, tanks, floating head tanks, tank clusters
        class_weights = torch.FloatTensor([0.1, 0.5, 0.3, 2.0]).to(device)
        loss_function = nn.CrossEntropyLoss(torch.FloatTensor([0.2, 0.6, 0.4, 2.0])).to(device)
        optimizer = torch.optim.Adam(resnetbase.parameters(), lr=lr)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)


If you want to change some layers in a pretrained model, make sure to load the state_dict containing the parameters and buffers in the expected shapes before applying any manipulations, such as assigning new layers to attributes.
strict=False won't work in this case, as there are no key mismatches; the shapes are wrong.

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier :wink:


Thank you once again! Apologies for the late reply, as I was working on other things for a while.
And oops! Thanks for the tip! Will take note of it next time! :slight_smile: