ResNet50 model with more outputs than label classes trained without error

I’m new to PyTorch and trying to understand why something didn’t break.

Details:

  • I’m building a model that detects skin cancer given an image (it seems this is a common starter project…) and used a pre-trained ResNet50 model for transfer learning.

  • I incorrectly added a final fully connected layer like this:
    model.fc1 = nn.Linear(1000, 3)
    and thought that everything was alright, since Layer fc1 DID show up at the end of the model when I inspected the structure.

  • I trained the model for 50 epochs over 2000 images classified into 3 distinct classes, and after training the model was testing at 68% accuracy. However, when inspecting the predictions I realized there were still 1000 output features.

Question:

Should there have been an error, or some indication that there was a mismatch between the output features and the data labels? How was it able to train to a reasonably accurate threshold despite my mistake?

Thanks!

There is a small mistake in reassigning the last linear layer.
In resnet50 the last linear layer is called self.fc instead of self.fc1.
Even though you assigned a new layer to self.fc1, this layer wasn’t used and your model was basically a pretrained ResNet with 1000 output classes.
Fix this typo and try to train your model again.
Also, note that the number of input features for the last linear layer should be 2048 for resnet50.

hey @ptrblck - I had corrected my mistake earlier by adding the pre-trained model to a Sequential module with the final FCL.

nn.Sequential(Resnet50, nn.Linear(1000, 3))

This worked great. What confused me, though, was why/how a model with 1000 outputs could successfully train on a dataset with only 3 labels, given that before I recognized the mistake the model had trained for 50 epochs on 2000 images and was correctly classifying at 68% accuracy.

Your model might still have learned the three classes in your dataset, even though the other 997 logits weren’t used.
I would expect your model to fail, but apparently it was working alright :wink:
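The “no error” part can be reproduced in isolation. `nn.CrossEntropyLoss` treats each label as an index into the logit vector, so labels 0–2 are perfectly valid indices into 1000 logits; the loss just learns to push the unused 997 logits down relative to the three classes that actually appear. A minimal sketch:

```python
import torch
import torch.nn as nn

# Batch of 4 samples with 1000 logits each, but labels drawn only from {0, 1, 2}.
logits = torch.randn(4, 1000, requires_grad=True)
labels = torch.tensor([0, 2, 1, 0])

loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()  # trains without complaint: every label is a valid class index
```

An error would only be raised if a label were >= the number of logits (or negative), which never happens here, so accuracy like the 68% observed is plausible once the argmax learns to stay within the three used indices.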

That’s pretty cool how flexible that is. Thanks!