Removing classification layer for resnet101-deeplabv3

Hello

I’m trying to remove the classification layer from the torchvision model resnet101-deeplabv3 for semantic segmentation, but I’m having trouble getting this to work. I’ve tried using

backbone = nn.Sequential(*list(self.resnet101deeplab.children())[:-1])

in various ways, with no luck. Part of the issue is that it returns an OrderedDict, and I’m unsure of the proper way to remove a layer or extract the ones I need. Can anyone help?

Would you like to remove the very last layer in this model or the complete classification head (DeepLabHead)?
In the former case, you could just set model.classifier[4] to an nn.Identity layer:

import torch.nn as nn
from torchvision import models

model = models.segmentation.deeplabv3_resnet101()
model.classifier[4] = nn.Identity()

If you would like to swap this layer for a custom one with a new number of classes, you can replace it directly:

model.classifier[4] = nn.Conv2d(
    in_channels=256,
    out_channels=nb_classes,  # your number of classes
    kernel_size=1,
    stride=1
)

Thanks @ptrblck, that helped :slight_smile: I’m a bit confused about how to use the aux classifier when training this model. Is it essential to use? Do you know of anywhere with a quick explanation?

I’m trying to decide which layers to train and at what LR for segmenting a facial dataset of 2000 samples with 11 classes. Any advice/tips on how to approach this would be highly appreciated :slight_smile:

If you are using the pretrained model, you’ll get the aux output by default.
I would try to ignore it at first for your fine-tuning use case.
I assume it is used in a similar way to the aux outputs in the Inception architectures, so it might be useful if you plan to train all layers.
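For reference, here is a minimal sketch of how both outputs show up in the forward pass (assuming the pretrained model, which ships with the aux classifier):

import torch
from torchvision import models

model = models.segmentation.deeplabv3_resnet101(pretrained=True)
model.eval()

x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    output = model(x)  # OrderedDict with 'out' and 'aux' keys

print(output['out'].shape)  # main head logits, e.g. [1, 21, 256, 256]
print(output['aux'].shape)  # auxiliary head logits, same spatial size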

Given the size of your dataset, I would try to finetune the model using a custom output conv layer and use this as a baseline for further experiments. :wink:

Thanks @ptrblck, I tried fine-tuning the classification layer but it doesn’t provide good results so far… I’ll experiment with fine-tuning the entire head :slight_smile: I notice there is no pretrained model for the resnet50 case in torchvision. Am I right in saying I can just take the first half of resnet101?

Fine tuning the complete head sounds like a reasonable idea. :slight_smile:

You can use deeplabv3_resnet50 in the current master. :wink:

Thanks @ptrblck I’m also trying to train the entire resnet101/DeepLabv3 model with the aux classifier using something like

loss = loss1 + 0.4 * loss2  # 0.4 is the weight for the auxiliary classifier

But it’s training very slowly. Should I change the 0.4 weight to a much lower value? I’m trying learning rates of 0.001 and 0.01. My input images are normalised and 256×256. :slight_smile:
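For context, loss1 and loss2 come from the model’s 'out' and 'aux' outputs; roughly like this (criterion, images, targets, and optimizer are placeholders for my setup):

output = model(images)                     # OrderedDict with 'out' and 'aux'
loss1 = criterion(output['out'], targets)  # main classifier loss
loss2 = criterion(output['aux'], targets)  # auxiliary classifier loss
loss = loss1 + 0.4 * loss2

optimizer.zero_grad()
loss.backward()
optimizer.step()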

Are you using different learning rates for different parts of the model, or did you run some experiments with the mentioned learning rates?
I’m not sure which hyperparameters will work best for your use case.
I assume you are currently fine-tuning a pretrained model, or are you training from scratch?

@ptrblck Thanks for the reply. So far I’ve run the entire model with these learning rates, but I haven’t tried different learning rates for different parts, apart from freezing the backbone and head. I’ve tried training from scratch and also fine-tuning the resnet101/deeplabv3 model. Basically I’m just doing semantic segmentation on a facial dataset of 2000 training images and 10 facial classes, but I’m new to this and it’s difficult to say what might work. Are there any standard techniques/tricks which might help inform hyperparameter tuning or setting learning rates per layer?

Thanks for the information.
I would assume training from scratch might fail due to the small size of the dataset.
Per-layer learning rates are probably not necessary to create a solid baseline model.
If playing around with some hyperparameters doesn’t yield any performance increase, you could try to overfit a tiny subset (e.g. 10 images). If that still doesn’t work, your code or training procedure might have a bug which we’ve missed so far.
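A rough sketch of that overfitting check, assuming train_dataset, model, criterion, and optimizer are already set up:

from torch.utils.data import DataLoader, Subset

# take the first 10 samples and try to drive the training loss towards zero
tiny_set = Subset(train_dataset, list(range(10)))
tiny_loader = DataLoader(tiny_set, batch_size=2, shuffle=True)

model.train()
for epoch in range(200):
    for images, targets in tiny_loader:
        output = model(images)['out']
        loss = criterion(output, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()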

Hi @ptrblck. Thanks for your reply. I’m getting a result approaching 70% mIoU with resnet50-deeplab (from scratch), but it’s not an improvement over a simple U-Net model (also from scratch), which is what I’m looking for. Do you mean that training on a tiny subset initially can help boost performance? I’ll try lowering the auxiliary loss weight and see if that helps.

Something I’ve noticed is that my validation loss (cross entropy) decreases during training and then predictably increases after X epochs, but my mIoU on the validation set continues to rise, which seems contradictory. Do you know why this might be?

Hello. I am currently working with resnet-101 and I need to change the last layer of the model. However, I am not sure whether “classifier.4” or “aux_classifier” is the last layer. I would appreciate any help :slight_smile:

A “plain” resnet101 uses the fc attribute as its last layer:

import torchvision.models as models

model = models.resnet101()
print(model)

The source code also shows the internal layer usage.
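If you want to replace it with a new number of classes, a minimal sketch would be (nb_classes is a placeholder for your use case):

import torch.nn as nn
import torchvision.models as models

model = models.resnet101()
nb_classes = 10  # set this to your number of classes
model.fc = nn.Linear(model.fc.in_features, nb_classes)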