How to set up a fixed number of output channels in torchvision models such as DenseNet or ResNet

Hi everyone. May I ask if it’s possible to set up a fixed number of output channels for models such as densenet/resnet on the different blocks? For instance, I have a model and I would like to add densenet or resnet as a feature extractor. The problem I’m facing is that the densenet/resnet models consist of different blocks, either bottleneck or dense blocks, depending on the model. My goal is to take the output of each of the dense or bottleneck blocks, but at the same time fix the number of output channels to my predefined choice. Is that possible? I’ve tried passing num_init_features when initializing the models, but that didn’t work; I kept getting an error about an unrecognized keyword.

Just to understand your question correctly, you would like to use a pre-trained model e.g. resnet as a feature extractor in your main model.
In your use case it’s not sufficient to get the last activation from the feature extractor, but you would like to get all activations from all blocks? Since these “blocks” have different output shapes, you would like to fix them somehow. Is this understanding correct?

Could you explain a bit more about the activations, i.e. what would you like to do with them? Feeding them into another model?

@ptrblck Thank you for your reply. Just to clarify: the densenet model has a couple of dense blocks, I believe there are actually 4 of them. Given an input x, I would like to get the output from denseblock1, denseblock2, …, denseblock4. Now, as you said, these blocks produce outputs of different shapes [B, C, H, W]. I would like to fix the ‘C’ number of output channels for each of these blocks. As for why or where I would use such a thing: for instance, I want to experiment with Mask R-CNN and use densenet as the feature extractor, but the Mask R-CNN model expects 4 inputs, i.e. 4 tensors of shape [B, C, H, W], each of which requires a fixed number of channels; otherwise I get a tensor shape mismatch error.

Ok, I see. Assuming you don’t need to backpropagate through the DenseNet, you could register a forward hook on each block and save the activation. Since you need a specific number of output channels, you could add an additional Conv2d layer with your desired out_channels after each dense block.

I created a small example of this approach:

import torch
import torch.nn as nn
from torchvision import models

activations = {}
def get_activation(name):
    def hook(model, input, output):
        activations[name] = output
    return hook


model = models.densenet121(pretrained=False)

# Register a forward hook on each dense block
for name, child in model.features.named_children():
    if 'denseblock' in name:
        print(name)
        child.register_forward_hook(get_activation(name))

# Forward pass (Variable is deprecated; plain tensors work directly)
x = torch.randn(1, 3, 224, 224)
output = model(x)

# Create 1x1 convs to project each block to the desired out_channels
out_channels = 1
convs = {'denseblock1': nn.Conv2d(256, out_channels, 1),
         'denseblock2': nn.Conv2d(512, out_channels, 1),
         'denseblock3': nn.Conv2d(1024, out_channels, 1),
         'denseblock4': nn.Conv2d(1024, out_channels, 1)}

# Apply the matching conv to each saved activation
for key in activations:
    act = convs[key](activations[key])
    print(key, act.shape)

Let me know if that helps or if I misunderstood something!

@ptrblck Thank you for your help, let me try it out and get back to you. :smile:

@ptrblck Thanks a million. That is a very neat trick. I presume backward hooks can be registered in the same way. I just tested it and it seems to be working. :rocket:
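For what it’s worth, gradients can indeed be captured the same way with backward hooks. A minimal sketch, using a small stand-in nn.Sequential instead of a full DenseNet so it runs quickly (register_full_backward_hook is the newer API; older PyTorch versions used register_backward_hook):

```python
import torch
import torch.nn as nn

gradients = {}

def get_gradient(name):
    # grad_output is a tuple of gradients w.r.t. the module's outputs
    def hook(module, grad_input, grad_output):
        gradients[name] = grad_output[0].detach()
    return hook

# Small stand-in model instead of a full DenseNet
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, 3, padding=1),
)

# Capture the gradient flowing into the first conv's output
model[0].register_full_backward_hook(get_gradient('conv1'))

x = torch.randn(1, 3, 16, 16)
out = model(x)
out.mean().backward()

print(gradients['conv1'].shape)  # matches the first conv's output shape
```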

It seems that the model is not learning anything now, after changing the feature extractor:

0.0010   0.0 k    0.2  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.482   0.18 9.88   0.72 0.00   0.70 | 9.917   0.16 8.50   0.56 0.00   0.70 |  0 hr 
0.0010   0.0 k    0.2  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.482   0.18 9.88   0.72 0.00   0.70 | 11.979   0.09 10.54   0.65 0.00   0.70 |  0 hr
0.0010   0.0 k    0.2  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.482   0.18 9.88   0.72 0.00   0.70 | 12.157   0.09 10.74   0.62 0.00   0.70 |  0 hr
0.0010   0.0 k    0.2  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.482   0.18 9.88   0.72 0.00   0.70 | 10.715   0.08 9.35   0.58 0.00   0.70 |  0 hr 
0.0010   0.0 k    0.2  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.482   0.18 9.88   0.72 0.00   0.70 | 11.043   0.07 9.71   0.56 0.00   0.70 |  0 hr 
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.482   0.18 9.88   0.72 0.00   0.70 | 12.276   0.07 10.91   0.60 0.00   0.70 |  0 hr
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.482   0.18 9.88   0.72 0.00   0.70 | 11.503   0.10 10.19   0.51 0.00   0.70 |  0 hr
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 12.236   0.06 10.92   0.56 0.00   0.70 |  0 hr
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 11.637   0.06 10.33   0.55 0.00   0.70 |  0 hr
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 10.647   0.13 9.33   0.49 0.00   0.70 |  0 hr
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 10.425   0.06 9.10   0.56 0.00   0.70 |  0 hr
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 10.466   0.05 9.29   0.43 0.00   0.70 |  0 hr
0.0010   0.0 k    0.3  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 11.001   0.05 9.74   0.51 0.00   0.70 |  0 hr
0.0010   0.0 k    0.4  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 10.914   0.05 9.58   0.59 0.00   0.70 |  0 hr
0.0010   0.0 k    0.4  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 10.167   0.12 8.75   0.60 0.00   0.70 |  0 hr
0.0010   0.0 k    0.4  0.0 m | 12.273   0.17 10.70   0.70 0.00   0.70 | 11.600   0.12 10.15   0.63 0.00   0.70 | 11.728   0.05 10.46   0.53 0.00   0.70 |  0 hr

Could you explain a bit about the statistics you posted? I assume it’s some kind of accuracy?

Also, could you explain your workflow, since I’m not that familiar with Mask R-CNN. Maybe my approach is not suitable at all and we have to think about another one.

Yeah sure, I just saw your message. Each column shows some statistics; the columns are delimited by the | separators. Ignore the first one, it just shows the iteration number of the training and some other stuff. Starting from the second column we have 4 numbers, all of which are losses: the 1st is the RPN (region proposal network) loss, the 2nd the mask loss, the 3rd the bounding box loss, and the 4th the label/class loss. Let me know if that makes sense or if you need any more info. Thanks again!

So since the second “column” shows steady values, your model does not seem to learn anything?

Do you think my approach is applicable at all in your use case, i.e. do you need to backprop through the 4 activations with the additional Conv2d layers?

If so, I think a better way would be to re-implement the model so that it outputs the desired activations directly. This would make sure the whole model, including the projection layers, is being trained.