What is the best way of randomly initialising but freezing the last layers of a densenet?
I have the following code, where I am using a pretrained model but unfreezing the last denseblock4 and norm5 blocks for fine-tuning:
import torch
from torchvision import models

model = models.densenet161(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

submodules = model.features[-2:]
for param in submodules.parameters():
    param.requires_grad = True
However, instead of training these submodule layers, I would like to randomly initialise them and then freeze them. What would be the best way of doing this?
Let’s say I want to use the batch normalisation statistics provided by ImageNet, and so not randomly initialise them. Would the following be appropriate when defining the model, before running the training loop?
with torch.no_grad():  # allows re-initialising the parameters
    submodules = model.features[-2:]
    for submodule in submodules.modules():
        if isinstance(submodule, torch.nn.Conv2d):
            # randomly re-initialise the weights
            torch.nn.init.kaiming_normal_(submodule.weight)
            if submodule.bias is not None:
                # reset the bias to zero
                torch.nn.init.zeros_(submodule.bias)
        elif isinstance(submodule, torch.nn.BatchNorm2d):
            torch.nn.init.ones_(submodule.weight)
            torch.nn.init.zeros_(submodule.bias)
            # also reset running_mean and running_var
            torch.nn.init.zeros_(submodule.running_mean)
            torch.nn.init.ones_(submodule.running_var)
No, since you are resetting the batchnorm parameters and buffers to their default values, while it seems you want to use the pretrained values. Or am I misunderstanding your use case?
Well, I’m trying to randomly initialise (or set) the last few conv layers whilst keeping them frozen. However, I am also considering keeping the batch norm layers frozen with the previous ImageNet values rather than also randomly generating those values. I’m just curious to see if this would have an impact during training.
Also, this has made me think: when resetting the parameters, how would I keep them frozen? I’m not sure the above solution does that.
Ah ok, so would this be sufficient to keep the later modified layers “frozen” at a random initialisation rather than ImageNet values?:
for param in model.parameters():
    param.requires_grad = False

submodules = model.features[-2:]
for module in submodules.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs, 2)
Ok, apologies I don’t think I made myself clear here.
I want to create a feature extractor and so only train the classifier layers, and freeze the preceding feature extractor layers. However, I would like to randomly initialise and freeze the layers in the last two blocks of the feature extractor and only keep ImageNet values for the layers before these blocks.
I’m just wondering if adding this code after freezing all layers, unfreezes the last two blocks:
for module in submodules.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()
I’m assuming that despite resetting parameters, they are still frozen apart from the classifier.
Calling reset_parameters() will manipulate the parameters in-place and will thus not change any attributes. If you’ve frozen these parameters before, they should still be frozen.
However, you can easily verify it by printing the .requires_grad attribute of these parameters afterwards.