What is the best way of randomly initialising but freezing the last layers of a densenet?
I have the following code, where I am using a pretrained model but unfreezing the last denseblock4 and norm5 blocks for fine-tuning:
import torch
from torchvision import models

model = models.densenet161(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

submodules = model.features[-2:]
for param in submodules.parameters():
    param.requires_grad = True
However, instead of training these submodule layers, I would like to randomly initialise them and then freeze them. What would be the best way of doing this?
Let’s say I want to use the batch normalisation statistics provided by ImageNet, and so not randomly initialise them. Would the following be appropriate when defining the model, before running the training loop?
with torch.no_grad():  # allows re-initialising the parameters
    submodules = model.features[-2:]
    for submodule in submodules.modules():
        if isinstance(submodule, torch.nn.Conv2d):
            # randomly re-initialise the weights
            torch.nn.init.kaiming_normal_(submodule.weight)
            if submodule.bias is not None:
                # reset the bias to zero
                torch.nn.init.zeros_(submodule.bias)
        elif isinstance(submodule, torch.nn.BatchNorm2d):
            torch.nn.init.ones_(submodule.weight)
            torch.nn.init.zeros_(submodule.bias)
            # also reset running_mean and running_var
            torch.nn.init.zeros_(submodule.running_mean)
            torch.nn.init.ones_(submodule.running_var)
No, since you are resetting the batchnorm parameters and buffers to their default values, while it seems you want to use the pretrained values. Or am I misunderstanding your use case?
Well, I’m trying to randomly initialise (or set) the last few conv layers whilst keeping them frozen. However, I am also considering keeping the batch norm layers frozen with the previous ImageNet values rather than also randomly generating those values. I’m just curious to see if this would have an impact during training.
Also, this has made me think: when resetting the parameters, how would I keep them frozen? I’m not sure the above solution does that.
Ah ok, so would this be sufficient to keep the later modified layers “frozen” at a random initialisation rather than ImageNet values?:
for param in model.parameters():
    param.requires_grad = False

submodules = model.features[-2:]
for module in submodules.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs, 2)
Ok, apologies I don’t think I made myself clear here.
I want to create a feature extractor and so only train the classifier layers, and freeze the preceding feature extractor layers. However, I would like to randomly initialise and freeze the layers in the last two blocks of the feature extractor and only keep ImageNet values for the layers before these blocks.
I’m just wondering if adding this code after freezing all layers, unfreezes the last two blocks:
for module in submodules.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()
I’m assuming that despite resetting parameters, they are still frozen apart from the classifier.
Calling reset_parameters() will manipulate the parameters in-place and will thus not change any attributes. If you’ve frozen these parameters before, they should still be frozen.
However, you can easily verify it by printing the .requires_grad attribute of these parameters afterwards.