While initializing the weights from the pretrained model, I first check whether all of the model's layers match the pretrained model's keys, so here's what I do:
model_resnet = resnet50()  # built exactly the same as the link above
resnet50_weights = torch.load("resnet50-19c8e357.pth")

# get all layer names from model_resnet
names = {}
for name, param in model_resnet.named_parameters():
    names[name] = 0

# to see if there's anything missing
for key in resnet50_weights:
    if key not in names:
        print(key)
As far as I know, when we freeze the pretrained model, one issue is that we need to be careful about whether to use batch statistics or the running mean & var calculated during training. I think we should use the running mean & var since we "freeze" the model, as the link below also mentions: The Batch Normalization layer of Keras is broken
Does anyone have an idea how we can get the running_mean & running_var?
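For what it's worth, the running estimates are stored as buffers rather than parameters, so they show up in named_buffers() and in the state_dict, not in named_parameters(). A minimal sketch (the small Conv+BN model here is just a stand-in for illustration):

```python
import torch
import torch.nn as nn

# a tiny model with a BatchNorm2d layer, standing in for resnet50
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# running_mean / running_var are buffers, so they appear in
# named_buffers() (and in the state_dict), not in named_parameters()
for name, buf in model.named_buffers():
    print(name)

# they can also be accessed directly on the layer
bn = model[1]
print(bn.running_mean.shape)  # one entry per channel
print(bn.running_var.shape)
```

This also explains why the key-matching loop above misses them: iterating named_parameters() only covers weights and biases, while the checkpoint's state_dict contains buffers as well.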
'collections.OrderedDict' object has no attribute 'state_dict'
As for "Do you really need to manually load each parameter and buffer?":
Actually no. I am new to PyTorch, so if there's a better way to do what I want, I would always prefer it! But for curiosity's sake, I still wonder why it fails…
Based on the error message, it seems resnet50_weights might already be the state_dict. Could you try to load it via model_resnet.load_state_dict(resnet50_weights)?
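To make the error concrete: torch.load on a serialized state_dict returns the OrderedDict itself, not a model, which is why calling .state_dict() on it raises the AttributeError. A self-contained sketch (using a tiny nn.Linear as a stand-in for resnet50, and an in-memory buffer instead of the .pth file):

```python
import io
import torch
import torch.nn as nn

# a tiny model standing in for resnet50
model = nn.Linear(4, 2)

# mimic the downloaded checkpoint: save the state_dict, then reload it
buf = io.BytesIO()
torch.save(model.state_dict(), buf)
buf.seek(0)
loaded = torch.load(buf)

# torch.load gives back the state_dict (an OrderedDict), not a model,
# so loaded.state_dict() would fail -- load it directly instead:
print(type(loaded).__name__)
model.load_state_dict(loaded)
```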
Thanks! It works, but what if I want to initialize the model except for the fc layer? I think in that case we have to assign each parameter and buffer individually, or is there an alternative?
Your approach might work for this model.
However, if your model’s forward method is a bit more complicated than a simple sequence of modules, you could also set model_resnet.fc = nn.Identity().
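One common alternative to assigning each parameter manually is to filter the fc entries out of the state_dict and load with strict=False, which tolerates the missing keys. A sketch with a hypothetical TinyNet standing in for resnet50 (the class and sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# hypothetical small net standing in for resnet50: everything except
# the final fc layer should come from the "pretrained" checkpoint
class TinyNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Linear(8, 8)
        self.fc = nn.Linear(8, num_classes)

pretrained = TinyNet(num_classes=1000)  # plays the role of the checkpoint
model = TinyNet(num_classes=10)         # new head with a different size

state = pretrained.state_dict()
# drop the fc entries so the new head keeps its fresh initialization
filtered = {k: v for k, v in state.items() if not k.startswith("fc.")}
# strict=False tolerates the keys that are now missing (the fc ones)
model.load_state_dict(filtered, strict=False)
```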
In the forward function of resnet50, I deleted the flatten():
def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)
    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.avgpool(x)
    # x = torch.flatten(x)
    x = self.fc(x)
    return x
And after constructing the model, I do what you said.
Without flatten, I think the output shape would now be the same as the last residual block's output shape. If you find anything wrong above, please do correct me. Thanks!
The Identity assignment would be a hack in case you don't want to manipulate the forward.
Since you are changing it anyway, just comment out the calls to self.avgpool and self.fc and check the shape of the output after passing some random input through the model.
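Checking the shape with a random input looks like this in practice. The two-conv backbone here is a made-up stand-in for the truncated resnet50 (everything up to layer4), just to show the pattern:

```python
import torch
import torch.nn as nn

# stand-in backbone for the truncated resnet50 (avgpool/fc removed):
# feed a random ImageNet-sized input and inspect the output shape
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, 3, 224, 224)  # dummy input batch
with torch.no_grad():
    out = backbone(x)
print(out.shape)  # each stride-2 conv halves the spatial size
```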
OK, I've got the desired shape. Thanks!!!
BTW, I was browsing around and found this thread: ByteTensor to FloatTensor is slow? - PyTorch Forums
Why do people always convert a tensor's type to float while calculating? Would that be faster?
The default dtype is float32, which works faster in CUDA than float64.
Usually you don’t need the precision and range of FP64, and can stick to FP32.
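As a small illustration of the point about defaults (the uint8 tensor below mimics raw image data, which is a common reason for the ByteTensor-to-float conversion):

```python
import torch

# the default floating-point dtype is float32 (FP32)
print(torch.get_default_dtype())  # torch.float32

# image data often arrives as uint8 (a ByteTensor); converting to float
# yields float32, which most CUDA kernels handle fastest
img = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)
img_f = img.float() / 255.0  # convert and normalize to [0, 1]
print(img_f.dtype)  # torch.float32
```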
However, Tensor Cores in newer NVIDIA GPUs can accelerate FP16 operations.
Since the value range and precision of this number format might be too narrow for some models, we developed NVIDIA/apex, which contains mixed precision recipes, and we are currently working on upstreaming it into PyTorch in this issue.
I'm not familiar with training methods using int values, since the gradients would also be int, wouldn't they?
Based on this assumption, it doesn't sound very useful, but I haven't looked at the latest research papers on this topic.
Inference however is possible using int values and might give you a performance gain on specialized hardware. Have a look at Quantization to get more information.
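As a quick taste of int-based inference, PyTorch's post-training dynamic quantization can convert Linear layers so their weights are stored as int8 and dequantized on the fly. A minimal sketch (the tiny Sequential model is just a placeholder; this only covers the dynamic-quantization flavor, not the full quantization workflow):

```python
import torch
import torch.nn as nn

# a tiny placeholder model to demonstrate dynamic quantization
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# replace the Linear layers with dynamically quantized (int8) versions
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16)
with torch.no_grad():
    out = quantized(x)
print(out.shape)
```

The output stays float; only the weights (and the matmul internals) use int8, which is where the CPU speedup comes from.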
Yeah, I'll definitely check it out. And just a little thing to ask: the situation is that I would like to freeze the pretrained model, and the link below already shows how you can completely freeze the bn layers while training: freeze bn
But what confuses me is these lines of code:
if freeze_bn_affine:
    # Freezing weight/bias of BatchNorm2d
    m.weight.requires_grad = False
    m.bias.requires_grad = False
So since I have already set the bn layers to eval() mode, why do I still need this?
And after this, do I need to set affine=False manually?
train and eval will change the behavior of the running estimates (using batch stats and updating running stats in train, while using running estimates in eval).
This is unrelated to the trainable affine parameters (weight and bias), so you should freeze these parameters, if you don’t want to train them any further.
Your line of code would re-initialize the batchnorm layer, which seems to be wrong, as you’ll lose all trained parameters and estimated stats.
So no, don’t use this line of code, just freeze the parameters using their requires_grad flag.
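Putting both pieces of the answer together, a freeze helper might look like this (a sketch; the helper name and the small Conv+BN model are made up for illustration):

```python
import torch
import torch.nn as nn

def freeze_bn(model, freeze_bn_affine=True):
    """Fully freeze BatchNorm layers: eval() fixes the running stats,
    requires_grad=False freezes the affine weight/bias. There is no
    need to touch the affine flag itself."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()  # use running estimates instead of batch stats
            if freeze_bn_affine and m.affine:
                m.weight.requires_grad = False
                m.bias.requires_grad = False

# tiny stand-in model for the pretrained backbone
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
model.train()
freeze_bn(model)  # bn is now in eval mode with frozen weight/bias
```

Note that a later call to model.train() flips the bn layers back into training mode, so the helper has to be re-applied after every .train() call.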