How to save the requires_grad state of the weights

I define a model

import torch
from torchvision import models

model = models.resnet18()

It has all its layers set as trainable (requires_grad = True for all layers).
Then I freeze the final fc layer:

for param in model.fc.parameters():
    param.requires_grad = False

Then I save the state_dict of the model:

torch.save(model.state_dict(), 'model.pth')

Now I want to load the weights again, so I define the model once again and load the saved weights into it:

model_new = models.resnet18()
model_new.load_state_dict(torch.load('model.pth'))

Now when I print the requires_grad of its fc layer, all the requires_grad flags are back to the original settings (True):

for param in model_new.fc.parameters():
    print(param.requires_grad)

It prints:
True
True

So the question is: how does the requires_grad setting get changed when loading the weights?

Does saving the model.state_dict() even save the requires_grad settings?


The state_dict contains the tensors (basically the data), not the nn.Parameters, which hold the requires_grad attribute.
Since you are recreating the model and loading the state_dict, all flags are re-initialized.


model = models.resnet50()
print(model.fc.weight)
> Parameter containing:
tensor([[-0.0046, -0.0034, -0.0171,  ..., -0.0081,  0.0019, -0.0050],
        [-0.0167, -0.0093,  0.0100,  ...,  0.0180,  0.0164,  0.0170],
        [ 0.0093, -0.0114,  0.0144,  ...,  0.0003,  0.0202,  0.0163],
        ...,
        [-0.0188, -0.0040, -0.0019,  ...,  0.0113, -0.0164, -0.0054],
        [ 0.0212, -0.0127,  0.0155,  ..., -0.0200,  0.0092,  0.0188],
        [-0.0179,  0.0083, -0.0003,  ..., -0.0012, -0.0048, -0.0127]],
       requires_grad=True)
sd = model.state_dict()
print(sd['fc.weight'])
> tensor([[-0.0046, -0.0034, -0.0171,  ..., -0.0081,  0.0019, -0.0050],
        [-0.0167, -0.0093,  0.0100,  ...,  0.0180,  0.0164,  0.0170],
        [ 0.0093, -0.0114,  0.0144,  ...,  0.0003,  0.0202,  0.0163],
        ...,
        [-0.0188, -0.0040, -0.0019,  ...,  0.0113, -0.0164, -0.0054],
        [ 0.0212, -0.0127,  0.0155,  ..., -0.0200,  0.0092,  0.0188],
        [-0.0179,  0.0083, -0.0003,  ..., -0.0012, -0.0048, -0.0127]])
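
One straightforward option is to simply re-apply the freeze after loading the weights (a minimal sketch; load_state_dict only copies the tensor data, so the flags have to be set again):

import torch
from torchvision import models

model_new = models.resnet18()
model_new.load_state_dict(torch.load('model.pth'))

# re-apply the freeze, since the flags are not part of the state_dict
for param in model_new.fc.parameters():
    param.requires_grad = False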

Thanks for the clarification @ptrblck! :D

But what do you suggest I do if I want to save the requires_grad flag for all the layers too?

I guess save the entire model.

Yes, I tried

torch.save(model, 'path_name.pth')

and it seemed to work.
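
For completeness, here is a quick check (a sketch, assuming the frozen fc setup from above; on newer PyTorch versions you may need torch.load(..., weights_only=False) to unpickle a full model):

import torch

# loading the pickled model restores the modules together with their attributes
model_loaded = torch.load('path_name.pth')

for param in model_loaded.fc.parameters():
    print(param.requires_grad)  # False - the flag was pickled together with the module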

Thanks @John_Deterious!

And what's the best way to save the state of the optimizers? Why do we even need to save the optimizer's state? Can you please explain this to me?

Thanks in advance!

model = models.resnet50()

sd = model.state_dict()
sd['fc.weight'] has both a .data and a .requires_grad attribute.

So, looking at the information it holds, it feels like saving the state_dict should save the requires_grad flag too…

Can you please help me with that?

The requires_grad attribute you are printing from sd['fc.weight'].requires_grad is the default one for torch.Tensors.
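
A quick way to see this (just a sketch): the nn.Parameter on the model keeps its flag, while the entry in the state_dict is a detached tensor carrying the default value:

import torch
from torchvision import models

model = models.resnet50()
sd = model.state_dict()

print(model.fc.weight.requires_grad)   # True  - the nn.Parameter on the model
print(sd['fc.weight'].requires_grad)   # False - a plain, detached tensor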

I don’t think so, as the state_dict is supposed to store only the parameters, not the computation etc.
@John_Deterious’s suggestion to store the complete model might work.
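
If you prefer to stick with the state_dict, another option (only a sketch, not a built-in mechanism) would be to store the flags yourself next to the weights and re-apply them after loading:

import torch
from torchvision import models

model = models.resnet18()
for param in model.fc.parameters():
    param.requires_grad = False

# custom checkpoint: weights plus the requires_grad flag of every parameter
checkpoint = {
    'state_dict': model.state_dict(),
    'requires_grad': {name: p.requires_grad for name, p in model.named_parameters()},
}
torch.save(checkpoint, 'checkpoint.pth')

# restore the weights and re-apply the flags
checkpoint = torch.load('checkpoint.pth')
model_new = models.resnet18()
model_new.load_state_dict(checkpoint['state_dict'])
for name, p in model_new.named_parameters():
    p.requires_grad = checkpoint['requires_grad'][name]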

I would follow the ImageNet example and store a custom dict containing all the data I would like to keep. Some optimizers use internal buffers (e.g. running estimates), which should also be stored/restored if you plan on continuing the training.
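
A rough sketch of such a checkpoint dict (the optimizer and key names here are just illustrative, following the pattern of the ImageNet example):

import torch
from torchvision import models

model = models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
epoch = 10  # current epoch, illustrative

# save everything needed to resume training, including the optimizer's internal buffers
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pth')

# restore
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']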
