My initial reply was wrong.
New reply:
When I load your optimizer ('opt.pt'), it has .grad as None:
import torch

# path_to_pt_files is the directory that holds your saved .pt files
grad = torch.load(path_to_pt_files + 'grad.pt', weights_only=False)
opt = torch.load(path_to_pt_files + 'opt.pt', weights_only=False)
print('grad:')
print(grad)
print(grad.grad)
print('opt:')
print(opt.param_groups[0]['params'][8].grad)
grad:
tensor([[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 2.6891e-04, 2.6891e-04, 2.6891e-04, ..., 1.2958e-04,
9.1939e-05, 1.3036e-04],
[-3.6381e-06, -3.6381e-06, -3.6381e-06, ..., -1.3383e-06,
-1.3358e-06, -1.3876e-06],
...,
[ 5.5009e-04, 5.5009e-04, 5.5009e-04, ..., 2.4709e-04,
2.2063e-04, 1.9834e-04],
[-7.2875e-04, -7.2875e-04, -7.2875e-04, ..., -3.2083e-04,
-2.5821e-04, -3.2125e-04],
[-3.8400e-07, -3.8400e-07, -3.8400e-07, ..., -1.7627e-07,
-1.3704e-07, -1.5496e-07]], device='cuda:0')
None
opt:
None
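As a quick check on the None values above, here is a minimal sketch of a save/load round trip on a single parameter. It uses an in-memory buffer and a made-up 2x3 parameter instead of your files, so the names `param`, `loaded`, and `checkpoint` are only for illustration. As far as I can tell, torch.save does not write a tensor's .grad at all, so if you need the gradient later you would have to save it explicitly as its own tensor.

```python
import io

import torch

# Hypothetical stand-in for one of your parameters.
param = torch.nn.Parameter(torch.randn(2, 3))
param.sum().backward()  # populates param.grad
assert param.grad is not None

# Round trip the parameter itself through torch.save / torch.load.
buffer = io.BytesIO()
torch.save(param, buffer)
buffer.seek(0)
loaded = torch.load(buffer, weights_only=False)
print('loaded.grad:', loaded.grad)

# Saving the gradient as a separate entry keeps it across the round trip.
buffer = io.BytesIO()
torch.save({'param': param.detach(), 'grad': param.grad}, buffer)
buffer.seek(0)
checkpoint = torch.load(buffer, weights_only=False)
print('checkpoint grad:', checkpoint['grad'])
```

If this matches what you see, the missing .grad would not depend on whether the optimizer or its state_dict was saved.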
It could be because you're saving the optimizer itself rather than its .state_dict(). And if you try to assign some new_gradients to your already existing 'opt.pt' file like this:
import torch
from torch import nn

# path_to_pt_files is the directory used for the saved .pt files
# Note: the prints below show the parameters held by the optimizer

model = nn.Sequential(
    nn.Linear(6, 2, bias=False),
    nn.Sigmoid(),
)
inputs = torch.randn(6)
target = torch.randn(2)
criterion = nn.MSELoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)

# Save and reload the state_dict of the fresh optimizer
torch.save(optimizer.state_dict(), path_to_pt_files + 'example_optim.pt')
optimizer_state_dict = torch.load(path_to_pt_files + 'example_optim.pt', weights_only=False)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer.load_state_dict(optimizer_state_dict)
print('saved optimizer.grad: ', optimizer.param_groups[0]['params'])

# Same round trip after zeroing the gradients
optimizer.zero_grad()
torch.save(optimizer.state_dict(), path_to_pt_files + 'example_optim.pt')
optimizer_state_dict = torch.load(path_to_pt_files + 'example_optim.pt', weights_only=False)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer.load_state_dict(optimizer_state_dict)
print('saved after zeroing optimizer.grad: ', optimizer.param_groups[0]['params'])

# Same round trip after a forward and backward pass
optimizer.zero_grad()
output = model(inputs)
loss = criterion(output, target)
loss.backward()
torch.save(optimizer.state_dict(), path_to_pt_files + 'example_optim.pt')
optimizer_state_dict = torch.load(path_to_pt_files + 'example_optim.pt', weights_only=False)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer.load_state_dict(optimizer_state_dict)
print('saved after predicting optimizer.grad: ', optimizer.param_groups[0]['params'])

# Same round trip after an optimizer step
optimizer.step()
torch.save(optimizer.state_dict(), path_to_pt_files + 'example_optim.pt')
optimizer_state_dict = torch.load(path_to_pt_files + 'example_optim.pt', weights_only=False)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer.load_state_dict(optimizer_state_dict)
print('saved after stepping optimizer.grad: ', optimizer.param_groups[0]['params'])

# Replace the first parameter with a new Parameter built from new_gradient
new_gradient = torch.rand(optimizer.param_groups[0]['params'][0].grad.shape, requires_grad=True)
print(f'new_gradient: {new_gradient}')
optimizer.param_groups[0]['params'][0] = torch.nn.parameter.Parameter(new_gradient, requires_grad=True)
print('new values for optimizer.grad: ', optimizer.param_groups[0]['params'])

# Saving the state_dict again is where the KeyError is raised
torch.save(optimizer.state_dict(), path_to_pt_files + 'example_optim.pt')
optimizer_state_dict = torch.load(path_to_pt_files + 'example_optim.pt', weights_only=False)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer.load_state_dict(optimizer_state_dict)
print('new values after saving again for optimizer.grad: ', optimizer.param_groups[0]['params'])
it will cause a KeyError when saving out the optimizer again:
saved optimizer.grad: [Parameter containing:
tensor([[ 3.9279e-01, 2.6180e-01, 3.1024e-01, -3.6962e-02, -1.4670e-01,
-2.1468e-01],
[-1.2634e-04, -2.9840e-01, 9.1751e-02, -1.2751e-01, 1.7776e-01,
-2.1791e-01]], requires_grad=True)]
saved after zeroing optimizer.grad: [Parameter containing:
tensor([[ 3.9279e-01, 2.6180e-01, 3.1024e-01, -3.6962e-02, -1.4670e-01,
-2.1468e-01],
[-1.2634e-04, -2.9840e-01, 9.1751e-02, -1.2751e-01, 1.7776e-01,
-2.1791e-01]], requires_grad=True)]
saved after predicting optimizer.grad: [Parameter containing:
tensor([[ 3.9279e-01, 2.6180e-01, 3.1024e-01, -3.6962e-02, -1.4670e-01,
-2.1468e-01],
[-1.2634e-04, -2.9840e-01, 9.1751e-02, -1.2751e-01, 1.7776e-01,
-2.1791e-01]], requires_grad=True)]
saved after stepping optimizer.grad: [Parameter containing:
tensor([[ 0.2928, 0.1618, 0.2102, 0.0630, -0.2467, -0.3147],
[-0.1001, -0.3984, -0.0082, -0.0275, 0.0778, -0.3179]],
requires_grad=True)]
new_gradient: tensor([[0.8659, 0.3938, 0.4791, 0.8014, 0.1843, 0.5315],
[0.7555, 0.2915, 0.1225, 0.6326, 0.5627, 0.9159]], requires_grad=True)
new values for optimizer.grad: [Parameter containing:
tensor([[0.8659, 0.3938, 0.4791, 0.8014, 0.1843, 0.5315],
[0.7555, 0.2915, 0.1225, 0.6326, 0.5627, 0.9159]], requires_grad=True)]
Traceback (most recent call last):
...
...
...
(param_mappings[id(k)] if isinstance(k, torch.Tensor) else k): v
KeyError: 139737484299376
I'm not sure why that happens at the moment, so I'll have to come back to this. Or someone smarter than me could give us some direction.
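One possible lead, offered as a guess rather than a confirmed diagnosis: the traceback line looks up `param_mappings[id(k)]` for each tensor key in the optimizer's state. After a step, the state is keyed by the original Parameter object, so swapping a new Parameter into `param_groups` may leave that old key with no mapping. The sketch below assumes that reading. It uses the same toy model as above and writes the new values into `.grad` of the existing parameter instead of replacing it. The names `param` and `raised_key_error` are only for illustration.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(6, 2, bias=False), nn.Sigmoid())
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)

# One real step so the optimizer holds per-parameter state.
criterion = nn.MSELoss()
loss = criterion(model(torch.randn(6)), torch.randn(2))
loss.backward()
optimizer.step()

# Overwrite the gradient on the existing Parameter object.
param = optimizer.param_groups[0]['params'][0]
new_gradient = torch.rand(param.shape)
param.grad = new_gradient

# The optimizer still tracks the same object, so this builds normally.
state_dict = optimizer.state_dict()
print(state_dict['state'].keys())

# For comparison: replacing the Parameter object, as in the code above.
optimizer.param_groups[0]['params'][0] = nn.Parameter(torch.rand(param.shape))
try:
    optimizer.state_dict()
    raised_key_error = False
except KeyError:
    raised_key_error = True
print('KeyError raised:', raised_key_error)
```

Keep in mind that the gradient set this way lives on the parameter, not in the optimizer's state_dict, so it would still need to be saved separately if you want it on disk.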