I have a model and an optimizer, and I want to save their state dicts as CPU tensors, then load those state dicts back on GPU. This seems straightforward for the model, but what's the best way to do it for the optimizer?
This is what my code looks like right now:
```python
import torch

model = ...
optim = torch.optim.SGD(model.parameters(), momentum=0.1)

model_state = model.state_dict()
# Convert to CPU
for k, v in model_state.items():
    model_state[k] = v.cpu()

optim_state = optim.state_dict()
# Convert to CPU
for state in optim_state["state"].values():
    for k, v in state.items():
        state[k] = v.cpu()

# Now I want to load these state dicts back onto GPU
model2 = ...
model2.cuda()
optim2 = torch.optim.SGD(model2.parameters(), momentum=0.1)

# This seems to work; the model2 parameters are on GPU
model2.load_state_dict(model_state)

# Same does not hold true for optimizer
optim2.load_state_dict(optim_state)
```
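For reference, here is a CPU-only toy version of that conversion loop that I can actually run (the `nn.Linear` stand-in is just for illustration, not my real model); the `torch.is_tensor` guard is my own addition, since some optimizers (e.g. Adam) keep non-tensor entries in their state:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model (hypothetical; the actual model is elided above).
model = nn.Linear(4, 2)
optim = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.1)

# Run one step so the momentum buffers actually exist in optim.state_dict().
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optim.step()

optim_state = optim.state_dict()
for state in optim_state["state"].values():
    for k, v in state.items():
        # Guard: some optimizers keep non-tensor state entries.
        if torch.is_tensor(v):
            state[k] = v.cpu()

# Every tensor in the saved optimizer state is now on CPU.
devices = {v.device.type
           for state in optim_state["state"].values()
           for v in state.values() if torch.is_tensor(v)}
print(devices)  # {'cpu'}
```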
The only option I see is to manually convert the optimizer state back to CUDA afterwards:
```python
for state in optim2.state.values():
    for k, v in state.items():
        state[k] = v.cuda()
But would optim2 still update model2’s parameters?
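From a CPU-only toy experiment (again with a hypothetical `nn.Linear` stand-in), it looks like `load_state_dict` only copies hyperparameters and per-parameter state, and keeps the optimizer's own parameter references in `param_groups`, so the link to `model2` seems to survive. Is that guaranteed?

```python
import torch
import torch.nn as nn

model2 = nn.Linear(4, 2)  # toy stand-in for the real model (hypothetical)
optim2 = torch.optim.SGD(model2.parameters(), lr=0.01, momentum=0.1)

# Build a state dict from a structurally identical optimizer and load it.
source = torch.optim.SGD(nn.Linear(4, 2).parameters(), lr=0.01, momentum=0.1)
optim2.load_state_dict(source.state_dict())

# The params in optim2.param_groups are still model2's own tensors.
print(optim2.param_groups[0]["params"][0] is model2.weight)  # True
```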