Saving and Loading Optimizer Params

Hi,
I’m trying to save and load the optimizer parameters the same way I do for a model, but although I’ve tried many different approaches, I still can’t get it to work. Here is the code:

best_model_wts = copy.deepcopy(model.state_dict())
best_optim_pars = copy.deepcopy(optimizer.state_dict())

for epoch in range(num_epochs):
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()
        else:
            model.eval()

        running_loss = 0.0
        running_corrects = 0

        if epoch < 20:
            error_sigma = 2.0
        elif 19 < epoch < 40:
            error_sigma = 1.5
            if epoch == 20:
                model.load_state_dict(best_model_wts)
                optimizer.load_state_dict(best_optim_pars)

        for inputs, labels, _ in data_loaders[phase]:
            batch_size = len(labels)
            converted_inputs = dictionary_to_tensor(inputs, batch_size)
            converted_inputs = numpy2tensor(converted_inputs)
            labels = labels.to(device)

            optimizer.zero_grad()
            with torch.set_grad_enabled(phase == 'train'):
                outs = model(converted_inputs)
                loss = criterion(outs, labels)
                target_err = torch.sign(loss - error_sigma)

                _, preds_all = torch.max(outs, 1)

                if phase == 'train':
                    loss.backward()
                    optimizer.step(target_err)

            running_loss += loss.item() * converted_inputs.size(1)
            running_corrects += torch.sum(preds_all == labels.data)

        if phase == 'train':
            scheduler.step()

        data_size = len(data_loaders[phase].dataset)

        epoch_loss = running_loss / data_size
        epoch_acc = running_corrects.double() / data_size

        if phase == 'val':
            val_acc_history.append(epoch_acc)
            if epoch_acc > best_acc:
                best_acc = epoch_acc
                best_epoch = epoch
                best_loss = epoch_loss
                best_model_wts = copy.deepcopy(model.state_dict())
                best_optim_pars = copy.deepcopy(optimizer.state_dict())  # this line gives the error

Then I get the following error:
raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Any idea about how to solve it? Thank you very much.

Hi. Did you try the official PyTorch tutorial?
Does it work for you?

import torch
import torch.nn as nn

m = nn.Linear(10, 2)
opt = torch.optim.Adam(m.parameters())
best = {'optimizer_state_dict': opt.state_dict()}

opt.zero_grad()
opt.step()

opt = torch.optim.Adam(m.parameters())
opt.load_state_dict(best['optimizer_state_dict'])

This dummy example is working fine for me.
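For completeness, the save/load pattern from the official tutorial looks roughly like this — a minimal sketch, where the 'checkpoint.pth' filename is just a placeholder:

import torch
import torch.nn as nn

m = nn.Linear(10, 2)
opt = torch.optim.Adam(m.parameters())

# save both state dicts in one checkpoint file
torch.save({'model_state_dict': m.state_dict(),
            'optimizer_state_dict': opt.state_dict()}, 'checkpoint.pth')

# restore into freshly constructed objects
m = nn.Linear(10, 2)
opt = torch.optim.Adam(m.parameters())
checkpoint = torch.load('checkpoint.pth')
m.load_state_dict(checkpoint['model_state_dict'])
opt.load_state_dict(checkpoint['optimizer_state_dict'])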


Hi, thank you for the follow-up. Yes, I tried the official solution, but it still didn’t work. There’s no problem with the model save/load; the problem is the optimizer state save/load. In my current case, the line below raises an error:

best_optim_pars = copy.deepcopy(optimizer.state_dict())

When I update the code by removing copy.deepcopy, as below:

best_optim_pars = optimizer.state_dict()

then this line raises an error instead:

optimizer.load_state_dict(best_optim_pars)

import torch
import torch.nn as nn

X = torch.rand(2, 10)
y = torch.tensor([[0,1], [1,0]], dtype=torch.float32)

m = nn.Linear(10, 2)
opt = torch.optim.Adam(m.parameters())
state = opt.state_dict()
crit = nn.BCEWithLogitsLoss()

for i in range(5):
    if i == 2:
        # load
        opt.load_state_dict(state)
    opt.zero_grad()
    out = m(X)
    loss = crit(out, y)  # criterion takes (input, target)
    loss.backward()
    opt.step()
    # save
    state = opt.state_dict()

Updated to a less dummy example. It also works fine.

If you are running this in a notebook, I would suggest restarting the kernel to get rid of the stale best_optim_pars = copy.deepcopy(optimizer.state_dict()) variable.

Also, I am running this on the current version of PyTorch.

I actually tried all of these different approaches, but the error remains. The optimizer is SGD with momentum:

optimizer_ft = optim.SGD(model_ft.parameters(), lr=params.lr, momentum=params.momentum)
optimizer_ft.zero_grad()

I also tried it without copy.deepcopy, as in your solution:

best_model_wts = copy.deepcopy(model.state_dict())
best_optim_pars = optimizer.state_dict()

And then, at the point in the code where I load:

 if epoch == 20:
     model.load_state_dict(best_model_wts)
     # this is for initialization; not sure if it's needed?
     optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
     optimizer.zero_grad()
     optimizer.load_state_dict(best_optim_pars)

For saving the best model and the current optim params:

 if phase == 'val':
     val_acc_history.append(epoch_acc)
     if epoch_acc > best_acc:
         best_acc = epoch_acc
         best_epoch = epoch
         best_loss = epoch_loss
         best_model_wts = copy.deepcopy(model.state_dict())
         best_optim_pars = optimizer.state_dict()

Still getting this error:

File “/media/alic/ssdmain/Projects/crandrnn/src/train_multi_level_model.py”, line 83, in train_model
optimizer.load_state_dict(best_optim_pars)
File “/home/alic/anaconda3/envs/crandrnn/lib/python3.7/site-packages/torch/optim/optimizer.py”, line 105, in load_state_dict

File “/home/alic/anaconda3/envs/crandrnn/lib/python3.7/site-packages/torch/tensor.py”, line 23, in deepcopy
raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment.

That is strange; I can’t reproduce your error using deepcopy. It may be Python-version or even system dependent.

import copy
import torch
import torch.nn as nn

X = torch.rand(2, 10)
y = torch.tensor([[0,1], [1,0]], dtype=torch.float32)

m = nn.Linear(10, 2)
opt = torch.optim.SGD(m.parameters(), lr=0.1)

m_state = copy.deepcopy(m.state_dict())
state = opt.state_dict()

crit = nn.BCEWithLogitsLoss()

for i in range(5):
    if i == 2:
        # load
        m.load_state_dict(m_state)
        opt = torch.optim.SGD(m.parameters(), lr=0.1)
        opt.load_state_dict(state)
    opt.zero_grad()
    out = m(X)
    loss = crit(out, y)  # criterion takes (input, target)
    loss.backward()
    opt.step()
    # save
    m_state = copy.deepcopy(m.state_dict())
    state = opt.state_dict()

Perhaps, as the error suggests, you can get rid of all the deepcopy calls.
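For reference, here is a minimal sketch of what the error message is about: deepcopy only works on leaf tensors, so any tensor that is still attached to an autograd graph raises exactly that RuntimeError:

import copy
import torch

x = torch.rand(2, requires_grad=True)  # a leaf tensor, created by the user
y = x * 2                              # a non-leaf tensor, part of a graph

copy.deepcopy(x)  # works
copy.deepcopy(y)  # RuntimeError: Only Tensors created explicitly by the user ...

So if something in your optimizer state is a non-leaf tensor, both your explicit deepcopy and load_state_dict (which deepcopies internally, as your traceback shows) would fail on it.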


Also, you could take a look at this thread. Perhaps you are doing something similar with your model or optimizer.

Thank you for your help. I’m actually doing the same thing you suggest in your solution:
Initial:

best_model_wts = copy.deepcopy(model.state_dict())
best_optim_state = optimizer.state_dict()

Updating in the loop:

if epoch < 20:
    error_sigma = 2.0
elif 19 < epoch < 40:
    error_sigma = 1.5
    if epoch == 20:
        model.load_state_dict(best_model_wts)
        optimizer = optim.SGD(model.parameters(), lr=0.001)
        optimizer.load_state_dict(best_optim_state)

And saving the best models and optimizer state:

if phase == 'val':
    val_acc_history.append(epoch_acc)
    if epoch_acc > best_acc:
        best_acc = epoch_acc
        best_epoch = epoch
        best_loss = epoch_loss
        best_model_wts = copy.deepcopy(model.state_dict())
        best_optim_state = optimizer.state_dict()

However, I’m still getting the same error on the line that loads the optimizer state.
I tried your solution and it runs without any problem in my environment (Python 3.7.5, PyTorch 1.2), but my code still gives the above-mentioned runtime error.

Did you try not using deepcopy at all in your code?
Also, did you check whether your code updates or reassigns model parameters somewhere (as in the thread I linked before)?
Can you update to the current PyTorch version?
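As a quick sanity check, you could also print your version and scan the optimizer state for non-leaf tensors, since those are exactly what the deepcopy inside load_state_dict chokes on. find_nonleaf_state below is just a hypothetical helper, not a PyTorch API:

import torch
import torch.nn as nn

print(torch.__version__)

# Hypothetical helper: list optimizer state tensors that are non-leaf,
# since non-leaf tensors are the ones that break deepcopy.
def find_nonleaf_state(optimizer):
    bad = []
    for param, state in optimizer.state.items():
        for name, value in state.items():
            if torch.is_tensor(value) and not value.is_leaf:
                bad.append((name, value.shape))
    return bad

# dummy setup; call it on your own optimizer instead
m = nn.Linear(10, 2)
opt = torch.optim.SGD(m.parameters(), lr=0.1, momentum=0.9)
m(torch.rand(2, 10)).sum().backward()
opt.step()  # populates the momentum buffers in opt.state
print(find_nonleaf_state(opt))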


Thank you very much. It works after updating to the current PyTorch version.

  • I tried not using deepcopy at all, but it failed again while loading the optimizer state, with the same error. There was no problem with the model load.
  • I checked the linked thread, but in their code they reproduce the error deliberately; I couldn’t see the same pattern in my own case.

Anyway, as soon as I updated the PyTorch version, it worked without any problem. I still don’t understand why, but it’s working.
Thanks again.

Glad I could help. Perhaps some internal of PyTorch’s deepcopy works differently in the new version; that’s the only reason I can see. Apparently, all tensors now “support the deepcopy protocol”, not only “graph leaves” :slight_smile:
