How to temporarily save the previous parameters and gradients for use in the next step/loop

I am wondering how to save the parameters/weights and their gradients at each step/loop (after backward() has been called and the optimizer, e.g. optimizer = optim.SGD(...), has been applied to them), so that I can re-load them into the optimizer at the next step. Since I need to save and re-load the parameters and their gradients on every iteration, I do not want to write them out to a separate .pt or .pth file and load it back every loop.
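
In other words, I want an in-memory snapshot instead of the file round-trip. A minimal sketch of what I mean (assuming a standard net built with nn.Module; saved_params and saved_grads are just names I made up):

import copy

# Keep deep copies of the current parameters and of their gradients in
# memory, rather than torch.save()-ing them to a .pt/.pth file every loop.
saved_params = [copy.deepcopy(p.data) for p in net.parameters()]
saved_grads = [copy.deepcopy(p.grad.data) if p.grad is not None else None
               for p in net.parameters()]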

I tried to implement it like this:

import copy

from torch.autograd import Variable

# net, criterion, optimizer and trainloader are defined earlier (not shown here)
for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = Variable(inputs), Variable(labels)
    def closure():
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        return loss.data[0]
    # I save the parameters and their gradients here. I also modified the return
    # value of step() in sgd.py, which is not shown here (a sketch of that change
    # follows this snippet).
    running_loss, param_groups_temp_prev, param_groups_grad_temp_prev = optimizer.step(closure)
    # Re-assign the two saved lists to the optimizer before the second call
    # to optimizer.step(closure) in the same iteration.
    optimizer.param_groups[0]['params'] = copy.deepcopy([pgt_prev for pgt_prev in param_groups_temp_prev])
    for ii in range(len(param_groups_grad_temp_prev)):
        pggt_prev = copy.deepcopy(param_groups_grad_temp_prev[ii])
        optimizer.param_groups[0]['params'][ii].grad = copy.deepcopy(Variable(pggt_prev))
    running_loss_temp, param_groups_temp1, param_groups_grad_temp1 = optimizer.step(closure)
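
For reference, the change I made to sgd.py is conceptually along these lines. This is only a sketch of the idea, written as a subclass for clarity (the class name SGDWithSnapshot is made up), not my exact edit:

import copy
import torch.optim as optim

class SGDWithSnapshot(optim.SGD):
    # Sketch only: step() also returns deep copies of the updated parameters
    # and of their gradients, in addition to the loss returned by closure().
    def step(self, closure=None):
        loss = super(SGDWithSnapshot, self).step(closure)
        params_copy = [copy.deepcopy(p.data)
                       for group in self.param_groups
                       for p in group['params']]
        grads_copy = [copy.deepcopy(p.grad.data) if p.grad is not None else None
                      for group in self.param_groups
                      for p in group['params']]
        return loss, params_copy, grads_copy

# e.g. optimizer = SGDWithSnapshot(net.parameters(), lr=0.01)  # lr is just an example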

I save them as param_groups_temp_prev and param_groups_grad_temp_prev so that I can re-load them for the second call to optimizer.step(closure) within the same iteration. On the first pass through the loop, everything works as expected.

However, on the second pass through the loop, things break: after optimizer.zero_grad() is called the gradients of all parameters are correctly cleared to zero, but after loss.backward() they remain zero, as if backward() no longer had any effect.
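
For example, this is roughly the check I ran (printing the gradients that the optimizer sees):

# Inside closure(), right after loss.backward(), on the 2nd iteration of the loop:
for p in optimizer.param_groups[0]['params']:
    print(p.grad)   # every gradient is still all zeros, as if backward() did nothing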

Could anyone help me with this issue? I'd appreciate it!