How to store the gradients of the entire model using model.named_parameters()? Extracting model.named_parameters() at the end of training returns all zeros

I am trying to store the gradients of the entire model. The code is given below:

  for step, batch in enumerate(train_dataloader): 
    outputs = model(**batch)
    loss = outputs.loss
    loss = loss / args.gradient_accumulation_steps
    accelerator.backward(loss)
    progress_bar.update(1)
    progress_bar.set_postfix(loss=round(loss.item(), 3))
    del outputs
    gc.collect()
    torch.cuda.empty_cache()
    
    if (step+1) % args.gradient_accumulation_steps == 0 or (step+1) == len(train_dataloader):
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
  # After training: flatten and concatenate the gradients of all parameters
  reference_gradient = [p.grad.view(-1) if p.grad is not None else torch.zeros(p.numel(), device=p.device)
                        for n, p in model.named_parameters()]
  reference_gradient = torch.cat(reference_gradient)

My intention is to store the parameters of the entire model so I can use them for further calculation in another model. But I have two questions here:

  1. Here the reference_gradient variable always contains zeros. I understand that this happens because optimizer.zero_grad() is called after every gradient_accumulation_steps, which resets all the gradients to 0. (A sketch of what I am trying to do is shown after this list.)
  2. How can I store the parameters of the entire model?
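For context, this is roughly what I have in mind: copy the gradients right before optimizer.zero_grad() clears them. The helper name snapshot_gradients below is just something I made up for illustration, reusing the model/optimizer/scheduler variables from the loop above:

  def snapshot_gradients(model):
      # Flatten and copy the current gradients of every parameter into one vector.
      # Parameters that never received a gradient contribute zeros of the right size.
      return torch.cat([
          p.grad.detach().clone().view(-1) if p.grad is not None
          else torch.zeros(p.numel(), device=p.device)
          for n, p in model.named_parameters()
      ])

  # Inside the training loop, before the gradients are cleared:
  #     optimizer.step()
  #     scheduler.step()
  #     reference_gradient = snapshot_gradients(model)
  #     optimizer.zero_grad()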

You can have a look at the PyTorch docs on saving and loading models:

https://pytorch.org/tutorials/beginner/saving_loading_models.html

In a nutshell, you can use:

Save:

torch.save(model.state_dict(), PATH)

Load:

model = TheModelClass(*args, **kwargs) 
model.load_state_dict(torch.load(PATH))
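
If you also want to persist the captured gradients (e.g. a reference_gradient tensor like the one in the question) so another script can use them, one option is to save them in the same checkpoint. This is just a sketch and the dictionary keys are arbitrary:

Save parameters and gradients together:

torch.save({"state_dict": model.state_dict(), "reference_gradient": reference_gradient}, PATH)

Load them in the other script:

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint["state_dict"])
reference_gradient = checkpoint["reference_gradient"]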