I am trying to store the gradients of the entire model. The code is given below:
```python
for step, batch in enumerate(train_dataloader):
    outputs = model(**batch)
    loss = outputs.loss
    loss = loss / args.gradient_accumulation_steps
    accelerator.backward(loss)
    progress_bar.update(1)
    progress_bar.set_postfix(loss=round(loss.item(), 3))
    del outputs
    gc.collect()
    torch.cuda.empty_cache()
    # step the optimizer every gradient_accumulation_steps micro-batches
    if (step + 1) % args.gradient_accumulation_steps == 0 or (step + 1) == len(train_dataloader):
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        # collect a flattened copy of all gradients
        reference_gradient = [p.grad.view(-1) if p.grad is not None else torch.zeros(p.numel())
                              for n, p in model.named_parameters()]
        reference_gradient = torch.cat(reference_gradient)
```
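For clarity, this is the kind of flattened gradient vector I am hoping to end up with. Here is a minimal, self-contained sketch on a toy model (the `nn.Linear` toy model and the helper name `flatten_gradients` are only for illustration, not part of my actual code):

```python
import torch
import torch.nn as nn

def flatten_gradients(model: nn.Module) -> torch.Tensor:
    """Concatenate every parameter's gradient into one 1-D tensor.
    Parameters without a gradient contribute zeros of the right size."""
    chunks = []
    for _, p in model.named_parameters():
        if p.grad is not None:
            chunks.append(p.grad.detach().view(-1))
        else:
            chunks.append(torch.zeros(p.numel(), device=p.device, dtype=p.dtype))
    return torch.cat(chunks)

# toy model: 2*4 weights + 2 biases = 10 gradient entries
toy = nn.Linear(4, 2)
toy(torch.randn(8, 4)).sum().backward()
print(flatten_gradients(toy).shape)  # torch.Size([10])
```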
My intention is to store the gradients of the entire model so that I can use them for further calculations in another model. I have two questions here:
- The reference_gradient variable always comes back as all zeros. I understand that this happens because optimizer.zero_grad() is called after every args.gradient_accumulation_steps steps, so all the gradients have already been set to 0 by the time I read them.
- How can I store the gradients of the entire model? (A sketch of what I have been considering is after this list.)
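What I have been considering, though I am not sure it is correct, is to take the gradient snapshot before optimizer.zero_grad() and keep a detached CPU copy per optimizer step, roughly like this (stored_gradients is just a placeholder name I made up):

```python
stored_gradients = []  # hypothetical container: one flattened gradient per optimizer step

for step, batch in enumerate(train_dataloader):
    outputs = model(**batch)
    loss = outputs.loss / args.gradient_accumulation_steps
    accelerator.backward(loss)

    if (step + 1) % args.gradient_accumulation_steps == 0 or (step + 1) == len(train_dataloader):
        # take the snapshot while the accumulated gradients are still populated,
        # i.e. before optimizer.zero_grad() clears them
        reference_gradient = torch.cat(
            [p.grad.detach().view(-1) if p.grad is not None
             else torch.zeros(p.numel(), device=p.device)
             for _, p in model.named_parameters()]
        )
        stored_gradients.append(reference_gradient.cpu())  # keep a CPU copy to spare GPU memory

        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

# optionally persist the collected gradients for the other model to consume later
# torch.save(stored_gradients, "reference_gradients.pt")
```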