Noob question: where are gradients stored during backpropagation?

I know gradients are used to update the weights (in the optimizer step) before the next forward pass. But looking at the two examples I have, I am confused about where the gradients are stored during backpropagation in a typical PyTorch training setup.
Example 1 below, which simulates a very simple backpropagation, gives me the impression that the tensors a and b store the gradients. After Q.backward() is called (I guess that is possible because the operands of Q have requires_grad=True), print(a.grad, b.grad) returns non-None values; before that it returned None. So that suggests the gradients are stored in Q's operands.

A Gentle Introduction to torch.autograd — PyTorch Tutorials 2.3.0+cu121 documentation

import torch
a = torch.tensor([2, 3, 12], dtype=torch.float32, requires_grad=True)
b = torch.tensor([6, 4, 5], dtype=torch.float32, requires_grad=True)
Q = 3*a**3 - b**2
external_grad = torch.tensor([1, 1, 1], dtype=torch.float32)
Q.backward(gradient=external_grad)
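
For reference, here is the before/after check I mean (expected values worked out from dQ/da = 9*a**2 and dQ/db = -2*b):

# before Q.backward(gradient=external_grad) is called:
print(a.grad, b.grad)   # None None
# after Q.backward(gradient=external_grad) is called:
print(a.grad, b.grad)   # tensor([  36.,   81., 1296.]) tensor([-12.,  -8., -10.])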

But in example 2 below, with a very simple resnet model and random input data, it is less clear where the gradients are stored, whereas in example 1 I know where they are stored and can see their values before and after backward() is called:

import torch, torchvision
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)
prediction = model(data)
loss = (prediction - labels).sum()
loss.backward()

Are the gradients stored somewhere in the model along with the weights? If so, how do I print them out?

Hi @GGPYTORCH000,

The gradient values are stored within the .grad attribute of a Tensor, and are initialized as None when you create the a and b Tensors. When you call loss.backward() you populate the .grad attribute with the gradient of the loss with respect to the variable.

In the case of example 2, the gradients are found in the .grad attribute of each parameter returned by the model.parameters() generator. If you want to see these gradients you can do something like,

for param in model.parameters():
  print(param.grad)
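
Note that before loss.backward() is called, each param.grad is still None on a freshly created model; afterwards it holds a tensor with the same shape as the parameter. A quick sanity check (the fc attribute name is from torchvision's resnet18):

# before loss.backward(): every parameter's .grad is still None on a fresh model
print(all(p.grad is None for p in model.parameters()))   # True
# after loss.backward(): each .grad tensor has the same shape as its parameter
print(model.fc.weight.grad.shape)                         # torch.Size([1000, 512])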

Thanks, that put out a lot of data, so I converted the gradients to numpy and printed their shapes; they are filled in after calling backward(), as expected. This is what I got below. Apparently there are various shapes; I am looking to see if I can print out which gradients are associated with which layers, etc.

import numpy as np

def printModelGrad(pModel):
    print("Grads:")
    for param in pModel.parameters():
        print(np.array(param.grad).shape)

Grads:
(64, 3, 7, 7)
(64,)
(64,)
(64, 64, 3, 3)
(64,)
(64,)
(64, 64, 3, 3)
(64,)
(64,)
(64, 64, 3, 3)
(64,)
(64,)
(64, 64, 3, 3)
(64,)
(64,)
(128, 64, 3, 3)
(128,)
(128,)
(128, 128, 3, 3)
(128,)
(128,)
(128, 64, 1, 1)
(128,)
(128,)
(128, 128, 3, 3)
(128,)
(128,)
(128, 128, 3, 3)
(128,)
(128,)
(256, 128, 3, 3)
(256,)
(256,)
(256, 256, 3, 3)
(256,)
(256,)
(256, 128, 1, 1)
(256,)
(256,)
(256, 256, 3, 3)
(256,)
(256,)
(256, 256, 3, 3)
(256,)
(256,)
(512, 256, 3, 3)
(512,)
(512,)
(512, 512, 3, 3)
(512,)
(512,)
(512, 256, 1, 1)
(512,)
(512,)
(512, 512, 3, 3)
(512,)
(512,)
(512, 512, 3, 3)
(512,)
(512,)
(1000, 512)
(1000,)
optim: SGD (
Parameter Group 0
dampening: 0
differentiable: False
foreach: None
fused: None
lr: 0.01
maximize: False
momentum: 0.9
nesterov: False
weight_decay: 0
)

If you want to look at certain layers you can save the gradients in a dict and just use the parameter's name as the key to get the gradients for that layer, i.e.,

grads = {name: param.grad for name, param in model.named_parameters()}

Then pick a layer via its key, i.e.,

first_key = list(grads.keys())[0] #change this to any key
print(grads[first_key])
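
For example, with the torchvision resnet18 the parameter names follow the module hierarchy ('conv1.weight', 'layer1.0.conv1.weight', 'fc.weight', ...), so you can also index the dict by name directly (a small sketch assuming those names):

# gradient of the first convolution's weights, shape (64, 3, 7, 7)
print(grads['conv1.weight'].shape)
# gradient of the final fully-connected layer's weights, shape (1000, 512)
print(grads['fc.weight'].shape)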