I’m trying to get the gradient of a certain layer of my model. Can somebody explain the difference between these 2 approaches?

The 1st approach is:

```
current_grads = []
for param in net1.conv_layer[0].parameters():
    current_grads.append(param.grad.view(-1))
current_grads = torch.cat(current_grads)
```

The 2nd approach is:

```
current_grads = torch.flatten(net1.conv_layer[0].weight.grad)
```

It seems that the resultant tensor of the 2nd approach is contained in the resultant tensor of the 1st approach. Can someone explain why? Thank you.

The first approach will iterate over all parameters of the layer (often the `weight` and `bias` parameters), flatten the corresponding gradients, and concatenate them into a single tensor:

```
import torch
import torch.nn as nn

layer = nn.Conv2d(1, 3, 3)
out = layer(torch.randn(1, 1, 24, 24)).mean()
out.backward()

current_grads = []
for param in layer.parameters():
    current_grads.append(param.grad.view(-1))
print(current_grads)
# [tensor([0.0149, 0.0038, 0.0107, 0.0230, 0.0130, 0.0209, 0.0298, 0.0194, 0.0287,
#          0.0149, 0.0038, 0.0107, 0.0230, 0.0130, 0.0209, 0.0298, 0.0194, 0.0287,
#          0.0149, 0.0038, 0.0107, 0.0230, 0.0130, 0.0209, 0.0298, 0.0194, 0.0287]),
#  tensor([0.3333, 0.3333, 0.3333])]
current_grads = torch.cat(current_grads)
```

while the second approach will only store the flattened `weight` gradient.
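This also explains the containment you observed: `weight` is registered before `bias`, so `parameters()` yields it first and its flattened gradient forms the leading slice of the concatenated tensor. A minimal sketch to verify this (same layer setup as above):

```
import torch
import torch.nn as nn

layer = nn.Conv2d(1, 3, 3)
layer(torch.randn(1, 1, 24, 24)).mean().backward()

# 1st approach: flatten and concatenate gradients of all parameters
all_grads = torch.cat([p.grad.view(-1) for p in layer.parameters()])
# 2nd approach: flatten only the weight gradient
weight_grads = torch.flatten(layer.weight.grad)

# weight comes first, so the 2nd result is a prefix of the 1st
print(torch.equal(all_grads[:weight_grads.numel()], weight_grads))  # True
print(all_grads.numel() - weight_grads.numel())  # 3 (the bias gradient)
```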
