I'm trying to get the gradient of a certain layer of my model. Can somebody explain the difference between these two approaches?

1st approach is:
current_grads = []
for param in net1.conv_layer[0].parameters():
    current_grads.append(param.grad.view(-1))
current_grads = torch.cat(current_grads)

2nd approach is:
current_grads = torch.flatten(net1.conv_layer[0].weight.grad)

It seems that the resultant tensor of the 2nd approach is contained in the resultant tensor of the 1st approach. Can someone explain why? Thank you.

The first approach will iterate over all parameters of the layer (often the weight and the bias), flatten the corresponding gradients, and concatenate them into a single tensor:

import torch
import torch.nn as nn

layer = nn.Conv2d(1, 3, 3)
out = layer(torch.randn(1, 1, 24, 24)).mean()
out.backward()

current_grads = []
for param in layer.parameters():
    current_grads.append(param.grad.view(-1))
print(current_grads)
# [tensor([0.0149, 0.0038, 0.0107, 0.0230, 0.0130, 0.0209, 0.0298, 0.0194, 0.0287,
#         0.0149, 0.0038, 0.0107, 0.0230, 0.0130, 0.0209, 0.0298, 0.0194, 0.0287,
#         0.0149, 0.0038, 0.0107, 0.0230, 0.0130, 0.0209, 0.0298, 0.0194, 0.0287]), tensor([0.3333, 0.3333, 0.3333])]
current_grads = torch.cat(current_grads)

while the second approach will only store the flattened weight gradient. That is why the second result is contained in the first: `nn.Conv2d` registers the weight before the bias, so the concatenated tensor starts with the flattened weight gradient and then appends the bias gradient.
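Here is a small self-contained sketch of that containment, using a fresh `nn.Conv2d` layer in place of your `net1.conv_layer[0]` (which I don't have access to): the weight gradient from the second approach is exactly the leading slice of the concatenated tensor from the first approach, and the trailing elements are the bias gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Conv2d(1, 3, 3)
out = layer(torch.randn(1, 1, 24, 24)).mean()
out.backward()

# 1st approach: flatten and concatenate the gradients of ALL parameters
# (weight first, then bias, in registration order)
grads_all = torch.cat([p.grad.view(-1) for p in layer.parameters()])

# 2nd approach: flatten only the weight gradient
grads_weight = torch.flatten(layer.weight.grad)

# The weight gradient is the leading slice of the concatenated tensor;
# what remains is the bias gradient.
print(torch.equal(grads_all[:grads_weight.numel()], grads_weight))  # True
print(grads_all.numel())     # 30 = 3*1*3*3 weight grads + 3 bias grads
print(grads_weight.numel())  # 27 = 3*1*3*3 weight grads only
```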
