Which network's weights are backpropagated?

Suppose there are three networks, applied in sequence to an input I:
The first network (f) is an encoder: f(I) = a
The second network (g) is a decoder: g(a) = b
The third network (h) is a classifier: h(b) = c

My loss function depends only on a and b, for example loss = torch.norm(b, p=1).

My question is: when I call loss.backward() with the loss torch.norm(b, p=1),
which network’s weights are updated?

I think that because the loss is computed from the outputs of f and g, the first and second networks should be updated and the third network should not.
But when I ran my code, the third network was also updated…
Even though I called third_optimizer.step(), I think the grad for the third network should be zero.

Hi Hyuntae!

If you use weight decay, your optimizer will update the weights even
if their gradients are zero.
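
To illustrate the weight decay point, here is a minimal sketch. It is an illustration only: the layer h, its size, and the lr and weight_decay values are made-up stand-ins, and SGD is assumed as the optimizer. It compares a parameter whose .grad is an all-zero tensor with one whose .grad is still None.

```python
import torch
from torch import nn

torch.manual_seed(0)
LR, WD = 0.1, 0.01  # hypothetical values, chosen only for the demo

# Case 1: the gradient exists but is exactly zero.
h = nn.Linear(4, 2)  # hypothetical stand-in for the third network
opt = torch.optim.SGD(h.parameters(), lr=LR, weight_decay=WD)
for p in h.parameters():
    p.grad = torch.zeros_like(p)

before = h.weight.detach().clone()
opt.step()
after = h.weight.detach().clone()

# SGD adds weight_decay * p to the gradient, so the step becomes
# p <- p - lr * weight_decay * p even though p.grad is zero.
weights_changed = not torch.equal(before, after)
matches_decay = torch.allclose(after, before * (1 - LR * WD))

# Case 2: the gradient was never created (.grad is None).
h2 = nn.Linear(4, 2)
opt2 = torch.optim.SGD(h2.parameters(), lr=LR, weight_decay=WD)
before2 = h2.weight.detach().clone()
opt2.step()  # parameters with .grad None are skipped
unchanged_when_none = torch.equal(before2, h2.weight.detach())

print(weights_changed, matches_decay, unchanged_when_none)
```

So whether the third network moves can depend on whether its gradients are zero tensors or None, which in turn depends on how the gradients are reset between iterations.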

(Also, you can easily check whether the gradients are zero.)


K. Frank

Oh! Thanks @KFrank !!

I’m sorry to bother you again, but can you tell me how I can check the gradient?
Actually, I wrote some code that checks the gradient indirectly, by looking at how the weights change, but it is too crude.
I would really appreciate it if you could let me know.


Hi Hyuntae!

Look at the .grad attribute of the tensor, in your case the parameters of the third network.

If the gradient hasn’t been created yet (for example, if .backward()
hasn’t been called), .grad will be None. Otherwise, .grad will be
a tensor, possibly zero.
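
The check described above could look roughly like the following sketch. The three nn.Linear layers, their sizes, and the helper grad_status are hypothetical stand-ins for the networks in the question, not the code from the thread.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the three networks; sizes are arbitrary.
f = nn.Linear(8, 4)  # encoder:    a = f(I)
g = nn.Linear(4, 8)  # decoder:    b = g(a)
h = nn.Linear(8, 3)  # classifier: c = h(b)

I = torch.randn(5, 8)
a = f(I)
b = g(a)
c = h(b)  # computed, but not used in the loss

loss = torch.norm(b, p=1)
loss.backward()

def grad_status(module):
    """Map each parameter name to 'None', 'zero', or 'nonzero'."""
    status = {}
    for name, p in module.named_parameters():
        if p.grad is None:
            status[name] = "None"
        elif torch.count_nonzero(p.grad) == 0:
            status[name] = "zero"
        else:
            status[name] = "nonzero"
    return status

status_f = grad_status(f)
status_g = grad_status(g)
status_h = grad_status(h)
for label, status in (("f", status_f), ("g", status_g), ("h", status_h)):
    print(label, status)
```

In this sketch h never contributes to the loss, so after a single backward pass its parameters still have .grad equal to None, while f and g receive gradients. In a real training loop, h may hold zero-valued gradient tensors instead if its gradients were populated earlier and then reset to zero rather than to None.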


K. Frank


That really helped me a lot.