Suppose there are three networks. Given an input I:
The first network (f) is an encoder: f(I) = a
The second network (g) is a decoder: g(a) = b
The third network (h) is a classifier: h(b) = c

My loss function is composed of only a and b, for example loss = torch.norm(b, p=1).

My question is: when I call loss.backward() with the loss torch.norm(b, p=1),
which networks' weights are updated?

I think that because the loss is built only from the outputs of f and g, the first and second networks should be updated and the third network should not.
But when I ran my code, the third network was also updated…
I thought that even though I call third_optimizer.step(), the gradient for the third network should be zero.
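
A minimal sketch of the setup described above, to show which networks receive gradients from loss.backward(). The three nn.Linear layers and their sizes are hypothetical stand-ins for the real encoder, decoder, and classifier, which are not shown in this thread:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the three networks (sizes are arbitrary).
f = nn.Linear(8, 4)  # encoder
g = nn.Linear(4, 8)  # decoder
h = nn.Linear(8, 3)  # classifier

I = torch.randn(2, 8)
a = f(I)
b = g(a)
c = h(b)  # computed, but not used in the loss

loss = torch.norm(b, p=1)
loss.backward()

def has_grad(module):
    """True if every parameter of the module received a gradient."""
    return all(p.grad is not None for p in module.parameters())

f_has_grad = has_grad(f)
g_has_grad = has_grad(g)
h_grads = [p.grad for p in h.parameters()]

print("f has grads:", f_has_grad)
print("g has grads:", g_has_grad)
print("h grads:", h_grads)
```

In this sketch only f and g end up with gradients, because the loss does not depend on h. Whether the same holds in the real code depends on details not shown here, such as gradients left over from an earlier backward() call.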

I’m really sorry for ignoring you. Can you tell me how I can check the gradient?
Actually, I wrote some code to check the gradient by looking at how the weights change, like

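Look at the .grad attribute of the parameter tensor itself, not of a slice of it (a slice such as weight[0:1] is a new non-leaf tensor whose .grad is None by default). So in your case:

self.classifier.fc[0].weight.grad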

If the gradient hasn’t been created yet (for example, if .backward()
hasn’t been called), .grad will be None. Otherwise, .grad will be
a tensor, possibly all zeros.
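
The difference between None and a zero tensor can matter for the optimizer step. Here is a rough sketch, assuming an SGD optimizer with weight decay (an assumption, since the optimizer used in the question is not shown) and a hypothetical nn.Linear standing in for the classifier:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

h = nn.Linear(8, 3)  # hypothetical stand-in for the classifier
# Assumed optimizer settings; weight_decay is what makes case 3 change the weights.
opt = torch.optim.SGD(h.parameters(), lr=0.1, weight_decay=0.01)

# 1) Before any backward(): .grad is None, and step() skips the parameter.
grad_before = h.weight.grad
w0 = h.weight.detach().clone()
opt.step()
unchanged_when_none = torch.equal(h.weight.detach(), w0)

# 2) After a backward() that goes through h, .grad is a tensor.
h(torch.randn(2, 8)).sum().backward()
grad_is_tensor = isinstance(h.weight.grad, torch.Tensor)

# 3) Zero the gradients but keep them as tensors instead of None.
opt.zero_grad(set_to_none=False)
grad_is_zero = bool((h.weight.grad == 0).all())
w1 = h.weight.detach().clone()
opt.step()  # weight decay still shrinks the weights
changed_with_zero_grad = not torch.equal(h.weight.detach(), w1)

print(grad_before, unchanged_when_none, grad_is_tensor,
      grad_is_zero, changed_with_zero_grad)
```

So one possible reason for a network changing even though the loss does not depend on it is a zero (rather than None) gradient combined with weight decay, or momentum left over from earlier steps. Checking .grad right before the optimizer step should show which case applies.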