Gradient Computation


I’m sorry if this question is already answered but I couldn’t find any similar question.

Let’s say I have two different models, modelA and modelB, that work sequentially: modelA feeds its output tensors to modelB, and the weights of both models are updated together.

optimizer = torch.optim.SGD(list(modelA.parameters()) + list(modelB.parameters()), lr=0.01)  # something like that; the lr value is just a placeholder

Imagine a scenario where modelA outputs 10 tensors and modelB uses only 3 of them as inputs. I’m curious whether the remaining 7 tensors (not passed to modelB and simply discarded) have any effect on the gradient update.

I think that they don’t have any effect on gradient update because they are not used in loss calculation. But I just want to be sure about it because it is very important to me not to screw up in this phase.

Thank you very much.


You are right: if they are not used in the loss computation, they won’t contribute to the computed gradients!
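
If you want to check this yourself, here is a minimal sketch. The `ModelA` class, its two heads (`used_head`, `unused_head`), the layer sizes, the squared-output loss and the learning rate are all made up for illustration and are not from the original post. The idea is only that one output of modelA is passed on to modelB while the other is computed and then ignored.

```python
import torch
from torch import nn

torch.manual_seed(0)

class ModelA(nn.Module):
    """Hypothetical stand-in for modelA: returns several output tensors."""
    def __init__(self):
        super().__init__()
        self.used_head = nn.Linear(4, 3)    # output is passed on to modelB
        self.unused_head = nn.Linear(4, 3)  # output is computed but ignored

    def forward(self, x):
        return self.used_head(x), self.unused_head(x)

modelA = ModelA()
modelB = nn.Linear(3, 1)  # hypothetical stand-in for modelB
optimizer = torch.optim.SGD(
    list(modelA.parameters()) + list(modelB.parameters()), lr=0.1
)

x = torch.randn(8, 4)
used, unused = modelA(x)           # `unused` never reaches the loss
loss = modelB(used).pow(2).mean()  # placeholder loss for the demo

optimizer.zero_grad()
loss.backward()

used_grad = modelA.used_head.weight.grad
unused_grad = modelA.unused_head.weight.grad  # stays None: not in the loss graph

unused_before = modelA.unused_head.weight.detach().clone()
optimizer.step()  # parameters whose grad is None are skipped
unused_unchanged = torch.equal(unused_before, modelA.unused_head.weight)

print(used_grad is not None, unused_grad is None, unused_unchanged)
```

In this sketch the parameters that only feed the ignored output never receive a gradient, so the optimizer step leaves them untouched. If the used and unused outputs shared earlier layers, those shared layers would still get gradients, but only through the path that reaches the loss.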
