Hello,
I’m sorry if this question has already been answered, but I couldn’t find a similar one.
Let’s say I have two different models, modelA and modelB, that work sequentially: modelA feeds its output tensors into modelB, and the weights of both models are updated together by a single optimizer:
optimizer = torch.optim.SGD(list(modelA.parameters()) + list(modelB.parameters()), lr=0.01)  # something like that; lr is arbitrary
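A minimal runnable sketch of this setup, assuming PyTorch and using two plain nn.Linear layers as hypothetical stand-ins for modelA and modelB (the layer sizes, lr=0.01 and the MSE loss are arbitrary choices). One optimizer receives both parameter lists, so a single loss.backward() followed by optimizer.step() updates both models:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for modelA and modelB; sizes are arbitrary.
modelA = nn.Linear(4, 10)   # produces 10 features
modelB = nn.Linear(10, 1)   # consumes modelA's output

# One optimizer over both models' parameters, so one step updates both.
optimizer = torch.optim.SGD(
    list(modelA.parameters()) + list(modelB.parameters()), lr=0.01
)

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Snapshot the weights so we can confirm both models change.
before_A = modelA.weight.detach().clone()
before_B = modelB.weight.detach().clone()

optimizer.zero_grad()
loss = nn.functional.mse_loss(modelB(modelA(x)), target)
loss.backward()   # gradients flow back through modelB into modelA
optimizer.step()

print("modelA updated:", not torch.equal(before_A, modelA.weight))
print("modelB updated:", not torch.equal(before_B, modelB.weight))
```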
Imagine a scenario where modelA outputs 10 tensors, but modelB uses only 3 of them as inputs. I’m curious whether the remaining 7 tensors (never passed to modelB, so simply discarded) have any effect on the gradient update.
I think they don’t, because they are not used in the loss calculation, but I want to be sure, since it’s very important to me not to screw up at this stage.
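One way to check this empirically, sketched under assumptions: here modelA is a hypothetical module with 10 separate linear heads, each producing one output tensor, and modelB is a hypothetical module that consumes only outputs 0, 1 and 2 (all sizes are arbitrary). After loss.backward(), only parameters that lie on a path to the loss receive a .grad; the heads behind the 7 unused outputs keep .grad set to None:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ModelA(nn.Module):
    """Hypothetical modelA: one small linear head per output tensor."""
    def __init__(self, n_outputs=10):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(4, 2) for _ in range(n_outputs))

    def forward(self, x):
        return [head(x) for head in self.heads]  # 10 separate tensors

class ModelB(nn.Module):
    """Hypothetical modelB: consumes only 3 of modelA's outputs."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3 * 2, 1)

    def forward(self, a, b, c):
        return self.fc(torch.cat([a, b, c], dim=1))

modelA, modelB = ModelA(), ModelB()
x = torch.randn(8, 4)
outs = modelA(x)
used = [0, 1, 2]  # indices of the outputs fed to modelB (arbitrary choice)
loss = modelB(*(outs[i] for i in used)).pow(2).mean()
loss.backward()

# Heads whose outputs never reach the loss are not visited by backward().
for i, head in enumerate(modelA.heads):
    status = "has grad" if head.weight.grad is not None else "grad is None"
    print(i, status)
```

With torch.optim.SGD, parameters whose .grad is None are skipped by optimizer.step(), so those heads are left unchanged. If the unused outputs shared parameters with the used ones (for example, all 10 coming from a single layer), the shared parameters would still get gradients, but only through the 3 used outputs. One caveat: if gradients are zeroed with zero_grad(set_to_none=False), unused parameters hold zero tensors rather than None, and options such as weight_decay can then still change them.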
Thank you very much.